
Introduction

Searching for songs using incomplete or paraphrased lyrics is challenging for traditional keyword-based search engines. This project evaluates different ranking models for lyrics-based retrieval, focusing on their effectiveness rather than developing a full search engine.

Using a dataset of 50,000 songs, we compared statistical ranking models such as BM25 and TF-IDF with deep learning-based models. Performance was measured using Mean Average Precision (MAP) and Normalized Discounted Cumulative Gain (NDCG).

Dataset & Preprocessing

The Genius Lyrics Dataset provided 50,000 songs with metadata such as artists, albums, and release dates. We manually labeled 1,000 query-document pairs with relevance scores (1-5) to evaluate ranking models.

Preprocessing steps included:

Tokenization: Split lyrics into tokens with a regex-based tokenizer
Stopword Removal: Filtered non-informative words
Metadata Structuring: Extracted artist, album, and genre details
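
The first two steps can be sketched as follows. This is a minimal illustration, not the project's actual code: the token pattern and the stopword list are assumptions, since the page does not specify either.

```python
import re

# Illustrative stopword list (assumption); the project's real list is not given.
STOPWORDS = {"a", "an", "and", "the", "in", "of", "on", "to",
             "is", "it", "i", "you", "me", "my"}

# Assumed pattern: runs of lowercase letters, digits, and apostrophes.
TOKEN_PATTERN = re.compile(r"[a-z0-9']+")

def preprocess(lyrics: str) -> list[str]:
    """Lowercase the lyrics, tokenize with a regex, and drop stopwords."""
    tokens = TOKEN_PATTERN.findall(lyrics.lower())
    return [t for t in tokens if t not in STOPWORDS]

tokens = preprocess("Hello, is it me you're looking for?")
```

The same function would be applied to both the lyrics and the query so that the two share one vocabulary.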

Ranking Models Evaluated

Traditional Models:

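BM25 – A probabilistic ranking function that combines inverse document frequency with a saturating term-frequency component and document-length normalization.
TF-IDF – Weights each term by its frequency in a document, scaled down by how many documents in the whole dataset contain it.
Pivoted Normalization – Adjusts term weighting with a length normalization factor pivoted around the average document length.
WordCountCosineSimilarity – Represents the query and the lyrics as raw term-count vectors and scores them by cosine similarity.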
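
To make the interaction between term frequency, inverse document frequency, and length normalization concrete, here is a small BM25 sketch. It uses one common formulation (the smoothed IDF `log(1 + (N - df + 0.5) / (df + 0.5))`) and the conventional defaults `k1 = 1.2` and `b = 0.75`; the variant and parameter values used in the project are not stated, so treat these as assumptions. The three short documents are made-up examples.

```python
import math
from collections import Counter

def bm25_scores(query, docs, k1=1.2, b=0.75):
    """Score each tokenized document in `docs` against a tokenized `query`."""
    n_docs = len(docs)
    avg_len = sum(len(d) for d in docs) / n_docs
    # Document frequency: number of documents that contain each term.
    df = Counter(term for d in docs for term in set(d))
    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for term in set(query):
            if term not in tf:
                continue
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # Term frequency saturates via k1; b controls length normalization.
            length_norm = 1 - b + b * len(d) / avg_len
            score += idf * tf[term] * (k1 + 1) / (tf[term] + k1 * length_norm)
        scores.append(score)
    return scores

# Made-up tokenized documents, for illustration only.
docs = [
    ["hello", "darkness", "old", "friend"],
    ["hello", "from", "other", "side"],
    ["yesterday", "all", "troubles", "seemed", "far", "away"],
]
scores = bm25_scores(["darkness", "friend"], docs)
best = max(range(len(docs)), key=lambda i: scores[i])
```

Only documents that share at least one exact term with the query receive a non-zero score, which is why purely lexical rankers have trouble with paraphrased queries.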

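Semantic Models:

Siamese BERT – A transformer-based deep learning model that encodes the query and the lyrics separately and compares the resulting embeddings, capturing semantic relationships and improving retrieval for paraphrased lyrics.
Latent Semantic Analysis (LSA) – Projects text into a lower-dimensional space to capture conceptual similarities. LSA is a matrix-factorization technique rather than a neural network; it is grouped here because it also matches on meaning rather than exact terms.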
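
Both models above end with the same step: the query and every song are represented as vectors, and songs are ranked by similarity to the query. The sketch below shows only that ranking step, assuming cosine similarity as the measure. The vectors are toy placeholders, not outputs of Siamese BERT or LSA, and the song ids are hypothetical.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)

def rank_by_embedding(query_vec, song_vecs):
    """Return song ids sorted by descending cosine similarity to the query."""
    return sorted(song_vecs,
                  key=lambda sid: cosine(query_vec, song_vecs[sid]),
                  reverse=True)

# Toy 3-dimensional vectors (placeholders, not real model embeddings).
song_vecs = {
    "song_a": [0.9, 0.1, 0.0],
    "song_b": [0.1, 0.8, 0.3],
    "song_c": [0.7, 0.3, 0.1],
}
ranking = rank_by_embedding([1.0, 0.2, 0.0], song_vecs)
```

Because similarity is computed in the embedding space, a song can rank highly without sharing any exact word with the query.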

Evaluation & Results

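Ranking models were compared using MAP, which averages the precision measured at each relevant result and then takes the mean over all queries, and NDCG, which uses the graded relevance labels and gives higher weight to relevant results ranked near the top.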
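
A minimal sketch of both metrics, computed from the relevance grades of a ranked result list. Several details are assumptions, because the page does not state them: a grade of 4 or higher counts as relevant for MAP, NDCG uses linear gain with a cutoff of `k = 10`, and each ranked list contains all judged documents for its query.

```python
import math

def average_precision(ranked_rels, threshold=4):
    """AP for one query; grades >= threshold count as relevant (assumed cutoff)."""
    hits, precision_sum = 0, 0.0
    for rank, rel in enumerate(ranked_rels, start=1):
        if rel >= threshold:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / hits if hits else 0.0

def mean_average_precision(runs, threshold=4):
    """Mean of the per-query AP values over a non-empty list of ranked lists."""
    return sum(average_precision(r, threshold) for r in runs) / len(runs)

def dcg(rels):
    """Discounted cumulative gain with linear gain and a log2 rank discount."""
    return sum(rel / math.log2(rank + 1) for rank, rel in enumerate(rels, start=1))

def ndcg(ranked_rels, k=10):
    """DCG of the top k results divided by the DCG of the ideal ordering."""
    ideal_dcg = dcg(sorted(ranked_rels, reverse=True)[:k])
    return dcg(ranked_rels[:k]) / ideal_dcg if ideal_dcg else 0.0

# Relevance grades (1-5) of the results for one query, in ranked order.
example = [5, 1, 4, 2]
ap = average_precision(example)
score = ndcg(example)
```

In this example the relevant results sit at ranks 1 and 3, so the average precision is (1/1 + 2/3) / 2. A common NDCG variant uses exponential gain, `2**rel - 1`, which rewards highly relevant results more strongly than the linear gain used here.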

Ranker	MAP Score	NDCG Score
BM25	0.2566	0.0269
TF-IDF	0.4268	0.0488
WordCountCosineSimilarity	0.3883	0.0357
Siamese BERT	0.4972	0.0704
LSA	0.3457	0.0488

Key Findings:

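Siamese BERT achieved the highest scores on both metrics (MAP 0.4972, NDCG 0.0704), suggesting that deep learning-based semantic matching is more effective for lyrics retrieval on this dataset.
TF-IDF outperformed BM25 (MAP 0.4268 vs. 0.2566), possibly because of differences in how the two handle stopwords and term weighting.
BM25 had the lowest scores of the rankers reported and struggled with paraphrased lyrics, which limited its retrieval effectiveness.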
