Intelligent Spotify playlist generation through temporal listening behavior clustering
SmartList transforms your Spotify “Extended Streaming History” into context‑aware Daylists by clustering when you listen, not just what. Instead of generic algorithms, it builds playlists that mirror your real weekly rhythms.
- Loads raw JSON files exported from Spotify.
- Aggregates play sessions into full_history (time features + metadata) and song_history (track‑level stats).
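A minimal sketch of the loading and track‑level aggregation step, using only the standard library for illustration (the project itself lists pandas as a dependency). The function names `load_plays` and `build_song_history` are hypothetical, and the field names (`spotify_track_uri`, `ms_played`, `skipped`) are assumed to match your export; check them against your own JSON files.

```python
import json
from collections import defaultdict
from pathlib import Path


def load_plays(raw_dir):
    """Read every *.json export in raw_dir into one list of play records."""
    plays = []
    for path in sorted(Path(raw_dir).glob("*.json")):
        with open(path, encoding="utf-8") as fh:
            plays.extend(json.load(fh))
    return plays


def build_song_history(plays):
    """Aggregate plays into per-track stats keyed by track URI (assumed layout)."""
    stats = defaultdict(
        lambda: {"play_count": 0, "minutes_listened": 0.0, "skip_count": 0}
    )
    for play in plays:
        uri = play.get("spotify_track_uri")
        if not uri:  # records without a track URI (e.g. podcasts) are ignored
            continue
        row = stats[uri]
        row["play_count"] += 1
        row["minutes_listened"] += (play.get("ms_played") or 0) / 60000
        row["skip_count"] += 1 if play.get("skipped") else 0
    return dict(stats)
```

Typical use would be `build_song_history(load_plays("data/Raw Data"))`, which returns one stats dictionary per track URI.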
- Day‑of‑week and time‑of‑day are converted into cyclical features:
  - dow_sin = sin(2π × day_of_week / 7)
  - dow_cos = cos(2π × day_of_week / 7)
  - time_sin = sin(2π × minutes_since_midnight / 1440)
  - time_cos = cos(2π × minutes_since_midnight / 1440)
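The cyclical encoding can be sketched in a few lines. This example assumes Monday = 0 for `day_of_week` (Python's `weekday()` convention, which may differ from the notebook's) and uses a hypothetical helper name, `cyclical_time_features`. It shows why the encoding matters: 23:55 and 00:05 end up close together, while noon is far from both.

```python
import math
from datetime import datetime


def cyclical_time_features(ts):
    """Map a timestamp to (dow_sin, dow_cos, time_sin, time_cos)."""
    day_of_week = ts.weekday()  # assumption: Monday = 0 ... Sunday = 6
    minutes_since_midnight = ts.hour * 60 + ts.minute
    dow_angle = 2 * math.pi * day_of_week / 7
    time_angle = 2 * math.pi * minutes_since_midnight / 1440
    return (
        math.sin(dow_angle),
        math.cos(dow_angle),
        math.sin(time_angle),
        math.cos(time_angle),
    )


late = cyclical_time_features(datetime(2024, 1, 1, 23, 55))   # Monday night
early = cyclical_time_features(datetime(2024, 1, 2, 0, 5))    # Tuesday, just after midnight
noon = cyclical_time_features(datetime(2024, 1, 1, 12, 0))    # Monday noon

# Late night is closer to early morning than to noon in the encoded space.
print(math.dist(late, early) < math.dist(late, noon))  # True
```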
- K‑Means on 4‑dimensional cyclical time space (n_clusters=50).
- Silhouette Score ≈ 0.53, indicating reasonably well‑separated listening patterns.
- Summarization: maps clusters to a dominant day, time range, and descriptive name.
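The clustering step might look roughly like the sketch below. It runs on synthetic, uniformly random timestamps rather than real listening history, so the silhouette score it prints will not match the 0.53 reported above. `n_init` and `random_state` are illustrative choices, not settings taken from the project.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

rng = np.random.default_rng(0)

# Synthetic stand-in for listening data: day index and minute of day per play.
day_of_week = rng.integers(0, 7, size=2000)
minutes = rng.integers(0, 1440, size=2000)

# 4-dimensional cyclical time space, one row per play.
X = np.column_stack([
    np.sin(2 * np.pi * day_of_week / 7),
    np.cos(2 * np.pi * day_of_week / 7),
    np.sin(2 * np.pi * minutes / 1440),
    np.cos(2 * np.pi * minutes / 1440),
])

kmeans = KMeans(n_clusters=50, n_init=10, random_state=42)
labels = kmeans.fit_predict(X)

# Silhouette ranges from -1 to 1; higher means tighter, better-separated clusters.
score = silhouette_score(X, labels)
print(f"clusters: {len(set(labels.tolist()))}, silhouette: {score:.2f}")
```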
- Popularity Score:
  popularity_score = 0.6 * play_count + 0.3 * minutes_listened - 0.1 * skip_count
  Rewards total plays and listening depth, and penalizes skips.
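As a worked example of the popularity formula, the sketch below scores two made‑up tracks. The track names and numbers are invented for illustration only.

```python
def popularity_score(play_count, minutes_listened, skip_count):
    """Weighted score: plays and listening time add, skips subtract."""
    return 0.6 * play_count + 0.3 * minutes_listened - 0.1 * skip_count


# (play_count, minutes_listened, skip_count) for two hypothetical tracks
tracks = {
    "track_a": (20, 55.0, 2),    # fewer plays, but listened to in full
    "track_b": (25, 10.0, 15),   # more plays, mostly skipped early
}

for name, (plays, minutes, skips) in tracks.items():
    print(name, round(popularity_score(plays, minutes, skips), 1))
# track_a 28.3
# track_b 16.5
```

Although track_b has more plays, its short listening time and frequent skips rank it below track_a.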
- Three‑Stage Sampling:
  - Stage 1: Curated Favorites
    - Action: Randomly select 10 tracks from top_songs[cluster_id] (your top played songs for that cluster).
    - Reason: Ensures each playlist anchors around the cluster’s signature tracks, your core favorites.
  - Stage 2: Cluster Hits
    - Action: Randomly sample 10 tracks from the top 100 by popularity_score in the full cluster.
    - Reason: Introduces global favorites and deep cuts that rank highly but didn’t make the cluster-based top list.
  - Stage 3: Contextual Fill
    - Action: Randomly sample 10 additional tracks from the cluster where popularity_score ≥ min_popularity.
    - Reason: Fills out the playlist with contextually appropriate tracks, maintaining a quality threshold.
- Uniqueness & Reproducibility
  - Enforces no duplicate URIs or artists across all 30 tracks.
  - Shuffles freshly on each run, so you get a new random playlist every time.
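The three sampling stages and the uniqueness rule can be sketched as a single function. This is an illustration under assumptions, not the project's code: `build_playlist` is a hypothetical name, tracks are assumed to be dictionaries with `uri`, `artist`, and `popularity_score` keys, and the optional `rng` argument exists only so tests can fix the seed.

```python
import random


def build_playlist(top_songs, cluster_tracks, min_popularity, per_stage=10, rng=None):
    """Draw per_stage tracks in each of three stages, one track per URI and artist."""
    rng = rng or random.Random()  # unseeded by default: a new playlist each run
    playlist, seen_uris, seen_artists = [], set(), set()

    def take(pool, n):
        """Add up to n not-yet-used tracks from pool, in random order."""
        pool = list(pool)
        rng.shuffle(pool)
        added = 0
        for track in pool:
            if added == n:
                break
            if track["uri"] in seen_uris or track["artist"] in seen_artists:
                continue
            playlist.append(track)
            seen_uris.add(track["uri"])
            seen_artists.add(track["artist"])
            added += 1

    ranked = sorted(cluster_tracks, key=lambda t: t["popularity_score"], reverse=True)
    take(top_songs, per_stage)        # Stage 1: curated favorites
    take(ranked[:100], per_stage)     # Stage 2: cluster hits (top 100 by score)
    take([t for t in cluster_tracks if t["popularity_score"] >= min_popularity],
         per_stage)                   # Stage 3: contextual fill
    rng.shuffle(playlist)
    return playlist


# Synthetic cluster: 200 tracks, each by a different made-up artist.
cluster_tracks = [
    {"uri": f"spotify:track:{i}", "artist": f"artist_{i}", "popularity_score": float(i)}
    for i in range(200)
]
top_songs = sorted(cluster_tracks, key=lambda t: t["popularity_score"], reverse=True)[:20]

playlist = build_playlist(top_songs, cluster_tracks, min_popularity=50.0)
print(len(playlist))  # 30 for this synthetic data
```

If a stage cannot find enough unused tracks (for example, in a small cluster dominated by one artist), this sketch simply returns a shorter playlist.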
- Flask app serving:
- Dashboard: cluster cards organized by day, shows silhouette tag.
- Cluster pages: playlist preview, top artists/songs, CSV download.
- Clean, Spotify‑inspired design with responsive layout.
- Step‑by‑step EDA:
- Data loading → clustering → summary tables → playlist samples.
- Install dependencies
pip install flask pandas scikit-learn numpy
- Prepare data
- Download “Extended Streaming History” JSON from Spotify.
- Place files in data/Raw Data/.
- Run notebook
jupyter notebook pipeline.ipynb
- Launch web app
python main.py
Visit http://localhost:5001.
- Audio‑feature integration (tempo, valence, energy) for mood enrichment.
- Real‑time Spotify API sync for live updates.
- Advanced clustering (e.g., DBSCAN) to capture irregular listening patterns.
Built with Python, scikit‑learn, and Flask.
