Feature/pytorch embedding ranker v4 #2228
demoncoder-crypto wants to merge 3 commits into recommenders-team:staging from
@@ -0,0 +1,10 @@
{
Not sure if there was an error in the submission of the notebook, but I can't see it.
# Licensed under the MIT License.

import numpy as np
import warnings  # Added for R-Precision warning
This file is still from the other PR; it needs to be removed.
# Copyright (c) Recommenders contributors.
# Licensed under the MIT License.

import os
import numpy as np
import pandas as pd
import torch

from recommenders.utils.constants import (
    DEFAULT_USER_COL,
    DEFAULT_ITEM_COL,
    DEFAULT_RATING_COL,
    DEFAULT_PREDICTION_COL,
    DEFAULT_K,
)


def predict_rating(
    model,
    test_df,
    col_user=DEFAULT_USER_COL,
    col_item=DEFAULT_ITEM_COL,
    col_rating=DEFAULT_RATING_COL,
    col_prediction=DEFAULT_PREDICTION_COL,
    batch_size=1024,
):
    """Predict ratings for user-item pairs in test data.

    Args:
        model (NNEmbeddingRanker): Trained embedding ranker model.
        test_df (pandas.DataFrame): Test dataframe containing user-item pairs.
        col_user (str): User column name.
        col_item (str): Item column name.
        col_rating (str): Rating column name.
I think all this can become the notebook.
Some thoughts for the notebook:
- Here is a good example of a useful notebook: https://github.com/recommenders-team/recommenders/blob/main/examples/02_model_collaborative_filtering/lightgcn_deep_dive.ipynb. It explains both the math behind the model and an implementation.
- Think of the notebook as a way to showcase how to use the ranker and what the ranker is about.
- The objective of the notebook is that it needs to be useful. That's the most important metric.
- Ideally, a person could go to this notebook, add their data, run it, and understand how to use it.
- For the notebook, you can just showcase the content of these functions directly inside the notebook.
- Something very important is that we follow the principle "explicit is better than implicit". For example, we don't write something like for metric_name, metric_func in metrics.items(), because it adds a layer of complexity. Instead, we show the metrics explicitly: rmse(true, pred), precision_at_k(true, pred, params), etc. Anyone taking a quick look can see what is going on.
- Feel free to come to our Monday meeting if you want to better understand how we build the notebooks, @demoncoder-crypto.
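To illustrate the explicit style described above, here is a minimal sketch. The rmse and precision_at_k functions are simplified, hypothetical stand-ins that work on plain Python lists for a single user; they are not the recommenders.evaluation API, which operates on DataFrames with column-name arguments.

```python
import math


def rmse(true, pred):
    """Root mean squared error between two equal-length lists (simplified stand-in)."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(true, pred)) / len(true))


def precision_at_k(relevant, ranked, k):
    """Share of the top-k ranked items that are relevant, for one user (simplified stand-in)."""
    top_k = ranked[:k]
    return sum(1 for item in top_k if item in relevant) / k


# Hypothetical toy data
true_ratings = [4.0, 3.0, 5.0]
pred_ratings = [3.5, 3.0, 4.0]
relevant_items = {"a", "c"}
ranked_items = ["a", "b", "c", "d"]

# Explicit calls: a reader sees every metric and its arguments at a glance
eval_rmse = rmse(true_ratings, pred_ratings)
eval_precision = precision_at_k(relevant_items, ranked_items, k=2)

print(f"RMSE:\t\t{eval_rmse:.4f}")
print(f"Precision@2:\t{eval_precision:.4f}")
```

Each metric is one visible line, so adding or removing a metric in the notebook is a one-line change that needs no knowledge of a dispatch dictionary.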
self.col_rating = col_rating
self.col_prediction = col_prediction
self.threshold = threshold
self.rating_pred_raw = rating_pred  # Store raw predictions before processing
rating_pred is already stored in self.rating_pred, and self.rating_pred_raw is not used anywhere in this class. Is there a reason you want to store it?
introducing serendipity into music recommendation, WSDM 2012

Eugene Yan, Serendipity: Accuracy’s unpopular best friend in Recommender Systems,
Eugene Yan, Serendipity's unpopular best friend in Recommender Systems,
The original reference sentence is correct. https://eugeneyan.com/writing/serendipity-and-accuracy-in-recommender-systems/
all_pairs = []
for user in valid_users:
    for item in all_items:
        all_pairs.append((user, item))
all_pairs = [(u, i) for u in valid_users for i in all_items]
or
from itertools import product
all_pairs = list(product(valid_users, all_items))
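The three forms are interchangeable: all of them produce the same pairs in the same user-major order. A quick check with hypothetical users and items:

```python
from itertools import product

# Hypothetical sample data
valid_users = [1, 2]
all_items = ["a", "b", "c"]

# Nested-loop version from the diff
loop_pairs = []
for user in valid_users:
    for item in all_items:
        loop_pairs.append((user, item))

# List-comprehension suggestion
comp_pairs = [(u, i) for u in valid_users for i in all_items]

# itertools.product suggestion
product_pairs = list(product(valid_users, all_items))

print(loop_pairs == comp_pairs == product_pairs)
print(product_pairs[:3])
```

If the pairs end up in a DataFrame anyway, pandas.MultiIndex.from_product([valid_users, all_items]).to_frame(index=False) builds the same cross product without a Python-level loop.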
# Filter out seen pairs
result_df = result_df[~result_df.apply(lambda row: (row[col_user], row[col_item]) in seen_pairs, axis=1)]

# Get top-k recommendations for each user
If generating_recommendation uses the same logic under the hood, can you reuse predict_rating and then sort and cut to top_k at the end?
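One way to structure this, sketched under assumptions: score every candidate pair with a single prediction function, drop pairs the user has already seen with an anti-join, then sort and keep the top k rows per user. The predict_rating below is a hypothetical stand-in that looks scores up in a dict instead of calling a trained model, recommend_top_k is likewise a hypothetical name, and the column names userID, itemID, and prediction are assumed to match the recommenders defaults.

```python
import pandas as pd

USER, ITEM, PRED = "userID", "itemID", "prediction"


def predict_rating(scores, pairs_df):
    """Hypothetical stand-in for predict_rating: attach one score per (user, item) row."""
    out = pairs_df.copy()
    out[PRED] = [scores[(u, i)] for u, i in zip(out[USER], out[ITEM])]
    return out


def recommend_top_k(scores, pairs_df, seen_df, top_k=2):
    """Reuse predict_rating, remove seen pairs, then sort and cut to top_k per user."""
    scored = predict_rating(scores, pairs_df)
    # Anti-join: a left merge with indicator=True marks rows that have no match in seen_df
    merged = scored.merge(seen_df[[USER, ITEM]], on=[USER, ITEM], how="left", indicator=True)
    unseen = merged[merged["_merge"] == "left_only"].drop(columns="_merge")
    # Highest prediction first within each user, then keep the first top_k rows per user
    return (
        unseen.sort_values([USER, PRED], ascending=[True, False])
        .groupby(USER)
        .head(top_k)
        .reset_index(drop=True)
    )


# Hypothetical toy scores for two users and four items
scores = {
    (1, "a"): 0.9, (1, "b"): 0.8, (1, "c"): 0.3, (1, "d"): 0.5,
    (2, "a"): 0.1, (2, "b"): 0.7, (2, "c"): 0.6, (2, "d"): 0.2,
}
pairs_df = pd.MultiIndex.from_product(
    [[1, 2], ["a", "b", "c", "d"]], names=[USER, ITEM]
).to_frame(index=False)
seen_df = pd.DataFrame({USER: [1, 2], ITEM: ["a", "b"]})

top_k_df = recommend_top_k(scores, pairs_df, seen_df, top_k=2)
print(top_k_df)
```

The merge with indicator=True also replaces the row-by-row DataFrame.apply used to filter seen pairs in the diff, which tends to be slow on large candidate sets.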
# Calculate metrics
results = {}
for metric_name, metric_func in metrics.items():
    # Different metrics may have different required parameters
If the only difference is k, you could do:
results[metric_name] = metric_func(
    test_df,
    predictions_df,
    col_user=col_user,
    col_item=col_item,
    col_rating=col_rating,
    col_prediction=col_prediction,
    k=k if 'k' in metric_func.__code__.co_varnames else None,
)
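One caveat with this pattern, assuming the metric functions do not accept **kwargs: a metric without a k parameter still receives k=None and raises TypeError, and __code__.co_varnames lists local variable names as well as parameters. A minimal sketch that passes k only when the metric's signature declares it, using inspect.signature from the standard library; call_metric and both metric functions are hypothetical placeholders:

```python
import inspect


def rmse(rating_true, rating_pred, col_user="userID"):
    """Hypothetical rating metric that takes no k parameter."""
    return 0.0


def precision_at_k(rating_true, rating_pred, col_user="userID", k=10):
    """Hypothetical ranking metric that takes k; returns k so the call is easy to inspect."""
    return float(k)


def call_metric(metric_func, rating_true, rating_pred, k, **col_kwargs):
    """Call metric_func, adding k only if its signature declares a parameter named k."""
    kwargs = dict(col_kwargs)
    if "k" in inspect.signature(metric_func).parameters:
        kwargs["k"] = k
    return metric_func(rating_true, rating_pred, **kwargs)


metrics = {"rmse": rmse, "precision_at_k": precision_at_k}
results = {}
for metric_name, metric_func in metrics.items():
    results[metric_name] = call_metric(metric_func, None, None, k=5, col_user="userID")

# Passing k=None unconditionally fails for a metric that has no k parameter
try:
    rmse(None, None, col_user="userID", k=None)
    original_pattern_failed = False
except TypeError:
    original_pattern_failed = True

print(results)
```

inspect.signature follows the __wrapped__ attribute set by functools.wraps, so it also sees the original parameters of a decorated metric. The explicit metric calls recommended in the earlier review comment avoid the need for this check altogether.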

@setuc FYI

@demoncoder-crypto how is this work going?

@jmarrietar do you think you would be able to take over this work? It is very similar to embdotbias.

Hi @miguelgfierro, I'll be on a tight schedule for the next few months, but I can take a look when I free up a little bit 😄
Description
Related Issues
References
Checklist:
- git commit -s -m "your commit message"
- staging branch AND NOT TO main branch