Staging to main: wide and deep to PyTorch and other improvements #2286
Open
miguelgfierro wants to merge 111 commits into main from
* Rewrite testing workflows using only GitHub-hosted runners instead of AzureML
* Rewrite test_groups.py
* Replace test_groups.py with test_groups.yml
* Rename all workflows
* Correct paths
* Use GitHub GPU runners
* Enable unit-tests.yml
* Correct shell command and action names
* Correct commands and python versions
* Correct Dockerfile path
* Correct yq install command
* Add entrypoint.sh
* Correct paths
* Copy repo to be along with dockerfile
* Correct paths
* Correct paths
* Correct yq command
* Correct paths and yq version
* Drop Python 3.18
* Correct openjdk version
* Set openjdk<23
* replace recodatasets with GitHub resource repo
* replace deeprec info
* kdd
* kdd
* MIND
* 🐛
* Update docs
* update criteo URL (#2260)
* Merge small test groups and update testing time tally
* Correct test group name
* Try to run on the runner group instead of single runner
* Revert
* Remove marks for pytest fixture
* Try only Python 3.9 for simplification
* Update testing time
* Enable tests for Python 3.8, 3.10 and 3.11
* Update docker base image
* Try self-hosted GPU instead of GitHub-hosted
* Test nvidia-smi
* Try again
* Try again
* Try self-hosted GPU
* Test nvidia-smi inside container
* Test nvidia-smi inside container
* Try again
* Try container directly instead of docker action
* Correct path
* Correct path
* Correct path
* Correct conda activation
* Add lightgcn model dir
* Disable parallel testing on GPU
* Try again
* Disable parallel execution on GPU
* Remove pytest-xdist on GPU testing
* Correct variable substitution
* Correct if statement
* Correct typo
* Use CUDA 12.2.2 and cuDNN 8.9
* Try self-hosted GPU
* Try cuda 13.1.0
* Downgrade tensorflow version
* Try nvidia/cuda:12.8.1-cudnn-devel-ubuntu24.04
* Test on GPU directly
* Install yq and label with timestamp
* Try again
* Try again
* Try again
* Try again
* Try again
* Try again
* Update
* Update
* Try uv
* Try again
* Try again
* Try call docker directly
* Install unzip and zip
* Install curl
* Correct base image and docker commands
* Correct commands
* Cache uv downloaded python packages
* Update
* Remove docker image when finished
* Keep image and container for debugging
* Rewrite test workflows and Dockerfile
* Rewrite Dockerfile and correct tests.yml
* Correct env path
* Try again
* Try all tests
* Install zip and unzip
* Correct test_groups.yml
* Correct SDKMAN! setup
* Try parallel testing
* Disable parallel testing for group_gpu_001
* Disable parallel testing on GPU
* Try again
* Correct commands
* Test pr_gate and nightly
* Remove pytest-xdist
* Try GitHub-hosted runners only
* Try cpu-nightly on larger runners
* Change GPU image
* Check CPU and memory before tests
* Update testing time
* Correct labels
* Try runner groups
* Try using group and labels together
* Try again
* Add annotations for hardware checks
* Try cpu-nightly with Python 3.11 only
* Update testing time

---------

Signed-off-by: Simon Zhao <simonyansenzhao@gmail.com>
Signed-off-by: miguelgfierro <miguelgfierro@users.noreply.github.com>
Co-authored-by: miguelgfierro <miguelgfierro@users.noreply.github.com>
Co-authored-by: Miguel Fierro <3491412+miguelgfierro@users.noreply.github.com>
Fixed issue with Hugging Face's ETag for the MIND dataset
Signed-off-by: Rohit Goyal <sprkgoyal@gmail.com>
Speed up `lightfm_utils.py:prepare_test_df`
Signed-off-by: miguelgfierro <miguelgfierro@users.noreply.github.com>
The `_get_item_feature_similarity` function computes cosine similarity between item feature vectors using the formula dot(f1, f2) / (norm(f1) * norm(f2)). When either feature vector is a zero vector (norm = 0), this raises a ZeroDivisionError at runtime. This change handles the zero-norm edge case by returning 0.0 similarity when either vector has zero magnitude. Cosine similarity is undefined for a zero vector, so returning 0.0 is the common convention rather than a mathematically derived value.
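A minimal pure-Python sketch of the zero-norm guard described above. The function name `cosine_similarity` and its list-of-floats inputs are assumptions for illustration; this is not the library's `_get_item_feature_similarity`. With plain Python floats the unguarded division is exactly what raises ZeroDivisionError, so the check runs before dividing.

```python
import math
from typing import Sequence


def cosine_similarity(f1: Sequence[float], f2: Sequence[float]) -> float:
    """Cosine similarity dot(f1, f2) / (norm(f1) * norm(f2)).

    Returns 0.0 when either vector has zero magnitude, where the ratio
    is undefined (and plain float division would raise ZeroDivisionError).
    """
    if len(f1) != len(f2):
        raise ValueError("feature vectors must have the same length")
    dot = sum(a * b for a, b in zip(f1, f2))
    norm1 = math.sqrt(sum(a * a for a in f1))
    norm2 = math.sqrt(sum(b * b for b in f2))
    # Guard the zero-norm case before dividing.
    if norm1 == 0.0 or norm2 == 0.0:
        return 0.0
    return dot / (norm1 * norm2)
```

For example, `cosine_similarity([1.0, 0.0], [0.0, 0.0])` returns 0.0 instead of raising, while parallel vectors such as `[1.0, 2.0]` and `[2.0, 4.0]` still score 1.0 up to floating-point rounding.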
…ivision Fix ZeroDivisionError in item feature cosine similarity
Replace `logs={}` with `logs=None` and add `if logs is None: logs = {}`
guard in all Keras callback methods across multinomial_vae.py and
standard_vae.py.
Using a mutable default argument like `{}` is a well-known Python
anti-pattern (pylint W0102, dangerous-default-value): the default dict is
created once, when the function is defined, and the same object is shared
across all calls, which can lead to unexpected state leakage between invocations.
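The difference between the two signatures can be seen without Keras. The classes below are hypothetical stand-ins that only mirror the Keras callback method signature `on_epoch_end(self, epoch, logs=None)`; they are not the callbacks in multinomial_vae.py or standard_vae.py.

```python
class LeakyCallback:
    """Anti-pattern: the default dict is created once, at definition time."""

    def on_epoch_end(self, epoch, logs={}):  # pylint: disable=dangerous-default-value
        # Mutates the single shared default dict on every call.
        logs.setdefault("epochs_seen", []).append(epoch)
        return logs


class SafeCallback:
    """Pattern used in the fix: default to None, build a fresh dict per call."""

    def on_epoch_end(self, epoch, logs=None):
        if logs is None:
            logs = {}
        logs.setdefault("epochs_seen", []).append(epoch)
        return logs
```

Calling `on_epoch_end(0)` and then `on_epoch_end(1)` on a `LeakyCallback` returns `{"epochs_seen": [0, 1]}` because both calls mutate the same default dict (shared even across instances), whereas `SafeCallback` returns `{"epochs_seen": [1]}` on the second call.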
Remove the SASRec test data from the repo and store it in a temporary directory instead
Signed-off-by: miguelgfierro <miguelgfierro@users.noreply.github.com>
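A minimal sketch of creating test data in a temporary directory instead of committing it, using only the standard library. The file name, the tab-separated user/item format, and the helper names are hypothetical and do not reflect the actual SASRec test. In pytest, the built-in `tmp_path` fixture provides a per-test temporary directory for the same purpose.

```python
import os
import tempfile


def write_sample_interactions(path):
    """Write a tiny, hypothetical user-item interaction file (user<TAB>item per line)."""
    rows = [(1, 10), (1, 11), (2, 10)]
    with open(path, "w", encoding="utf-8") as f:
        for user, item in rows:
            f.write(f"{user}\t{item}\n")
    return path


def count_lines_in_temp_dataset():
    """Create the data in a temporary directory, read it back, and let it be cleaned up."""
    with tempfile.TemporaryDirectory() as tmp_dir:
        data_path = write_sample_interactions(os.path.join(tmp_dir, "sample.txt"))
        with open(data_path, encoding="utf-8") as f:
            n_lines = sum(1 for _ in f)
    # TemporaryDirectory deletes the directory and its contents on exit,
    # so no test data has to be stored in the repository.
    return n_lines, os.path.exists(tmp_dir)
```

The helper returns the number of rows read and whether the directory still exists afterwards, which makes the cleanup behavior easy to check.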
Signed-off-by: ds-wook <leewook94@gmail.com>
This reverts commit 56e470e. Signed-off-by: ds-wook <leewook94@gmail.com>
…elens Signed-off-by: ds-wook <leewook94@gmail.com>
feat: add lightgbm ranker
Description
Related Issues
References
Checklist:
* Commits are signed off, e.g. `git commit -s -m "your commit message"`.
* This PR targets the `staging` branch AND NOT the `main` branch.