Overhaul the devops side of the project.#88
Overhaul the devops side of the project.#88akeaswaran merged 3 commits intosportsdataverse:0.36-livefrom
Conversation
|
@a5ehren is attempting to deploy a commit to the sportsdataverse Team on Vercel. A member of the Team first needs to authorize it. |
|
Important Review skippedAuto reviews are disabled on base/target branches other than the default branch. Please check the settings in the CodeRabbit UI or the You can disable this status message by setting the 📝 WalkthroughWalkthroughThe pull request modernizes packaging infrastructure by migrating from setup.py to pyproject.toml, updates GitHub Actions workflows to newer versions, and introduces defensive error handling and JSON parsing safeguards across multiple sportsdataverse modules. Additionally, it adds comprehensive test coverage for CFB data loaders and removes legacy example files while updating dependencies to newer versions. Changes
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes
Poem
Pre-merge checks and finishing touches❌ Failed checks (1 inconclusive)
✅ Passed checks (2 passed)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
|
@coderabbitai review |
✅ Actions performedReview triggered.
|
There was a problem hiding this comment.
Actionable comments posted: 3
🧹 Nitpick comments (4)
sportsdataverse/dl_utils.py (1)
8-8: HTTP error handling inESPNHTTP.send_api_requestis safer; consider refining exceptionsWrapping
requests.getin a try/except and returning anESPNResponsesentinel (or raising whenraise_exception_on_error=True) is a solid improvement.Two refinements to consider:
- Define a library-specific exception (e.g.
ESPNHTTPError) and chain the original error:+class ESPNHTTPError(Exception): + """ESPN HTTP request failed.""" + @@ - except requests.exceptions.RequestException as e: - print(f"Request error for {base_url}: {e}") - # Return empty response with error status - data = self.espn_response(response="", status_code=500, url=base_url) - if raise_exception_on_error: - raise Exception(f'RequestError: Failed to fetch data from {base_url}: {e}') - return data + except requests.exceptions.RequestException as e: + print(f"Request error for {base_url}: {e}") + data = self.espn_response(response="", status_code=500, url=base_url) + if raise_exception_on_error: + raise ESPNHTTPError( + f"Failed to fetch data from {base_url}" + ) from e + return data
- Optionally, give
timeouta non‑None default (e.g. 10–30 seconds) so a hung endpoint can’t block indefinitely.These are quality-of-life improvements rather than blockers.
Also applies to: 174-186
tests/cfb/test_loaders.py (1)
24-32: Tighten multi-season assertions and consider relaxing message matchingTwo minor robustness points in these tests:
Multi-season checks (
load_cfb_pbp/load_cfb_schedule)
The comment says “Should have data from both seasons”, but the assertion is:assert result['season'].nunique() >= 1If both seasons are genuinely expected to be present in the underlying data,
>= 2would better reflect that intent. Otherwise, updating the comment to something like “at least one season present” would avoid confusion.SeasonNotFoundError message matching
The invalid-season tests assert on specific message substrings withmatch="season cannot be less than ..."etc. That makes tests brittle to harmless wording changes in the error messages. You might want to either:
- Drop
match=...and assert only onSeasonNotFoundError, or- Match a shorter, stable fragment (e.g., just the minimum year number).
Also applies to: 64-72, 40-43, 80-83, 117-120, 154-157
sportsdataverse/cfb/cfb_pbp.py (1)
152-161: Improve exception chaining and logging in new ESPN summary/odds handlingThe new guards around ESPN summary and odds responses are a solid robustness improvement, but a couple of small tweaks would make them cleaner and friendlier to tooling:
Chain the original JSON error in
espn_cfb_pbp
Ruff B904 is right that preserving the original exception helps distinguish JSON issues from handler bugs. You can do:- try: - summary = json.loads(summary_resp) - except (json.JSONDecodeError, ValueError) as e: - raise ValueError(f"Failed to decode JSON from {summary_url}: {e}")
try:summary = json.loads(summary_resp)except (json.JSONDecodeError, ValueError) as e:raise ValueError(f"Failed to decode JSON from {summary_url}") from e2. **Avoid `print` in `__helper__espn_cfb_odds_information__` and centralize defaults** Library code generally shouldn’t write to stdout. Since you already import `logging`, consider: ```diff - if odds_resp is None: - # spread, overUnder, homeFavorite, gameSpreadAvailable - return (2.5, 55.5, True, False) + DEFAULT_ODDS = (2.5, 55.5, True, False) + if odds_resp is None: + # spread, overUnder, homeFavorite, gameSpreadAvailable + return DEFAULT_ODDS ... - except (json.JSONDecodeError, ValueError) as e: - print(f"Error decoding odds JSON from {odds_url}: {e}") - return (2.5, 55.5, True, False) + except (json.JSONDecodeError, ValueError) as e: + logging.getLogger(__name__).warning( + "Error decoding odds JSON from %s: %s", odds_url, e + ) + return DEFAULT_ODDSand reuse
DEFAULT_ODDSfor the empty-items branch as well.These changes keep the new behavior but make errors easier to trace and keep the library quiet unless a logger is configured.
Also applies to: 225-241
sportsdataverse.egg-info/PKG-INFO (1)
3-3: Consider standard semantic versioning format.The version
0.0.36.2.32uses four segments, which deviates from standard semantic versioning (major.minor.patch). While not incorrect, this may cause compatibility issues with tools expecting three-segment versions.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (1)
uv.lockis excluded by!**/*.lock
📒 Files selected for processing (21)
.github/workflows/python-publish.yml(1 hunks).github/workflows/tests.yml(1 hunks).gitignore(1 hunks).python-version(1 hunks)examples/test_mlbam.py(0 hunks)pyproject.toml(1 hunks)setup.py(0 hunks)sportsdataverse.egg-info/PKG-INFO(1 hunks)sportsdataverse.egg-info/SOURCES.txt(1 hunks)sportsdataverse.egg-info/requires.txt(1 hunks)sportsdataverse.egg-info/top_level.txt(1 hunks)sportsdataverse/cfb/cfb_pbp.py(4 hunks)sportsdataverse/cfb/cfb_schedule.py(3 hunks)sportsdataverse/cfb/cfb_teams.py(1 hunks)sportsdataverse/dl_utils.py(2 hunks)sportsdataverse/nfl/nfl_games.py(2 hunks)sportsdataverse/nfl/nfl_pbp.py(12 hunks)tests/cfb/test_loaders.py(1 hunks)tests/cfb/test_pbp.py(4 hunks)tests/cfb/test_schedule.py(1 hunks)tests/cfb/test_teams.py(1 hunks)
💤 Files with no reviewable changes (2)
- examples/test_mlbam.py
- setup.py
🧰 Additional context used
🧬 Code graph analysis (6)
sportsdataverse/cfb/cfb_schedule.py (1)
sportsdataverse/dl_utils.py (1)
underscore(67-86)
tests/cfb/test_schedule.py (1)
sportsdataverse/cfb/cfb_schedule.py (3)
espn_cfb_schedule(7-118)espn_cfb_calendar(122-184)most_recent_cfb_season(186-193)
tests/cfb/test_pbp.py (2)
sportsdataverse/cfb/cfb_pbp.py (2)
espn_cfb_pbp(133-217)run_processing_pipeline(5203-5280)sportsdataverse/nfl/nfl_pbp.py (1)
run_processing_pipeline(5248-5323)
tests/cfb/test_teams.py (1)
sportsdataverse/cfb/cfb_teams.py (1)
espn_cfb_teams(5-37)
sportsdataverse/cfb/cfb_teams.py (1)
sportsdataverse/dl_utils.py (1)
underscore(67-86)
tests/cfb/test_loaders.py (2)
sportsdataverse/cfb/cfb_loaders.py (5)
load_cfb_pbp(9-35)load_cfb_schedule(37-64)load_cfb_rosters(66-92)load_cfb_team_info(94-123)get_cfb_teams(125-138)sportsdataverse/errors.py (1)
SeasonNotFoundError(5-6)
🪛 Ruff (0.14.7)
sportsdataverse/nfl/nfl_games.py
18-18: Probable use of requests call without timeout
(S113)
21-21: Consider moving this statement to an else block
(TRY300)
24-24: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling
(B904)
24-24: Avoid specifying long messages outside the exception class
(TRY003)
27-27: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling
(B904)
27-27: Avoid specifying long messages outside the exception class
(TRY003)
66-66: Probable use of requests call without timeout
(S113)
71-71: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling
(B904)
71-71: Avoid specifying long messages outside the exception class
(TRY003)
74-74: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling
(B904)
74-74: Avoid specifying long messages outside the exception class
(TRY003)
tests/cfb/test_pbp.py
593-593: Local variable json_dict_stuff is assigned to but never used
Remove assignment to unused variable json_dict_stuff
(F841)
sportsdataverse/cfb/cfb_pbp.py
156-156: Avoid specifying long messages outside the exception class
(TRY003)
161-161: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling
(B904)
161-161: Avoid specifying long messages outside the exception class
(TRY003)
sportsdataverse/nfl/nfl_pbp.py
4794-4794: Avoid equality comparisons to False; use not play_df["sp"]: for false checks
Replace with not play_df["sp"]
(E712)
4794-4794: Avoid equality comparisons to True; use play_df["scrimmage_play"]: for truth checks
Replace with play_df["scrimmage_play"]
(E712)
4803-4803: Avoid equality comparisons to False; use not play_df["sp"]: for false checks
Replace with not play_df["sp"]
(E712)
4803-4803: Avoid equality comparisons to True; use play_df["scrimmage_play"]: for truth checks
Replace with play_df["scrimmage_play"]
(E712)
sportsdataverse/dl_utils.py
185-185: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling
(B904)
185-185: Create your own exception
(TRY002)
185-185: Avoid specifying long messages outside the exception class
(TRY003)
🔇 Additional comments (19)
.gitignore (1)
21-21: Ignore.coveragein VCSIgnoring coverage artifacts is standard and avoids committing local test outputs. Change looks good.
.python-version (1)
1-1: Default runtime 3.13 is reasonable; confirm ecosystem supportPinning
.python-versionto 3.13 aligns with the test matrix (3.9and3.x) and a>=3.9requirement. Just ensure your dependencies and tooling are validated on 3.13 as well..github/workflows/tests.yml (1)
10-29: Matrixed tests across 3.9 and latest 3.x look correctThe updated matrix and action versions correctly exercise the minimum supported Python (3.9) and the latest 3.x, and the install/test steps remain straightforward.
sportsdataverse/cfb/cfb_teams.py (1)
20-36: Defensive teams JSON handling and conditional normalization look solidInitializing
teamsto an empty DataFrame, safely walkingsports/leagues/teamsvia.get(..., [{}]), and only normalizing/renaming columns whenteamsis non‑empty gives the function a predictable “empty DataFrame on error” behavior and avoids crashes on bad API responses.sportsdataverse/nfl/nfl_pbp.py (3)
7-7: Usingimportlib.resources.filesfor model paths is a good modernizationResolving
ep_model,wp_spread, andqbr_modelviafiles("sportsdataverse").joinpath("nfl/models/...")is the right approach for pyproject-based packages and works with zipped/wheel installs, as long as those model files are included as package data.Also applies to: 21-23
3191-3567: Indexingstr.extract(...)[0]correctly normalizes extracted player columnsUpdating the various
play_df.text.str.extract(...)calls in__add_player_colsto index[0]ensures a 1‑D Series is assigned into each column (instead of a DataFrame), which aligns with pandas’ API and avoids shape mismatches. The regex patterns themselves are unchanged, so behavior is preserved with improved compatibility.
4790-4809: Resetting index after grouped cumsums prevents MultiIndex assignment issuesThe added
.reset_index(level=0, drop=True)on the grouped cumsums fordrive_play_index,prog_drive_EPA,prog_drive_WPA, anddrive_total_yardscorrectly flattens the MultiIndex produced bygroupby(...).apply(...), ensuring these Series align withplay_df’s index and avoiding assignment errors on newer pandas..github/workflows/python-publish.yml (1)
16-32: Publish workflow may need update if packaging migration to pyproject.toml is plannedIf
setup.pyis being removed as part of a modernization topyproject.toml, this workflow will fail on release. Verify whether this migration is in progress and, if so, update the build step accordingly.Consider switching to a pyproject-aware builder:
- - name: Install dependencies - run: | - python -m pip install --upgrade pip - pip install setuptools wheel twine + - name: Install build tooling + run: | - python -m pip install --upgrade pip + pip install --upgrade pip + pip install build twine - name: Build and publish env: TWINE_USERNAME: ${{ secrets.PYPI_USERNAME }} TWINE_PASSWORD: ${{ secrets.PYPI_PASSWORD }} run: | - python setup.py sdist bdist_wheel + python -m build twine upload dist/*sportsdataverse/nfl/nfl_games.py (1)
17-27: Add explicit timeouts and chain underlying exceptions in NFL HTTP helpersThe new guarded flows in
nfl_token_genandnfl_game_detailsare a big improvement, but two reliability issues remain:
- Both
requests.request(...)andrequests.get(...)are invoked without a timeout, so a hung NFL endpoint can block the caller indefinitely.- Re‑raising as
ValueError(...)withoutfrom eloses the original stack/context.Consider something along these lines:
-def nfl_token_gen(): +def nfl_token_gen(timeout: float = 10.0): @@ - try: - response = requests.request("POST", url, headers=headers, data = payload) + try: + response = requests.request("POST", url, headers=headers, data=payload, timeout=timeout) response.raise_for_status() access_token = json.loads(response.content)['access_token'] return access_token except requests.exceptions.RequestException as e: print(f"Error fetching NFL token from {url}: {e}") - raise ValueError(f"Failed to generate NFL API token: {e}") + raise ValueError("Failed to generate NFL API token") from e @@ -def nfl_game_details(game_id=None, headers=None, raw=False) -> Dict: +def nfl_game_details(game_id=None, headers=None, raw=False, timeout: float = 10.0) -> Dict: @@ - try: - summary_resp = requests.get(summary_url, headers=headers) + try: + summary_resp = requests.get(summary_url, headers=headers, timeout=timeout) summary_resp.raise_for_status() summary = summary_resp.json() except requests.exceptions.RequestException as e: print(f"Error fetching NFL game details from {summary_url}: {e}") - raise ValueError(f"Failed to fetch NFL game details: {e}") + raise ValueError("Failed to fetch NFL game details") from e except json.JSONDecodeError as e: print(f"Error parsing NFL game details JSON: {e}") - raise ValueError(f"Failed to parse NFL game details: {e}") + raise ValueError("Failed to parse NFL game details") from eThis keeps the public API essentially the same while preventing indefinite hangs and improving debuggability.
Also applies to: 65-74
sportsdataverse.egg-info/requires.txt (1)
1-30: Runtime and extra dependency pins look consistent with pyprojectThe updated requirements mirror the pyproject dependency and extras configuration; no issues from a packaging standpoint as long as both stay in sync.
pyproject.toml (1)
1-71: pyproject configuration is coherent and matches the code’s needsBuild backend, metadata, runtime deps, extras, and package-data all align with the module usage (including model files); this is a solid modernization from setup.py.
sportsdataverse.egg-info/top_level.txt (1)
1-5: top_level entries are reasonable for current repo layoutIncluding docs, examples, and build artifacts here is consistent with the current source tree and shouldn’t affect runtime behavior.
sportsdataverse.egg-info/SOURCES.txt (1)
4-16: Updated SOURCES manifest matches the new packaging and examplesIncluding
pyproject.toml, Sphinx config, and example scripts here aligns with the move to modern packaging and expanded examples; nothing problematic.sportsdataverse/cfb/cfb_pbp.py (1)
7-8: Switch toimportlib.resources.filesfor model paths looks goodResolving
ep_model.model,wp_spread.model, andqbr_model.modelviafiles("sportsdataverse").joinpath(...)is an appropriate replacement for pkg_resources and works with the package-data entries in pyproject; loading them into global Boosters at import time preserves previous behavior.Also applies to: 22-33
sportsdataverse/cfb/cfb_schedule.py (1)
136-184: LGTM: Comprehensive error handling added.The defensive JSON parsing and error handling for both branches (ondays and non-ondays) is well implemented. The code properly guards against None responses, JSON decoding errors, and missing keys, with appropriate fallbacks to empty DataFrames.
tests/cfb/test_teams.py (1)
1-87: LGTM: Comprehensive test coverage for espn_cfb_teams.The test suite provides excellent coverage with tests for:
- Different parameter combinations (default, FBS, FCS)
- DataFrame structure validation
- Column naming conventions
- Data quality checks (uniqueness, non-null values)
The tests are well-structured and use appropriate assertions.
tests/cfb/test_schedule.py (2)
49-55: Good defensive testing pattern.The try/except block appropriately handles potential ESPN API failures or missing data for FCS schedules, using
pytest.skipto document the known issue rather than failing the test.
162-172: LGTM: Comprehensive logic validation.The test properly validates the season determination logic based on the current date (after August 15 or September → current year, otherwise → previous year).
sportsdataverse.egg-info/PKG-INFO (1)
31-31: The attrs package version 25.3.0 has been released and is stable. No action required.
98fcf34 to
9396af5
Compare
f22e055 to
0ee530b
Compare
|
Thanks for submitting this -- sorry it's taken so long for us to get to it. We'll probably merge this in after the season along with updates to the Game on Paper Docker containers to entirely support Python 3.14. |
|
Sounds good. If you give me a heads up right before I can refresh the PR. |
|
@a5ehren think we're good to merge this now -- just bump the version number and we should be all set! |
*Convert project to pyproject.toml setup, via uv *Set min python version to 3.9 *Fix some minor package deprecations in the near future *Update all dependency minimums to 3.9 resolved values *Add significant test coverage for CFB code *Add graceful error handling for web APIs *Update github actions to test multiple versions of python *Fix bugs from the new tests *Fix bugs against python 3.14
|
Awesome, I'll rebase and make sure everything still works. |
Fix failing CFB play-by-play tests due to stale test data and ESPN API
changes, and eliminate all FutureWarnings and SyntaxWarnings to ensure
compatibility with upcoming pandas and numpy versions.
- **test_yards_per_drive**: Correct expected play count from 62 to 61
- **test_25_weird_format_penalty**: Add bool() conversion for change_of_poss
comparison and correct expected value for fumble recovery play
- **test_25_weird_yardage_totals**: Update team ID (194→213) and expected
yardage (332→353) to match current game data
- **test_mismarked_punt_block_td**: Update play text format to match current
ESPN API response format
- Replace function references with strings in .agg() calls:
- 126 instances: `, sum)` → `, "sum")`
- 78 instances: `, np.mean)` → `, "mean")`
- Fix fillna downcasting warnings:
- Replace `.fillna(method="ffill")` with `.ffill()`
- Replace `.fillna(method="bfill")` with `.bfill()`
- Add `.infer_objects(copy=False)` after fillna calls (6 instances)
- Set `pd.set_option('future.no_silent_downcasting', True)`
- Convert regex pattern strings to raw strings (add `r` prefix):
- Time patterns: `"^\(\d{1,2}:\d{2}\)"` → `r"^\(\d{1,2}:\d{2}\)"`
- Whitespace patterns: `"\s+"` → `r"\s+"`
- Word patterns: `"y\w*ds?"` → `r"y\w*ds?"`
- Parenthesis patterns: `" pass \(\w"` → `r" pass \(\w"`
- Remove deprecated `from numpy.core.fromnumeric import mean` import
- Replace function references with strings in .agg() calls:
- 115 instances: `, sum)` → `, "sum")`
- 71 instances: `, mean)` → `, "mean")`
- Set `pd.set_option('future.no_silent_downcasting', True)`
- All tests pass: 5 passed
- Warning reduction: 376 → 290 (86 fewer warnings)
- Remaining warnings are informational PerformanceWarnings only
- Code is future-proof for pandas 3.0+ and numpy 2.0+
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
6307b0d to
e0c32a3
Compare
| uv sync --locked --all-extras --dev | ||
|
|
||
| # Install the package in editable mode | ||
| pip install -e .[all] |
There was a problem hiding this comment.
Looks like this step fails for me? I can get the package installed in editable mode with pip install -e . but pip tells me there is no dependency group marked all to install (even though I see it in pyproject.toml).
There was a problem hiding this comment.
Update: was a zsh error. Just needed quotes around .[all] and worked just fine. We may want to document that here just in case
|
Not sure what’s going on with the CI runs
…On Sat, Jan 24, 2026 at 10:05 AM Akshay Easwaran ***@***.***> wrote:
***@***.**** approved this pull request.
Everything else looks good + tests pass locally for me. Really appreciate
the work here. Will merge after we update the README.
—
Reply to this email directly, view it on GitHub
<#88 (review)>,
or unsubscribe
<https://github.com/notifications/unsubscribe-auth/AG4XCBZ3IMA6DAOCZCFZZVT4IOC2HAVCNFSM6AAAAACLZ4TNI2VHI2DSMVQWIX3LMV43YUDVNRWFEZLROVSXG5CSMV3GSZLXHMZTOMBRHAYDAOBRGY>
.
You are receiving this because you were mentioned.Message ID:
***@***.***>
|
|
Think we need to expand python version |
|
This might also be a good time to bump to 0.0.36.3.0, but I'll leave that up to you. (realistically you should just fork the project from the 0.36-live branch to something else, but 🤷 ) |
|
Approved -- I'll merge this in and then bump the version. Thanks for the help here! |
*Convert project to pyproject.toml setup, via uv
*Set min python version to 3.9
*Fix some minor package deprecations in the near future
*Update all dependency minimums to 3.9 resolved values
*Add significant test coverage for CFB code
*Add graceful error handling for web APIs
*Update github actions to test multiple versions of python
*Fix bugs from the new tests
*Fix bugs against python 3.14
Summary by CodeRabbit
Bug Fixes
Tests
Chores
✏️ Tip: You can customize this high-level summary in your review settings.