OddsHarvester

Scrape sports betting odds from OddsPortal.com with ease

Extract upcoming & historical odds across 8 sports, 100+ leagues, and dozens of betting markets.
Powered by Playwright browser automation. Output to JSON, CSV, or S3.




Quick Start

# Install
pip install oddsharvester

# Or clone & setup with uv
git clone https://github.com/jordantete/OddsHarvester.git && cd OddsHarvester
pip install uv && uv sync

# Scrape upcoming football matches
oddsharvester upcoming -s football -d 20250301 -m 1x2 --headless

# Scrape historical Premier League odds
oddsharvester historic -s football -l england-premier-league --season 2024-2025 -m 1x2 --headless

Features

| Category | Feature | Description |
|---|---|---|
| Upcoming | Scrape upcoming matches | Fetch odds and event details for upcoming sports matches by date or league |
| Historic | Scrape historical odds | Retrieve past odds and match results for any season |
| Multi-market | Advanced parsing | Structured data: dates, teams, scores, venues, and per-bookmaker odds |
| Storage | Flexible output | JSON, CSV (local), or direct upload to AWS S3 |
| Docker | Container-ready | Run seamlessly in Docker with environment variable configuration |
| Proxy | Proxy support | Route through SOCKS/HTTP proxies for geolocation and anti-blocking |

Supported Sports & Markets

| Sport | Markets |
|---|---|
| ⚽ Football | 1x2, btts, double_chance, draw_no_bet, over/under, european_handicap, asian_handicap |
| 🎾 Tennis | match_winner, total_sets_over/under, total_games_over/under, asian_handicap, exact_score |
| 🏀 Basketball | 1x2, moneyline, asian_handicap, over/under |
| 🏉 Rugby League | 1x2, home_away, double_chance, draw_no_bet, over/under, handicap |
| 🏉 Rugby Union | 1x2, home_away, double_chance, draw_no_bet, over/under, handicap |
| 🏒 Ice Hockey | 1x2, home_away, double_chance, draw_no_bet, btts, over/under |
| ⚾ Baseball | moneyline, over/under |
| 🏈 American Football | 1x2, moneyline, over/under, asian_handicap |

100+ leagues supported across all sports, including Premier League, La Liga, Serie A, NBA, NFL, MLB, NHL, ATP/WTA Grand Slams, and many more.


CLI Usage

OddsHarvester has two main commands: upcoming and historic. They share most options, with a few command-specific ones.

oddsharvester upcoming

Scrape odds for upcoming matches by date, by league, or by specific match URL.

# By date
oddsharvester upcoming -s football -d 20250301 -m 1x2 --headless

# By league (scrapes all upcoming matches for that league)
oddsharvester upcoming -s football -l england-premier-league -m 1x2,btts --headless

# Multiple leagues
oddsharvester upcoming -s football -l england-premier-league,spain-laliga -m 1x2 --headless

# Specific match URLs
oddsharvester upcoming -s football --match-link "https://www.oddsportal.com/football/..." -m 1x2

# Preview mode (faster: average odds only, no individual bookmakers)
oddsharvester upcoming -s football -d 20250301 -m over_under --preview-only --headless
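
When the scraper is driven from a Python script rather than typed by hand, the same command lines can be assembled programmatically. The sketch below is only an illustration: `build_upcoming_command` is a hypothetical helper, not part of OddsHarvester, and it uses only the flags shown in the examples above.

```python
import shlex
from typing import Optional, Sequence


def build_upcoming_command(
    sport: str,
    markets: Sequence[str],
    date: Optional[str] = None,
    leagues: Sequence[str] = (),
    headless: bool = True,
    preview_only: bool = False,
) -> list[str]:
    """Assemble an argv list for `oddsharvester upcoming` (hypothetical helper)."""
    cmd = ["oddsharvester", "upcoming", "-s", sport]
    if leagues:
        # Leagues go in one comma-separated value, as in the examples above.
        cmd += ["-l", ",".join(leagues)]
    elif date:
        cmd += ["-d", date]
    cmd += ["-m", ",".join(markets)]
    if preview_only:
        cmd.append("--preview-only")
    if headless:
        cmd.append("--headless")
    return cmd


if __name__ == "__main__":
    cmd = build_upcoming_command(
        "football", ["1x2", "btts"], leagues=["england-premier-league"]
    )
    # Print a shell-safe rendering of the command for inspection.
    print(shlex.join(cmd))
    # To actually run it (requires oddsharvester to be installed):
    # import subprocess; subprocess.run(cmd, check=True)
```

Building an argv list instead of a single string avoids shell quoting problems when league slugs or match URLs contain special characters.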

oddsharvester historic

Scrape historical odds and results for past seasons.

# Single league & season
oddsharvester historic -s football -l england-premier-league --season 2022-2023 -m 1x2 --headless

# Current season
oddsharvester historic -s football -l england-premier-league --season current -m 1x2 --headless

# Limit pagination
oddsharvester historic -s football -l england-premier-league --season 2022-2023 -m 1x2 --max-pages 3 --headless

# Output as CSV
oddsharvester historic -s football -l england-premier-league --season 2024-2025 -m 1x2 -f csv -o premier_league_odds --headless
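
After a CSV export, a quick sanity check is to look at the column names and the row count before doing any analysis. The sketch below assumes the export ends up in a file such as `premier_league_odds.csv`; the exact file name and the columns it contains are not specified in this README, so the code reads whatever header it finds. `summarize_csv` is a hypothetical helper.

```python
import csv
from pathlib import Path


def summarize_csv(path: Path) -> tuple[list[str], int]:
    """Return (column names, number of data rows) for a CSV file."""
    with path.open(newline="", encoding="utf-8") as fh:
        reader = csv.DictReader(fh)
        rows = sum(1 for _ in reader)  # consume the reader to count data rows
        columns = list(reader.fieldnames or [])
    return columns, rows


if __name__ == "__main__":
    # Assumed output location; adjust to wherever your run wrote the file.
    target = Path("premier_league_odds.csv")
    if target.exists():
        columns, rows = summarize_csv(target)
        print(f"{rows} rows, columns: {columns}")
    else:
        print(f"{target} not found")
```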

CLI Options Reference

Core Options

| Option | Short | Description | Default |
|---|---|---|---|
| --sport | -s | Sport to scrape (football, tennis, basketball, etc.) | required |
| --date | -d | Target date in YYYYMMDD format | - |
| --league | -l | Comma-separated league slugs (e.g. england-premier-league) | - |
| --market | -m | Comma-separated markets (e.g. 1x2,btts) | - |
| --match-link | | Specific match URL (repeatable). Overrides --sport, --date, --league | - |

upcoming only: --date is required unless --league or --match-link is provided. When --league is set, --date is ignored.
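
The precedence rules above can be summarized in a few lines of Python. This is a minimal sketch of the documented behavior, not the CLI's actual validation code; `resolve_upcoming_mode` and its return values are hypothetical names chosen for illustration.

```python
from datetime import datetime
from typing import Optional, Sequence


def resolve_upcoming_mode(
    date: Optional[str] = None,
    leagues: Sequence[str] = (),
    match_links: Sequence[str] = (),
) -> str:
    """Return which input drives an `upcoming` run: 'match-link', 'league', or 'date'."""
    if match_links:
        # --match-link overrides --sport, --date, and --league.
        return "match-link"
    if leagues:
        # When --league is set, --date is ignored.
        return "league"
    if date is None:
        raise ValueError("--date is required unless --league or --match-link is provided")
    # strptime alone accepts short forms such as '2025031', so check the shape first.
    if len(date) != 8 or not (date.isascii() and date.isdigit()):
        raise ValueError(f"expected YYYYMMDD, got {date!r}")
    datetime.strptime(date, "%Y%m%d")  # raises ValueError for impossible dates
    return "date"


if __name__ == "__main__":
    print(resolve_upcoming_mode(date="20250301"))
    print(resolve_upcoming_mode(date="20250301", leagues=["england-premier-league"]))
```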

historic only:

| Option | Description | Default |
|---|---|---|
| --season | Season: YYYY, YYYY-YYYY, or current | required |
| --max-pages | Max number of result pages to scrape | unlimited |
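
If season values come from user input or a config file, it can help to check them against the three accepted shapes before launching a run. The following is a sketch under the assumption that only the format matters; whether the CLI also checks that the two years are consecutive is not stated here. `is_valid_season` is a hypothetical helper.

```python
import re

# Matches 'YYYY' or 'YYYY-YYYY'; the literal 'current' is handled separately.
_SEASON_RE = re.compile(r"\d{4}(?:-\d{4})?")


def is_valid_season(value: str) -> bool:
    """Check a --season value against the documented formats."""
    return value == "current" or _SEASON_RE.fullmatch(value) is not None


if __name__ == "__main__":
    for candidate in ("2024-2025", "2024", "current", "24-25"):
        print(candidate, is_valid_season(candidate))
```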

Output Options

| Option | Short | Description | Default |
|---|---|---|---|
| --storage | | local or remote (S3) | local |
| --format | -f | json or csv | json |
| --output | -o | Output file path | scraped_data |

Browser & Scraping Options

| Option | Short | Description | Default |
|---|---|---|---|
| --headless | | Run browser in headless mode | False |
| --concurrency | -c | Concurrent scraping tasks | 3 |
| --request-delay | | Delay (sec) between match requests | 1.0 |
| --user-agent | | Custom browser user agent | - |
| --locale | | Browser locale (e.g. fr-BE) | - |
| --timezone | | Browser timezone (e.g. Europe/Brussels) | - |

Proxy Options

| Option | Description |
|---|---|
| --proxy-url | Proxy URL (http://... or socks5://...) |
| --proxy-user | Proxy username |
| --proxy-pass | Proxy password |

Tip: For best results, match --locale and --timezone to your proxy's region.
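
One way to keep proxy, locale, and timezone settings together is to generate the flags from a single place. The sketch below is illustrative only: `build_proxy_args` is a hypothetical helper, and it accepts just the two URL schemes shown in the table above (the CLI itself may accept others).

```python
from typing import Optional
from urllib.parse import urlsplit

# Schemes taken from the Proxy Options table; treating others as errors is an assumption.
DOCUMENTED_SCHEMES = {"http", "socks5"}


def build_proxy_args(
    proxy_url: str,
    user: Optional[str] = None,
    password: Optional[str] = None,
    locale: Optional[str] = None,
    timezone: Optional[str] = None,
) -> list[str]:
    """Return CLI flags for a proxy plus a matching browser locale and timezone."""
    scheme = urlsplit(proxy_url).scheme
    if scheme not in DOCUMENTED_SCHEMES:
        raise ValueError(f"unsupported proxy scheme in {proxy_url!r}")
    args = ["--proxy-url", proxy_url]
    if user:
        args += ["--proxy-user", user]
    if password:
        args += ["--proxy-pass", password]
    if locale:
        args += ["--locale", locale]
    if timezone:
        args += ["--timezone", timezone]
    return args


if __name__ == "__main__":
    # Example values only: a proxy assumed to be located in Belgium.
    print(build_proxy_args(
        "socks5://proxy.example.com:1080",
        locale="fr-BE",
        timezone="Europe/Brussels",
    ))
```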

Advanced Options

| Option | Description | Default |
|---|---|---|
| --target-bookmaker | Filter odds for a specific bookmaker | - |
| --odds-history | Include historical odds movement per match | False |
| --odds-format | Odds display format | Decimal Odds |
| --preview-only | Fast mode: average odds only, no bookmaker details | False |
| --bookies-filter | Bookmaker filter: all, classic, or crypto | all |
| --period | Match period (sport-specific: full-time, halves, etc.) | sport default |

Preview Mode vs Full Mode

| Aspect | Full Mode | Preview Mode |
|---|---|---|
| Speed | Slower (interactive) | Faster (passive) |
| Data | All submarkets + bookmakers | Visible submarkets + avg odds |
| Bookmakers | Individual bookmaker odds | Average odds only |
| Odds History | Available | Not available |
| Structure | By bookmaker | By submarket (avg odds) |

Preview mode (--preview-only) is useful for quick exploration, testing data format, or light monitoring with reduced resource usage.


Environment Variables

The CLI options listed below can be set via environment variables, which is useful for Docker or CI/CD.

| Variable | CLI Option | Description |
|---|---|---|
| OH_SPORT | --sport | Sport to scrape |
| OH_LEAGUES | --league | Comma-separated leagues |
| OH_MARKETS | --market | Comma-separated markets |
| OH_STORAGE | --storage | Storage type (local/remote) |
| OH_FORMAT | --format | Output format (json/csv) |
| OH_FILE_PATH | --output | Output file path |
| OH_HEADLESS | --headless | Run in headless mode |
| OH_CONCURRENCY | --concurrency | Number of concurrent tasks |
| OH_REQUEST_DELAY | --request-delay | Delay between requests (sec) |
| OH_PROXY_URL | --proxy-url | Proxy server URL |
| OH_PROXY_USER | --proxy-user | Proxy username |
| OH_PROXY_PASS | --proxy-pass | Proxy password |
| OH_USER_AGENT | --user-agent | Custom browser user agent |
| OH_LOCALE | --locale | Browser locale |
| OH_TIMEZONE | --timezone | Browser timezone ID |

export OH_SPORT=football
export OH_HEADLESS=true
export OH_PROXY_URL=http://proxy.example.com:8080

oddsharvester upcoming -d 20250301 -m 1x2
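
When a run is launched from a Python orchestration script, the same variables can be built from a dictionary of options. This is a sketch, not part of OddsHarvester: `OPTION_TO_ENV` is copied from the variables listed in this section, `options_to_env` is a hypothetical helper, and writing `false` for a disabled boolean is an assumption (only `true` appears in the example above).

```python
# Mapping of CLI options to environment variables, as listed in this section.
OPTION_TO_ENV = {
    "--sport": "OH_SPORT",
    "--league": "OH_LEAGUES",
    "--market": "OH_MARKETS",
    "--storage": "OH_STORAGE",
    "--format": "OH_FORMAT",
    "--output": "OH_FILE_PATH",
    "--headless": "OH_HEADLESS",
    "--concurrency": "OH_CONCURRENCY",
    "--request-delay": "OH_REQUEST_DELAY",
    "--proxy-url": "OH_PROXY_URL",
    "--proxy-user": "OH_PROXY_USER",
    "--proxy-pass": "OH_PROXY_PASS",
    "--user-agent": "OH_USER_AGENT",
    "--locale": "OH_LOCALE",
    "--timezone": "OH_TIMEZONE",
}


def options_to_env(options: dict) -> dict[str, str]:
    """Translate {cli_option: value} into {ENV_NAME: string value}."""
    env: dict[str, str] = {}
    for option, value in options.items():
        name = OPTION_TO_ENV[option]  # KeyError for options without a listed variable
        if isinstance(value, bool):
            value = "true" if value else "false"  # 'false' spelling is an assumption
        elif isinstance(value, (list, tuple)):
            value = ",".join(value)  # comma-separated, as for leagues and markets
        env[name] = str(value)
    return env


if __name__ == "__main__":
    env = options_to_env({"--sport": "football", "--headless": True})
    print(env)
    # To launch a run with these settings (requires oddsharvester to be installed):
    # import os, subprocess
    # subprocess.run(["oddsharvester", "upcoming", "-d", "20250301", "-m", "1x2"],
    #                env={**os.environ, **env}, check=True)
```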

Installation

With pip (from PyPI)

pip install oddsharvester

From source (with uv)

git clone https://github.com/jordantete/OddsHarvester.git
cd OddsHarvester
pip install uv
uv sync

Manual setup (venv + pip or poetry)

python3 -m venv .venv
source .venv/bin/activate    # Unix/macOS
# .venv\Scripts\activate     # Windows

pip install . --use-pep517
# or: poetry install

Verify installation:

oddsharvester --help

Docker

# Build
docker build -t odds-harvester:local --target local-dev .

# Run
docker run --rm odds-harvester:local \
  python3 -m oddsharvester upcoming -s football -d 20250301 -m 1x2 --headless

# Or with environment variables
docker run --rm \
  -e OH_SPORT=football \
  -e OH_HEADLESS=true \
  odds-harvester:local python3 -m oddsharvester upcoming -d 20250301 -m 1x2
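
For scheduled or scripted container runs, the `docker run` invocation can be generated rather than hand-written. The sketch below mirrors the command above; `docker_run_command` is a hypothetical helper, and the image tag defaults to the one built in this section.

```python
def docker_run_command(
    env: dict[str, str],
    cli_args: list[str],
    image: str = "odds-harvester:local",
) -> list[str]:
    """Build an argv list for running OddsHarvester in Docker with env vars."""
    cmd = ["docker", "run", "--rm"]
    for name in sorted(env):  # sorted for a stable, reproducible command line
        cmd += ["-e", f"{name}={env[name]}"]
    cmd += [image, "python3", "-m", "oddsharvester", *cli_args]
    return cmd


if __name__ == "__main__":
    command = docker_run_command(
        {"OH_SPORT": "football", "OH_HEADLESS": "true"},
        ["upcoming", "-d", "20250301", "-m", "1x2"],
    )
    print(" ".join(command))
    # To execute (requires Docker and the image built above):
    # import subprocess; subprocess.run(command, check=True)
```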

Cloud Deployment (AWS Lambda + Serverless)

OddsHarvester can be deployed on AWS Lambda using the Serverless Framework. It is packaged as a Docker image because Playwright exceeds Lambda's 50 MB limit for zipped deployment packages.

Setup:

  1. Build the Docker image and push it to ECR
  2. Configure serverless.yaml at the project root:
    • Set your AWS region, S3 bucket ARN, and IAM permissions
    • Default function: scanAndStoreOddsPortalDataV2 (2048MB, 360s timeout)
    • Triggers via EventBridge every 2 hours by default
  3. Deploy:

     sls deploy

Refer to the Serverless Framework docs for detailed setup instructions.


Contributing

Contributions are welcome! Submit an issue or pull request. Please follow the project's coding standards and include clear descriptions for any changes.

License

MIT License

Disclaimer

This package is intended for educational purposes only. The author is not affiliated with or endorsed by oddsportal.com. Use responsibly and ensure compliance with their terms of service and applicable laws.
