Skip to content

blghtr/plastinka_sales_predictor

Repository files navigation

Plastinka Sales Predictor

A comprehensive machine learning system for predicting vinyl record sales using advanced time-series forecasting with cloud-based deployment capabilities.

🎯 Overview

Plastinka Sales Predictor is a production-ready ML system designed to provide accurate vinyl sales forecasts. It combines:

  • 🧠 Advanced ML Models: Utilizing state-of-the-art time-series forecasting models (TiDE).
  • ☁️ Cloud Integration: Seamless integration with Yandex DataSphere for scalable model training and deployment.
  • 🚀 Production API: A robust FastAPI-based REST API for interacting with the forecasting system.
  • 🏗️ Infrastructure as Code: Automated infrastructure management with Terraform for consistent and reproducible deployments.

🏗️ System Architecture

┌─────────────────────────────────────────────────────────────────────────────┐
│                    Plastinka Sales Predictor System                         │
│                                                                             │
│  ┌─────────────────┐    ┌─────────────────┐    ┌─────────────────┐          │
│  │   ML Module     │    │   FastAPI App   │    │  Infrastructure │          │
│  │ (ML Code)       │    │ (Orchestrator)  │    │ (IaC)           │          │
│  │ • TiDE Model    │    │ • REST API      │    │ • Terraform     │          │
│  │ • Data Prep     │    │ • Job Queue     │    │ • DataSphere    │          │
│  │ • Metrics       │    │ • Database      │    │ • Monitoring    │          │
│  │ • Training      │    │ • File Storage  │    │                 │          │
│  └─────────────────┘    └─────────────────┘    └─────────────────┘          │
│           │                       │                       │                 │
│           └───────────────────────┼───────────────────────┘                 │
│                                   │                                         │
└───────────────────────────────────┼─────────────────────────────────────────┘
                                    │
                        ┌───────────▼───────────┐
                        │   Yandex DataSphere   │
                        │                       │
                        │ • ML Compute          │
                        │ • Model Training      │
                        │ • Prediction Gen      │
                        │ • Resource Scaling    │
                        └───────────────────────┘

Components and their interaction

  • FastAPI App (deployment/): The central orchestrator component. It provides a REST API for user interaction, manages the database (metadata, tasks, results), and initiates tasks (training, tuning) in Yandex DataSphere.
  • ML Module (plastinka_sales_predictor/): This is not a separate service, but a Python package containing all the machine learning logic (data preparation, TiDE model architecture, metrics). This code is packaged and executed directly in the Yandex DataSphere cloud environment.
  • Infrastructure (deployment/infrastructure/): Infrastructure as code (IaC) based on Terraform. These configurations describe and create all the necessary cloud resources in Yandex Cloud, including the DataSphere project and access rights. This is a component of the deployment stage.

🚀 Getting Started

This guide provides the steps to set up the project from scratch.

1. Prerequisites

  • Python & uv: Ensure you have Python 3.x installed. This project uses uv for package management. Install it with:
    # macOS / Linux
    curl -LsSf https://astral.sh/uv/install.sh | sh
    # Windows
    powershell -c "irm https://astral.sh/uv/install.ps1 | iex"
  • Terraform / OpenTofu: Install either Terraform or OpenTofu to manage cloud infrastructure.
  • Yandex Cloud Account: You need a Yandex Cloud account with an organization and folder.

2. Set Up Infrastructure

Terraform automatically creates the cloud resources and a .env file in the project root with the necessary API keys.

# 1. Navigate to the infrastructure configuration directory
cd deployment/infrastructure/envs/prod

# 2. Create a variables file from the example
cp terraform.tfvars.example terraform.tfvars

# 3. Edit terraform.tfvars with your Yandex Cloud IDs
# (yc_cloud_id, yc_folder_id, yc_organization_id)

# 4. Set your Yandex Cloud OAuth token as an environment variable
export TF_VAR_yc_token="your-yc-oauth-token"

# 5. Initialize and apply the Terraform configuration
terraform init
terraform apply

This will create the necessary DataSphere resources and populate the .env file at the project root.

3. Install Dependencies

Once the infrastructure is ready, install the Python dependencies.

# From the project root
uv sync

4. Run the Application

Start the FastAPI server:

python deployment/run.py

The API will be available at http://127.0.0.1:8000, and the documentation can be found at http://127.0.0.1:8000/docs.


🔄 Core Workflow

The system is designed around a monthly operational cycle. This workflow is the primary use case for the API.

  1. Upload Data: At the beginning of a new month, upload the sales data for the past month and the current stock levels.
    • POST /api/v1/jobs/data-upload
  2. Monitor Upload Job: Track the status of the data processing job.
    • GET /api/v1/jobs/{job_id}
  3. Check System Health (Optional but Recommended): Verify that the data is consistent and the system is ready for training.
    • GET /health
  4. Trigger Model Training: Start a new training job in Yandex DataSphere. This process trains the model on the complete, updated dataset and generates predictions for the next period.
    • POST /api/v1/jobs/training
  5. Monitor Training Job: Track the training process, which can take a significant amount of time (e.g., ~2 hours).
    • GET /api/v1/jobs/{job_id}
  6. Trigger Hyperparameter Tuning (If Needed): If the model performance degrades over time (indicated by the /health endpoint), run a tuning job to find better hyperparameters. After tuning, you must re-run the training job (Step 4).
    • POST /api/v1/jobs/tuning
  7. Retrieve Report: Once training is complete, fetch the prediction report.
    • POST /api/v1/jobs/reports

📚 Documentation

This project is composed of several key components, each with its own detailed documentation.

  • Deployment README: (START HERE FOR API USAGE) Detailed guide on the FastAPI application, including API endpoints, data requirements, business logic, and practical examples.
  • Infrastructure README: Comprehensive guide on setting up and managing the cloud infrastructure with Terraform.
  • ML Module (plastinka_sales_predictor/): The core Python package containing the ML code (model, data processing, etc.). The code is the primary documentation.

🧪 Testing

To run the test suite:

pytest

📄 License

This project is licensed under the CC BY-NC-SA 4.0 License.

About

No description, website, or topics provided.

Resources

License

Stars

Watchers

Forks

Packages

 
 
 

Contributors