A comprehensive machine learning system for predicting vinyl record sales using advanced time-series forecasting with cloud-based deployment capabilities.
Plastinka Sales Predictor is a production-ready ML system designed to provide accurate vinyl sales forecasts. It combines:
- 🧠 Advanced ML Models: Utilizing state-of-the-art time-series forecasting models (TiDE).
- ☁️ Cloud Integration: Seamless integration with Yandex DataSphere for scalable model training and deployment.
- 🚀 Production API: A robust FastAPI-based REST API for interacting with the forecasting system.
- 🏗️ Infrastructure as Code: Automated infrastructure management with Terraform for consistent and reproducible deployments.
┌─────────────────────────────────────────────────────────────────────────────┐
│                      Plastinka Sales Predictor System                       │
│                                                                             │
│  ┌─────────────────┐     ┌─────────────────┐     ┌─────────────────┐        │
│  │    ML Module    │     │   FastAPI App   │     │ Infrastructure  │        │
│  │    (ML Code)    │     │ (Orchestrator)  │     │      (IaC)      │        │
│  │ • TiDE Model    │     │ • REST API      │     │ • Terraform     │        │
│  │ • Data Prep     │     │ • Job Queue     │     │ • DataSphere    │        │
│  │ • Metrics       │     │ • Database      │     │ • Monitoring    │        │
│  │ • Training      │     │ • File Storage  │     │                 │        │
│  └─────────────────┘     └─────────────────┘     └─────────────────┘        │
│           │                       │                       │                 │
│           └───────────────────────┼───────────────────────┘                 │
│                                   │                                         │
└───────────────────────────────────┼─────────────────────────────────────────┘
                                    │
                        ┌───────────▼───────────┐
                        │   Yandex DataSphere   │
                        │                       │
                        │  • ML Compute         │
                        │  • Model Training     │
                        │  • Prediction Gen     │
                        │  • Resource Scaling   │
                        └───────────────────────┘
- FastAPI App (deployment/): The central orchestrator component. It provides a REST API for user interaction, manages the database (metadata, tasks, results), and initiates tasks (training, tuning) in Yandex DataSphere.
- ML Module (plastinka_sales_predictor/): This is not a separate service, but a Python package containing all the machine learning logic (data preparation, TiDE model architecture, metrics). This code is packaged and executed directly in the Yandex DataSphere cloud environment.
- Infrastructure (deployment/infrastructure/): Infrastructure as code (IaC) based on Terraform. These configurations describe and create all the necessary cloud resources in Yandex Cloud, including the DataSphere project and access rights. This is a component of the deployment stage.
This guide provides the steps to set up the project from scratch.
- Python & uv: Ensure you have Python 3.x installed. This project uses uv for package management. Install it with:
# macOS / Linux
curl -LsSf https://astral.sh/uv/install.sh | sh
# Windows
powershell -c "irm https://astral.sh/uv/install.ps1 | iex"
- Terraform / OpenTofu: Install either Terraform or OpenTofu to manage cloud infrastructure.
- Yandex Cloud Account: You need a Yandex Cloud account with an organization and folder.
Terraform automatically creates the cloud resources and a .env file in the project root with the necessary API keys.
# 1. Navigate to the infrastructure configuration directory
cd deployment/infrastructure/envs/prod
# 2. Create a variables file from the example
cp terraform.tfvars.example terraform.tfvars
# 3. Edit terraform.tfvars with your Yandex Cloud IDs
# (yc_cloud_id, yc_folder_id, yc_organization_id)
# 4. Set your Yandex Cloud OAuth token as an environment variable
export TF_VAR_yc_token="your-yc-oauth-token"
# 5. Initialize and apply the Terraform configuration
terraform init
terraform apply
This will create the necessary DataSphere resources and populate the .env file at the project root.
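To confirm that the file was written, a small check like the one below can be run from the project root. It is a minimal sketch using only the standard library: it parses plain `KEY=VALUE` lines and reports which keys are empty, without printing any values. The helper names (`parse_env`, `empty_keys`, `describe_env`) are illustrative and not part of the project, and the sketch assumes nothing about which keys Terraform writes.

```python
from pathlib import Path


def parse_env(text: str) -> dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blanks and # comments."""
    values: dict[str, str] = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop optional surrounding quotes around the value.
        values[key.strip()] = value.strip().strip("'\"")
    return values


def empty_keys(values: dict[str, str]) -> list[str]:
    """Return the keys whose value is empty, in sorted order."""
    return sorted(key for key, value in values.items() if not value)


def describe_env(path: Path) -> str:
    """Summarize a .env file without exposing any secret values."""
    if not path.exists():
        return f"{path} not found; has `terraform apply` finished?"
    parsed = parse_env(path.read_text(encoding="utf-8"))
    missing = empty_keys(parsed)
    return f"{len(parsed)} keys found; empty: {', '.join(missing) or 'none'}"
```

Calling `print(describe_env(Path(".env")))` from the project root gives a one-line summary. Only key names appear in the output, so it is safe to paste into an issue or chat.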
Once the infrastructure is ready, install the Python dependencies.
# From the project root
uv sync
Start the FastAPI server:
python deployment/run.py
The API will be available at http://127.0.0.1:8000, and the documentation can be found at http://127.0.0.1:8000/docs.
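A quick way to verify that the server is up is to request the documentation URL from a script. The sketch below uses only the standard library. The function name `is_reachable` is illustrative, and bypassing configured proxies is a choice made here because the server is local, not something the project prescribes.

```python
import urllib.error
import urllib.request


def is_reachable(url: str, timeout: float = 5.0) -> bool:
    """Return True if a GET request to `url` answers with a 2xx status."""
    # An empty ProxyHandler disables any proxy taken from the environment,
    # which would otherwise intercept requests to a local address.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    try:
        with opener.open(url, timeout=timeout) as response:
            return 200 <= response.status < 300
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, or an HTTP error status.
        return False
```

For example, `is_reachable("http://127.0.0.1:8000/docs")` should return `True` once the server has finished starting.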
The system is designed around a monthly operational cycle. This workflow is the primary use case for the API.
- Upload Data: At the beginning of a new month, upload the sales data for the past month and the current stock levels.
POST /api/v1/jobs/data-upload
- Monitor Upload Job: Track the status of the data processing job.
GET /api/v1/jobs/{job_id}
- Check System Health (Optional but Recommended): Verify that the data is consistent and the system is ready for training.
GET /health
- Trigger Model Training: Start a new training job in Yandex DataSphere. This process trains the model on the complete, updated dataset and generates predictions for the next period.
POST /api/v1/jobs/training
- Monitor Training Job: Track the training process, which can take a significant amount of time (e.g., ~2 hours).
GET /api/v1/jobs/{job_id}
- Trigger Hyperparameter Tuning (If Needed): If the model performance degrades over time (indicated by the /health endpoint), run a tuning job to find better hyperparameters. After tuning, you must re-run the training job (Step 4).
POST /api/v1/jobs/tuning
- Retrieve Report: Once training is complete, fetch the prediction report.
POST /api/v1/jobs/reports
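The two monitoring steps in the cycle above amount to polling `GET /api/v1/jobs/{job_id}` until the job finishes. The sketch below shows one way to do that with the standard library. It rests on assumptions that the Deployment README should confirm: that the endpoint returns a JSON object with a `status` field, and that `"completed"` and `"failed"` are the terminal values. The names `fetch_job` and `wait_for_job` are illustrative, not part of the project.

```python
import json
import time
import urllib.request
from typing import Callable

BASE_URL = "http://127.0.0.1:8000"  # local address from the setup steps

# ASSUMPTION: these status names are placeholders, not the API's documented values.
TERMINAL_STATUSES = {"completed", "failed"}


def fetch_job(job_id: str) -> dict:
    """GET /api/v1/jobs/{job_id} and decode the JSON body."""
    url = f"{BASE_URL}/api/v1/jobs/{job_id}"
    with urllib.request.urlopen(url, timeout=30) as response:
        return json.load(response)


def wait_for_job(
    job_id: str,
    fetch: Callable[[str], dict] = fetch_job,
    poll_seconds: float = 60.0,
    max_polls: int = 240,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Poll a job until its (assumed) `status` field reaches a terminal value."""
    for _ in range(max_polls):
        job = fetch(job_id)
        if job.get("status") in TERMINAL_STATUSES:
            return job
        sleep(poll_seconds)
    raise TimeoutError(f"job {job_id} not finished after {max_polls} polls")
```

With the defaults, the loop waits up to four hours, which leaves headroom over the roughly two-hour training time mentioned above. The `fetch` and `sleep` parameters exist so the loop can be exercised without a running server.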
This project is composed of several key components, each with its own detailed documentation.
- Deployment README: (START HERE FOR API USAGE) Detailed guide on the FastAPI application, including API endpoints, data requirements, business logic, and practical examples.
- Infrastructure README: Comprehensive guide on setting up and managing the cloud infrastructure with Terraform.
- ML Module (plastinka_sales_predictor/): The core Python package containing the ML code (model, data processing, etc.). The code is the primary documentation.
To run the test suite:
pytest
This project is licensed under the CC BY-NC-SA 4.0 License.