An end-to-end MLOps project implementing a deep learning-based anime recommendation system with automated CI/CD deployment to Google Kubernetes Engine.
- Overview
- User Interface & Usage Guide
- Architecture
- Tech Stack
- Project Structure
- Model Architecture
- Setup & Installation
- Data Versioning with DVC
- CI/CD Pipeline
- Deployment to GKE
- Configuration
- Monitoring
- API Endpoints
- Troubleshooting
- Contributing
- License
- Contact
This project builds a neural collaborative filtering model to predict anime ratings and provide hybrid personalized recommendations based on both similar users and similar content. The system is containerized with Docker, versioned with DVC, and automatically deployed to GKE via Jenkins pipelines.
- Deep Learning Model: Neural network with embedding layers for users and anime
- Hybrid Recommendations: Combines similar users and similar content to surface the best matches among more than 14k anime and 300k users (see the sketch after this list)
- MLOps Pipeline: Automated training, versioning, and deployment
- Data Versioning: DVC integration with Google Cloud Storage
- CI/CD: Jenkins pipeline for automated builds and deployments
- Containerization: Docker with multi-stage builds
- Orchestration: Kubernetes deployment on GKE
- Experiment Tracking: Comet ML integration for model monitoring
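As a rough illustration of the hybrid idea (not the project's actual code in src/suggestion_model.py, which may differ), the user-based and content-based signals can be blended like this; all names (user_embs, anime_embs, ratings, liked_ids) are illustrative:

```python
import numpy as np

def cosine_sim(matrix: np.ndarray, vec: np.ndarray) -> np.ndarray:
    """Cosine similarity of every row in `matrix` against `vec`."""
    num = matrix @ vec
    den = np.linalg.norm(matrix, axis=1) * np.linalg.norm(vec) + 1e-9
    return num / den

def hybrid_scores(user_vec, user_embs, anime_embs, ratings, liked_ids, alpha=0.5):
    # User-based signal: ratings matrix (n_users x n_anime) weighted by
    # how similar each user is to the query user
    user_sims = cosine_sim(user_embs, user_vec)        # (n_users,)
    user_signal = user_sims @ ratings                  # (n_anime,)

    # Content-based signal: similarity to the anime the user rated highly
    liked_centroid = anime_embs[liked_ids].mean(axis=0)
    content_signal = cosine_sim(anime_embs, liked_centroid)

    # Normalize each signal to [0, 1] and blend
    def norm(x):
        return (x - x.min()) / (np.ptp(x) + 1e-9)
    return alpha * norm(user_signal) + (1 - alpha) * norm(content_signal)
```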
The Anime Recommender System provides a simple web interface for getting personalized anime recommendations based on your viewing history and preferences.
Local Development:

```
http://localhost:8000
```

Production Deployment:

```bash
# Get the external IP from Kubernetes
kubectl get services ml-app-service
```

Then access the app via the EXTERNAL-IP shown: http://<EXTERNAL-IP>
Enter Your Preferences

- Search for anime you have watched before
- Rate each one from 1 to 10, based on your personal opinion
- Select between 5 and 40 anime; more ratings lead to better recommendations
Submit Request
- Click the "Get Recommendations" button
- The system processes your request using the trained neural network
View Results
- The page displays your personalized anime recommendations
- Each recommendation includes:
- Anime title
- Predicted rating (0-10 scale)
- Genre information
- Brief description or metadata
Clean, minimalist interface for entering an anime rating list and requesting recommendations
Personalized anime recommendations with predicted ratings and details
For developers or automated systems, use the REST API directly.
POST /predict

```bash
curl -X POST http://your-app-url/predict \
  -H "Content-Type: application/json" \
  -d '{
    "user_ratings": {
      "Attack on Titan": 9,
      "Death Note": 8,
      "One Piece": 1,
      "Naruto": 6,
      "Monster": 2
    }
  }'
```

Success response:

```json
{
  "recommendations": [
    {
      "anime_id": 5114,
      "title": "Fullmetal Alchemist: Brotherhood",
      "predicted_rating": 9.2,
      "genres": ["Action", "Adventure", "Drama"],
      "syn": "This is a demo synopsis..."
    },
    {
      "anime_id": 1535,
      "title": "Death Note",
      "predicted_rating": 8.9,
      "genres": ["Mystery", "Psychological", "Thriller"],
      "syn": "This is a demo synopsis..."
    }
  ],
  "status": "success"
}
```

Error response:

```json
{
  "error": "Couldn't find a suitable recommended anime list with this list",
  "status": "404"
}
```
- Real-time Predictions: Instant recommendation generation
- Personalized Results: Based on your unique viewing patterns
- Top-N Recommendations: Configurable number of suggestions
- Responsive Design: Works on desktop and mobile devices
- RESTful API: Easy integration with other applications
- A built-in search engine lets you search among more than 14k available anime
- Anime name format: each name appears as Original name - English name (if one exists)
- Type at least the first two letters of a name (Japanese or English) and the engine searches the available anime
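For illustration, a simplified version of such a search might look like this (the app's real logic lives in utils/suggestion_functions.py and may differ):

```python
# Substring search over anime titles; matches either the original or the
# English name, since both appear in each stored title string.
def search_titles(query: str, titles: list[str], limit: int = 10) -> list[str]:
    q = query.strip().lower()
    if len(q) < 2:  # the UI requires at least two letters
        return []
    return [t for t in titles if q in t.lower()][:limit]

print(search_titles("de", ["Death Note", "Monster", "Code Geass"]))
# ['Death Note', 'Code Geass']
```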
Slow response time:
- The first request may be slower due to model initialization
- Subsequent requests are cached and faster
No recommendations returned:
- Your rating list may be too short
- Try a different set of anime ratings (selecting more anime helps)
- No personal information is collected through the web interface
- Only user IDs and anime preferences from the public dataset are used
- All data is processed in-memory and not stored persistently
├── Data Collection & Processing (Numpy and Pandas)
├── Model Training (TensorFlow/Keras)
├── Model Versioning (DVC)
├── Containerization (Docker)
├── CI/CD Pipeline (Jenkins)
└── Deployment (Google Kubernetes Engine with LoadBalancer)
ML/Data: Python, TensorFlow/Keras, Pandas, NumPy
MLOps: DVC, Comet ML, Jenkins
Infrastructure: Docker, Kubernetes, Google Cloud Platform (GCR, GKE, GCS)
Web: FastAPI, Vanilla HTML+JS+CSS
.
├── artifacts/ # Generated artifacts (DVC tracked, gitignored)
│ ├── model/ # Trained model metadata and configurations
│ ├── processed/ # Transformed datasets ready for training
│ ├── raw/ # Original datasets from data ingestion
│ └── weights/ # Model checkpoints and saved weights
│
├── config/
│ ├── config.yaml # Central configuration (hyperparameters, training settings, API keys)
│ └── paths_config.py # Path constants and directory configurations
│
├── jenkins_project/
│ └── Dockerfile # Jenkins container with Docker-in-Docker setup
│
├── logs/ # Application and training logs (timestamped)
│
├── notebook/ # Jupyter notebooks for EDA and experimentation
│
├── pipeline/
│ ├── get_anime_list.py # Fetch and cache anime metadata from sources
│ ├── prediction_pipeline.py # Inference pipeline for generating recommendations
│ └── training_pipeline.py # End-to-end model training orchestration
│
├── src/ # Core source code modules
│ ├── base_model.py # Neural network architecture (embedding + deep layers)
│ ├── custom_exception.py # Custom exception handling with detailed traceback
│ ├── data_ingestion.py # Download and load raw data from GCS/sources
│ ├── data_processing.py # Data cleaning, normalization, and transformation
│ ├── logger.py # Logging configuration and utilities
│ ├── model_training.py # Model training, evaluation, checkpoint management
│ └── suggestion_model.py # Recommendation generation and ranking logic
│
├── static/ # Frontend assets for web interface
│ ├── index.html # Main web page template
│ └── style.css # CSS styling for web interface
│
├── utils/ # Helper functions and utilities
│ ├── common_functions.py # General utility functions (file I/O, validation)
│ └── suggestion_functions.py # Recommendation algorithm helpers
│
├── .dvcignore # DVC ignore patterns (excludes from versioning)
├── .gitignore # Git ignore patterns (artifacts, logs, __pycache__)
├── app.py # FastAPI application entry point (REST API)
├── deployment.yaml # Kubernetes deployment and service manifests
├── Dockerfile # Production container image definition
├── Jenkinsfile # Jenkins CI/CD pipeline definition (build → deploy)
├── requirements.txt # Python dependencies and versions
└── setup.py # Package installation configuration (pip install -e .)
The recommender uses a neural collaborative filtering approach (a code sketch follows the list below):
- Embedding Layers: Separate embeddings for users and anime (128 dimensions)
- Feature Engineering: Element-wise multiplication and concatenation
- Deep Network: 4 dense layers (256→128→64→32) with layer normalization
- Regularization: L2 regularization, dropout layers
- Bias Terms: User and anime bias embeddings
- Output: Sigmoid activation for rating prediction (0-1 normalized)
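A minimal Keras sketch of the architecture above; layer sizes follow the list, but names and hyperparameters are illustrative rather than the actual src/base_model.py code:

```python
import tensorflow as tf
from tensorflow.keras import layers, regularizers

def build_model(n_users: int, n_anime: int, embed_dim: int = 128) -> tf.keras.Model:
    user_in = layers.Input(shape=(1,), name="user_id")
    anime_in = layers.Input(shape=(1,), name="anime_id")

    # Embedding layers for users and anime, plus per-entity bias terms
    user_emb = layers.Flatten()(layers.Embedding(n_users, embed_dim)(user_in))
    anime_emb = layers.Flatten()(layers.Embedding(n_anime, embed_dim)(anime_in))
    user_bias = layers.Flatten()(layers.Embedding(n_users, 1)(user_in))
    anime_bias = layers.Flatten()(layers.Embedding(n_anime, 1)(anime_in))

    # Element-wise multiplication concatenated with the raw embeddings
    interaction = layers.Multiply()([user_emb, anime_emb])
    x = layers.Concatenate()([interaction, user_emb, anime_emb])

    # Deep tower: 256 -> 128 -> 64 -> 32, with layer norm and dropout
    for units in (256, 128, 64, 32):
        x = layers.Dense(units, activation="relu",
                         kernel_regularizer=regularizers.l2(1e-6))(x)
        x = layers.LayerNormalization()(x)
        x = layers.Dropout(0.2)(x)

    # Add bias terms, then sigmoid for a rating normalized to [0, 1]
    logit = layers.Add()([layers.Dense(1)(x), user_bias, anime_bias])
    output = layers.Activation("sigmoid")(logit)
    return tf.keras.Model([user_in, anime_in], output)
```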
- Python 3.11+
- Docker Desktop
- Google Cloud Platform account
- Jenkins (optional, for CI/CD)
- Clone the repository

```bash
git clone https://github.com/Nikelroid/Anime-Recommender-Application.git
cd Anime-Recommender-Application
```

- Create virtual environment

```bash
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate
```

- Install dependencies

```bash
pip install -e .
```

- Configure DVC (if pulling existing models)

```bash
pip install dvc dvc-gs
dvc remote modify myremote --local credentialpath path/to/gcp-key.json
dvc pull
```

- Alternative data pull: download the data from [kaggle](https://www.kaggle.com/datasets/hernan4444/anime-recommendation-database-2020) and run data ingestion:

```bash
python src/data_ingestion.py
```

- Train the model

```bash
python pipeline/training_pipeline.py
```

- Run the application

```bash
python app.py
```

Access the application at http://localhost:8000
Models and large datasets are tracked using DVC with Google Cloud Storage as remote storage.
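Besides the CLI, DVC also exposes a Python API; as a hedged sketch, a tracked file can be read straight from remote storage without a full checkout (the file path here is hypothetical):

```python
import dvc.api

# Stream a DVC-tracked file from the repo's configured remote
with dvc.api.open(
    "artifacts/processed/anime_list.csv",  # hypothetical tracked file
    repo="https://github.com/Nikelroid/Anime-Recommender-Application.git",
    mode="r",
) as f:
    print(f.readline())
```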
Add new model version:

```bash
dvc add artifacts/models/saved_model
git add artifacts/models/saved_model.dvc .gitignore
git commit -m "Update model version"
dvc push
```

Pull latest model:

```bash
dvc pull
```

- Build Jenkins with Docker-in-Docker
Create custom_jenkins/Dockerfile:

```dockerfile
FROM jenkins/jenkins:lts
USER root
# docker-ce is not in the base image's default apt sources, so add Docker's
# official repository first (standard Docker install steps for Debian)
RUN apt-get update -y && \
    apt-get install -y ca-certificates curl gnupg && \
    install -m 0755 -d /etc/apt/keyrings && \
    curl -fsSL https://download.docker.com/linux/debian/gpg -o /etc/apt/keyrings/docker.asc && \
    echo "deb [signed-by=/etc/apt/keyrings/docker.asc] https://download.docker.com/linux/debian $(. /etc/os-release && echo "$VERSION_CODENAME") stable" \
        > /etc/apt/sources.list.d/docker.list && \
    apt-get update -y && \
    apt-get install -y docker-ce docker-ce-cli containerd.io
RUN groupadd -f docker && usermod -aG docker jenkins
USER jenkins
```

Build and run:
```bash
cd custom_jenkins
docker build -t jenkins-dind .
docker run -d --name jenkins-dind \
  --privileged \
  -p 8080:8080 -p 50000:50000 \
  -v /var/run/docker.sock:/var/run/docker.sock \
  -v jenkins_home:/var/jenkins_home \
  jenkins-dind
```

- Install Google Cloud SDK & kubectl in Jenkins
```bash
docker exec -u root -it jenkins-dind bash
apt-get update
apt-get install -y curl apt-transport-https ca-certificates gnupg
curl https://packages.cloud.google.com/apt/doc/apt-key.gpg | apt-key add -
echo "deb https://packages.cloud.google.com/apt cloud-sdk main" | tee -a /etc/apt/sources.list.d/google-cloud-sdk.list
apt-get update && apt-get install -y google-cloud-sdk kubectl google-cloud-sdk-gke-gcloud-auth-plugin
exit
docker restart jenkins-dind
```

- Configure Jenkins Credentials
In Jenkins Dashboard:
- Add GitHub credentials (username + token)
- Add GCP service account key (Secret file: gcp-key)
- Create Pipeline
- New Item → Pipeline
- Pipeline script from SCM → Git
- Point to your repository
- Jenkins will use the Jenkinsfile in the repo
- Clone Repository: Fetch code from GitHub
- Setup Environment: Create virtual environment and install dependencies
- DVC Pull: Download model weights from GCS
- Build Docker Image: Build for AMD64 platform
- Push to GCR: Upload image to Google Container Registry
- Verify Image: Confirm image exists in GCR
- Create Pull Secret: Setup Kubernetes authentication for GCR
- Deploy to GKE: Apply Kubernetes manifests
- Create GKE Cluster

```bash
gcloud container clusters create ml-app-cluster \
  --zone us-west2-a \
  --num-nodes 2 \
  --machine-type n1-standard-2
```

- Get Credentials

```bash
gcloud container clusters get-credentials ml-app-cluster --zone us-west2-a
```

- Build and push Docker image

```bash
gcloud auth configure-docker
docker build --platform linux/amd64 -t gcr.io/YOUR-PROJECT-ID/ml-project:latest .
docker push gcr.io/YOUR-PROJECT-ID/ml-project:latest
```

- Create image pull secret

```bash
kubectl create secret docker-registry gcr-json-key \
  --docker-server=gcr.io \
  --docker-username=_json_key \
  --docker-password="$(cat path/to/gcp-key.json)" \
  --docker-email=your-email@example.com
```

- Deploy application

```bash
kubectl apply -f deployment.yaml
```

- Get external IP

```bash
kubectl get services ml-app-service
```

Wait for the EXTERNAL-IP to be assigned, then access your application.
If you use Jenkins with the Docker-in-Docker setup, make sure to use the Jenkinsfile in the root directory and fill in the values YOUR_VENV_DIR, YOUR_GCP_PROJECT_ID, YOUR_GCLOUD_PATH, and YOUR_KUBERNETES_AUTH in the config part of the Jenkinsfile header, as well as YOUR_KUBERNETES_CLUSTER_NAME and YOUR_REGION, which relate to your GKE cluster and are located in the Kubernetes deployment stage of the file. Also fill in the YOUR_GCP_PROJECT_ID value in deployment.yaml, which the Kubernetes configuration uses.
The deployment.yaml defines:
- Deployment: 1 replica with resource limits
- Service: LoadBalancer exposing port 80
- Image Pull Secret: For GCR authentication
Edit config/config.yaml to modify settings.
(Don't forget to create a bucket in GCP and replace YOUR_BUCKET_NAME with your bucket's name.)
```yaml
data_ingestion:
  bucket_name: "YOUR_BUCKET_NAME"

model_training:
  batch_size: 512
  epochs: 50
  force_training: false  # Skip training if model exists
  checkpoint_dir: "artifacts/models"
  checkpoint_file_name: "saved_model"
```
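As a minimal sketch, these settings can be read at runtime like this (the project's own loader lives in utils/common_functions.py and may differ):

```python
import yaml

# Load the central configuration file
with open("config/config.yaml") as f:
    config = yaml.safe_load(f)

batch_size = config["model_training"]["batch_size"]          # 512
force_training = config["model_training"]["force_training"]  # False
```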
The project uses Comet ML for experiment tracking:
- Training/validation loss curves
- Model architecture visualization
- Hyperparameter logging
- Metric tracking per epoch
Set your Comet API key in src/model_training.py. You can sign up for Comet ML for free; then fill in YOUR_API_KEY, YOUR_PROJECT_NAME, and YOUR_WORKSPACE with your own values. This gives you a unified dashboard for model monitoring.
Clean, minimalist interface for your model and data monitoring
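A minimal sketch of the Comet ML setup described above, using the placeholders as named; the actual integration in src/model_training.py may differ:

```python
from comet_ml import Experiment

# Create an experiment tied to your Comet ML project and workspace
experiment = Experiment(
    api_key="YOUR_API_KEY",
    project_name="YOUR_PROJECT_NAME",
    workspace="YOUR_WORKSPACE",
)

# Log hyperparameters once and metrics per epoch (values are illustrative)
experiment.log_parameters({"batch_size": 512, "epochs": 50})
experiment.log_metric("val_loss", 0.123, epoch=1)
```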
GET / - Home page
POST /predict - Get anime recommendations for a user
Example request:
```json
{
  "user_ratings": {
    "Attack on Titan": 9,
    "Death Note": 8,
    "One Piece": 1,
    "Naruto": 6,
    "Monster": 2
  }
}
```

Model not loading in Docker:
- Ensure DVC pull completes before Docker build
- Verify model files exist in artifacts/models/saved_model
- Check SavedModel format compatibility (TensorFlow versions)
ImagePullBackOff in Kubernetes:
- Verify the image exists: gcloud container images list
- Check the image was built for the right architecture: --platform linux/amd64
- Ensure the image pull secret is created and referenced in the deployment
Training takes too long:
- Train locally with GPU
- Push trained model to DVC
- Set force_training: false in config.yaml
Not satisfied with recommendations:
- Change model structure and methods
- Change hyperparameters, increase epochs, or change the loss and activation functions in config.yaml
- Fork the repository
- Create a feature branch
- Make your changes
- Submit a pull request
This project is licensed under the MIT License.
For questions or issues, please open an issue on GitHub.


