Skip to content

srividya-0001/distributed-ml-inference

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

7 Commits
 
 
 
 
 
 

Repository files navigation

**# distributed-ml-inference Learning-focused project to build a distributed ML inference system step by step.

Distributed ML Inference System

This project is built step by step to learn cloud and distributed systems. The goal is to understand how machine learning models are deployed, scaled, and monitored in real-world environments.**# Design and Implementation of a Distributed Machine Learning Inference System on Cloud


Project Description

This project focuses on building a backend web service that uses machine learning and is designed to handle a large number of users at the same time without crashing.

In many real-world situations, when too many users access a service simultaneously, the server becomes overloaded and returns errors. This project demonstrates how to design a system that avoids such failures by distributing the workload across multiple services.

The main goal is to understand how real applications are built and deployed, not just how machine learning models are trained.


Why This Project Is Needed (Real-Life Motivation)

In real life, we often face situations like:

  • Semester results being released
  • Railway or bus ticket booking
  • Online exam submission
  • Banking or payment systems during peak hours

When thousands of users try to access a service at the same time, a single server cannot handle the load and shows errors like “Server Error” or “Service Unavailable”.

This project is needed to solve that exact problem.


Simple Real-World Example

When semester results are released:

  • Thousands of students try to check results at once
  • The website crashes due to high traffic
  • Students see server errors for hours

This happens because the system depends on only one server.

This project shows how to design a system so that:

  • Many users can access it simultaneously
  • The service continues working
  • Server crashes are avoided

Existing Cloud Services and Problems Faced

Existing Cloud Services

Popular cloud platforms include:

  • Amazon Web Services (AWS)
  • Google Cloud Platform (GCP)
  • Microsoft Azure

They provide:

  • Servers
  • Storage
  • Load balancers
  • Scaling mechanisms

Problems Faced in Real Systems

Even with cloud services, applications face issues such as:

  • Single point of failure
  • Server overload during peak traffic
  • Slow response time
  • Application crashes
  • Poor reliability

These problems occur due to poor system design, not because cloud services are weak.


Need for Developing This Project

Most student projects focus only on:

  • Training ML models
  • Showing accuracy results

But in the real world:

  • The main challenge is serving the model to users reliably

This project is developed to understand:

  • How ML models are used in real applications
  • How systems handle many users
  • How failures are avoided
  • How applications scale during high traffic

Abstract

This project presents the design and implementation of a cloud-based distributed machine learning inference system. The system is designed to handle multiple user requests simultaneously by distributing incoming requests across multiple service instances. This approach improves reliability, scalability, and availability of the application. The project demonstrates practical system design concepts used in real-world applications to prevent server crashes during peak usage.


Objectives

The objectives of this project are:

  1. To build a backend web service using Python
  2. To integrate a machine learning model into the service
  3. To handle multiple user requests simultaneously
  4. To avoid server crashes during high traffic
  5. To understand real-world cloud deployment concepts
  6. To design a reliable and scalable system

Expected Outcome

At the end of this project:

  • A working backend web service will be available
  • Users can send requests and receive ML-based responses
  • The system will handle many users at the same time
  • The service will not fail even under heavy load
  • The project will demonstrate real-world system design skills

How End Users Use the Project

Type of Application

  • Backend web application (Web Service)

How Users Interact

  • Users send data through:
    • API requests
    • A simple web interface (optional)

Usage Flow

  1. User sends input data to the web service
  2. The request is handled by one of the available service instances
  3. The machine learning model processes the input
  4. The prediction result is returned to the user

Users do not need to know how many servers are running in the background.


Complete Project Roadmap (Start to End)

Phase 1: Basic Setup

  • Create GitHub repository
  • Add README documentation
  • Add basic Python file
  • Understand project structure

Phase 2: Simple Web Service

  • Convert Python program into a web service
  • Handle basic user requests
  • Test using browser or API tools

Phase 3: Machine Learning Integration

  • Train a simple ML model
  • Load the model into the web service
  • Return predictions to users

Phase 4: Making the Service Cloud-Ready

  • Package the application properly
  • Ensure consistent execution across environments

Phase 5: Distributed Design

  • Run multiple copies of the service
  • Distribute requests among services
  • Handle multiple users simultaneously

Phase 6: Reliability and Testing

  • Simulate high traffic
  • Ensure the service does not crash
  • Improve stability

Phase 7: Documentation and Finalization

  • Update README
  • Explain system behavior
  • Document limitations and future improvements

Conclusion

This project demonstrates how real-world applications are designed to handle high user traffic without server failures. By combining machine learning with proper system design, the project shows how reliable and scalable web services are built in practice.

About

Learning-focused project to build a distributed ML inference system step by step.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors

Languages