An intelligent AI routing API that automatically selects the most cost-effective language model for each request based on its intent and complexity, cutting AI API costs by 85-90% while maintaining quality.
- Massive Cost Savings: Automatically route requests to ultra-cheap models like DeepSeek ($0.14/1M tokens)
- 9 AI Models: Access OpenAI, Anthropic, DeepSeek, and Qwen models through one unified API
- Intelligent Routing: Intent detection (coding, math, creative, etc.) + complexity analysis
- Response Caching: 1-hour cache eliminates redundant API calls
- Quality Scoring: LLM-as-Judge evaluation ensures high-quality responses
- Real-time Streaming: Get instant token-by-token responses
- Subscription Tiers: From FREE (100 req/month) to ENTERPRISE (50K req/month)
| Tier | Monthly Requests | Premium Requests | Rate Limit | Price |
|---|---|---|---|---|
| FREE | 100 | 0 | 50/hour | $0 |
| BASIC | 2,000 | 0 | 200/hour | $4.99 |
| PRO | 10,000 | 1,000 | 1,000/hour | $14.99 |
| ENTERPRISE | 50,000 | 10,000 | 5,000/hour | $49.99 |
| Model | Provider | Input Cost | Output Cost | Best For |
|---|---|---|---|---|
| `deepseek-chat` | DeepSeek | $0.14/1M | $0.28/1M | Ultra-cheap general tasks |
| `deepseek-coder` | DeepSeek | $0.14/1M | $0.28/1M | Code generation |
| `qwen-turbo` | Qwen | $0.20/1M | $0.60/1M | Fast, cheap responses |
| `qwen-plus` | Qwen | $0.40/1M | $1.20/1M | Balanced quality/cost |
| `qwen-max` | Qwen | $2.00/1M | $6.00/1M | High-quality multilingual |
| Model | Provider | Input Cost | Output Cost | Best For |
|---|---|---|---|---|
| `gpt-4o-mini` | OpenAI | $0.15/1M | $0.60/1M | Simple OpenAI tasks |
| `gpt-3.5-turbo` | OpenAI | $0.50/1M | $1.50/1M | Standard ChatGPT |
| `claude-3-5-haiku-20241022` | Anthropic | $0.25/1M | $1.25/1M | Fast Claude responses |
| `claude-sonnet-4-20250514` | Anthropic | $3.00/1M | $15.00/1M | High-quality reasoning |
```bash
git clone https://github.com/LukasTidenJ/LLM-COST-ROUTER-API.git
cd LLM-COST-ROUTER-API
npm install
```

Create a `.env` file:

```bash
PORT=8000
ROUTER_API_KEY=your-secret-key-here
OPENAI_API_KEY=sk-...
ANTHROPIC_API_KEY=sk-ant-...
DEEPSEEK_API_KEY=sk-...
QWEN_API_KEY=sk-...
```

Then start the server:

```bash
npm start
```

The API will be available at http://localhost:8000
Non-streaming request:

```bash
curl -X POST http://localhost:8000/api/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "x-api-key: your-api-key" \
  -d '{
    "messages": [{"role": "user", "content": "Explain quantum computing in simple terms"}],
    "stream": false
  }'
```

Streaming request:

```bash
curl -X POST http://localhost:8000/api/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "x-api-key: your-api-key" \
  -d '{
    "messages": [{"role": "user", "content": "Write a short story about AI"}],
    "stream": true
  }'
```

This is the main endpoint for chat completions.
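The same request can be built from Node.js. A minimal sketch, mirroring the curl examples above; the helper name `buildCompletionRequest` and the placeholder API key are illustrative, not part of the API itself:

```javascript
// Build the request for the chat completions endpoint shown above.
// Header names and body fields mirror the curl examples; the API key
// value is a placeholder.
function buildCompletionRequest(prompt, { stream = false, apiKey = "your-api-key" } = {}) {
  return {
    url: "http://localhost:8000/api/v1/chat/completions",
    options: {
      method: "POST",
      headers: {
        "Content-Type": "application/json",
        "x-api-key": apiKey,
      },
      body: JSON.stringify({
        messages: [{ role: "user", content: prompt }],
        stream,
      }),
    },
  };
}

// Usage with Node.js 18+ built-in fetch:
// const { url, options } = buildCompletionRequest("Explain quantum computing");
// const data = await (await fetch(url, options)).json();
// console.log(data.choices[0].message.content);
```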
Request Body:

```json
{
  "messages": [{ "role": "user", "content": "Your question here" }],
  "max_tokens": 4000,
  "temperature": 0.7,
  "stream": false
}
```

Response:
```json
{
  "id": "chatcmpl-123",
  "object": "chat.completion",
  "created": 1677652288,
  "model": "deepseek-chat",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "AI generated response..."
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 20,
    "completion_tokens": 150,
    "total_tokens": 170
  },
  "cost": {
    "input_cost": 0.000002,
    "output_cost": 0.00003,
    "total_cost": 0.000032
  },
  "metadata": {
    "intent": "general",
    "complexity": "medium",
    "selected_model": "deepseek-chat",
    "cached": false,
    "quality_score": 8.5
  }
}
```

Check API health and your subscription status.
```json
{
  "status": "ok",
  "tier": "PRO",
  "quota_remaining": 9847,
  "premium_remaining": 998,
  "rate_limit": "1000/hour"
}
```

View cache hit/miss statistics.
Get all available models and their pricing.
The API analyzes your prompt to determine the task type:
- Coding → Routes to `deepseek-coder` (specialized for code)
- Math/Logic → Routes to `deepseek-chat` (excellent reasoning)
- Creative Writing → Routes to Claude models (best for creativity)
- Analysis/Reasoning → Routes to Claude models (superior analysis)
- Translation → Routes to `qwen-plus` or `qwen-max` (multilingual)
- Summarization → Routes to cost-effective models
- General → Routes to the most cost-effective model
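A keyword-based sketch of how such intent detection could work. The categories come from the list above, but the keyword lists and the function name are illustrative, not the API's actual classifier:

```javascript
// Illustrative intent detector: the categories mirror the routing list
// above; the keyword heuristics are a simplified stand-in for the
// API's real intent analysis.
const INTENT_KEYWORDS = {
  coding: ["code", "function", "bug", "refactor", "compile"],
  math: ["solve", "calculate", "equation", "proof"],
  creative: ["story", "poem", "fiction", "creative"],
  translation: ["translate", "translation"],
  summarization: ["summarize", "summary", "tl;dr"],
};

function detectIntent(prompt) {
  const text = prompt.toLowerCase();
  for (const [intent, keywords] of Object.entries(INTENT_KEYWORDS)) {
    if (keywords.some((k) => text.includes(k))) return intent;
  }
  return "general"; // fall back to the cheapest general route
}
```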
- Low (<500 tokens) → Cheapest models (DeepSeek, Qwen-Turbo)
- Medium (500-2000 tokens) → Balanced models (Qwen-Plus, GPT-4o-mini)
- High (>2000 tokens) → Premium models if quota available (Claude, Qwen-Max)
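The token thresholds above can be sketched as a candidate-selection function. The tiers and model names come from the lists and pricing tables in this document; the function itself and its premium-quota fallback are an assumed simplification:

```javascript
// Sketch of the complexity tiers described above. Thresholds (500 and
// 2000 estimated tokens) and model names come from this document; the
// fallback to mid-tier models when premium quota is exhausted is an
// assumption for illustration.
function selectByComplexity(estimatedTokens, premiumQuotaLeft) {
  if (estimatedTokens < 500) return ["deepseek-chat", "qwen-turbo"];
  if (estimatedTokens <= 2000) return ["qwen-plus", "gpt-4o-mini"];
  // High complexity: premium models only while quota remains.
  return premiumQuotaLeft > 0
    ? ["claude-sonnet-4-20250514", "qwen-max"]
    : ["qwen-plus", "gpt-4o-mini"];
}
```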
- All responses evaluated by LLM-as-Judge
- Low-quality responses (<7/10) trigger automatic retry with better model
- Ensures you get quality even when using cheap models
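The retry behavior above can be sketched as an escalation loop. `callModel` and `judge` are hypothetical stand-ins for the API's internal functions, injected here so the control flow is visible:

```javascript
// Sketch of the LLM-as-Judge retry loop described above: if the judge
// scores a response below 7/10, the request is retried on the next,
// stronger model in the escalation list. `callModel` and `judge` are
// hypothetical stand-ins for the API's internals.
async function completeWithQualityCheck(prompt, { callModel, judge, escalation }) {
  let response, score;
  for (const model of escalation) {
    response = await callModel(model, prompt);
    score = await judge(prompt, response); // 1-10 scale
    if (score >= 7) break; // good enough: stop escalating
  }
  return { response, score };
}
```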
Without routing:

- 10,000 requests/month
- Average 500 tokens per request
- Cost: ~$150-200/month

With intelligent routing:

- Same 10,000 requests/month
- 60% routed to DeepSeek: $4.20
- 25% routed to Qwen: $5.00
- 15% routed to GPT/Claude: $18.00
- Total: ~$27/month

Savings: 85-90%
- 1-hour TTL for identical requests
- Eliminates redundant API calls
- Dramatically reduces costs for repeated queries
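The cache described above can be sketched as a TTL-keyed map. The class name and key scheme are illustrative; a real deployment would likely hash the key and bound the map's size:

```javascript
// Sketch of the 1-hour response cache described above: responses are
// keyed by the serialized message list and expire after the TTL.
class ResponseCache {
  constructor(ttlMs = 60 * 60 * 1000) { // 1-hour TTL by default
    this.ttlMs = ttlMs;
    this.store = new Map();
  }
  key(messages) {
    return JSON.stringify(messages);
  }
  get(messages) {
    const entry = this.store.get(this.key(messages));
    if (!entry) return undefined;
    if (Date.now() - entry.at > this.ttlMs) {
      this.store.delete(this.key(messages)); // expired: evict and miss
      return undefined;
    }
    return entry.value;
  }
  set(messages, value) {
    this.store.set(this.key(messages), { value, at: Date.now() });
  }
}
```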
- LLM-as-Judge evaluation (1-10 scale)
- Automatic retry with better model if quality < 7
- Ensures high-quality responses even from cheap models
- Real-time token-by-token responses
- Better user experience
- Works with all models
- Tier-based limits prevent abuse
- 15-minute rolling windows
- Automatic 429 responses when exceeded
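A rolling-window limiter like the one described can be sketched as follows; the class name and per-key timestamp list are illustrative, not the API's actual implementation:

```javascript
// Sketch of the 15-minute rolling-window rate limiter described above:
// timestamps older than the window are dropped, and a request over the
// tier's limit is denied (the server would respond 429).
class RollingWindowLimiter {
  constructor(limit, windowMs = 15 * 60 * 1000) {
    this.limit = limit;
    this.windowMs = windowMs;
    this.hits = new Map(); // apiKey -> array of request timestamps
  }
  allow(apiKey, now = Date.now()) {
    const cutoff = now - this.windowMs;
    const recent = (this.hits.get(apiKey) || []).filter((t) => t > cutoff);
    if (recent.length >= this.limit) {
      this.hits.set(apiKey, recent);
      return false; // over limit: caller responds 429
    }
    recent.push(now);
    this.hits.set(apiKey, recent);
    return true;
  }
}
```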
- Create account at Render.com
- Create new Web Service, connect your GitHub repo
- Set environment variables in Render dashboard
- Deploy!
This API is available on RapidAPI Marketplace:
- Marketplace URL: [Coming Soon]
- Category: AI/Machine Learning
- Highlight: Save up to 90% on AI API costs
| Variable | Description | Required |
|---|---|---|
| `PORT` | Server port (default: 8000) | No |
| `ROUTER_API_KEY` | Your API authentication key | Yes |
| `OPENAI_API_KEY` | OpenAI API key | Yes |
| `ANTHROPIC_API_KEY` | Anthropic API key | Yes |
| `DEEPSEEK_API_KEY` | DeepSeek API key | Yes |
| `QWEN_API_KEY` | Qwen API key | Yes |
- Chatbots: Reduce AI costs for customer support bots by 85%+
- Content Generation: Smart routing for different content types
- Code Assistance: Automatic selection of specialized coding models
- Research: Cost-effective analysis and summarization
- Education: Affordable AI tutoring and learning assistance
- Data Analysis: Handle large datasets with cost-optimized models
- API key authentication on every request
- Rate limiting prevents abuse
- Environment variables keep secrets secure
- No data stored or logged permanently
- All requests are ephemeral
Check your usage with the /status endpoint:
```bash
curl -H "x-api-key: your-key" http://localhost:8000/api/v1/status
```

Response includes:
- Current subscription tier
- Request quota remaining
- Premium request quota
- Rate limit status
- Monthly reset date
- FREE: Community support via GitHub Issues
- BASIC: Email support (24-48h response)
- PRO: Priority email support (12-24h response)
- ENTERPRISE: Dedicated support channel
Contributions are welcome! Please feel free to submit a Pull Request.
- Fork the repository
- Create your feature branch (`git checkout -b feature/AmazingFeature`)
- Commit your changes (`git commit -m 'Add some AmazingFeature'`)
- Push to the branch (`git push origin feature/AmazingFeature`)
- Open a Pull Request
MIT License - feel free to use this project for personal or commercial purposes.
- GitHub: https://github.com/LukasTidenJ/LLM-COST-ROUTER-API
- RapidAPI: [Coming Soon]
- Issues: https://github.com/LukasTidenJ/LLM-COST-ROUTER-API/issues
Made possible by: OpenAI, Anthropic, DeepSeek, and Qwen