Draft
Conversation
Implements in-memory caching with TTL/LRU eviction and two-tier rate limiting (global + per-IP) to prevent upstream API throttling and service abuse.

Core Implementation:
- InMemoryCache class using cachetools.TTLCache
  - LRU eviction when maxsize is exceeded
  - TTL-based expiration (default: 3600s)
  - Stale cache fallback for graceful degradation
  - Thread-safe with threading.RLock
- RateLimiter class with sliding window algorithm
  - Global: 30 req/min across all clients
  - Per-IP: 10 req/min per client (HTTP mode only)
  - Returns 429 only if rate limited AND no cache is available
  - Serves stale cache when rate limited

Client Integration:
- Applied @with_rate_limit decorator to all API methods
  - GitHubClient: 6 methods
  - TerraformClient: 5 methods
- Updated cache initialization to use InMemoryCache

Server Integration:
- Global rate limiter initialized on startup
- Rate limiters injected into both clients
- PerIPRateLimitMiddleware for HTTP mode
  - Extracts client IP from headers (X-Forwarded-For, X-Real-IP)
  - Adds rate limit headers to responses
  - Bypasses /health endpoint

Configuration:
- Added TIM_CACHE_MAXSIZE (default: 1000)
- Added TIM_GLOBAL_RATE_LIMIT (default: 30)
- Added TIM_PER_IP_RATE_LIMIT (default: 10)
- Added TIM_RATE_LIMIT_WINDOW (default: 60)
- BREAKING: Removed TIM_CACHE_DIR (file-based cache deprecated)

Testing:
- 25 unit tests (10 for cache, 15 for rate limiter)
- All tests passing
- Validated stale cache fallback
- Verified rate limiting behavior

Resolves #59
Related to epic https://github.ibm.com/GoldenEye/issues/issues/17013
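The "429 only if rate limited AND no cache is available" behavior can be sketched as a small decision function. This is an illustrative outline, not the PR's actual code; names like `try_acquire` and `allow_stale` are assumptions.

```python
def handle_request(key, cache, rate_limiter, fetch):
    """Decision flow sketch: a fresh cache hit wins; otherwise try the
    rate limiter and call upstream; if rate limited, fall back to a
    stale copy and only answer 429 when no cached data exists at all."""
    value = cache.get(key)
    if value is not None:              # fresh hit: no upstream call
        return 200, value
    if rate_limiter.try_acquire():     # allowed: call upstream and cache
        value = fetch()
        cache.set(key, value)
        return 200, value
    stale = cache.get(key, allow_stale=True)
    if stale is not None:              # rate limited, but stale data exists
        return 200, stale
    return 429, None                   # rate limited AND nothing cached
```

The key property is that a rate-limited client still gets a (possibly stale) answer whenever one is available, so 429s only surface on true cold paths.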
- Add new environment variables to README (TIM_CACHE_MAXSIZE, TIM_GLOBAL_RATE_LIMIT, TIM_PER_IP_RATE_LIMIT, TIM_RATE_LIMIT_WINDOW)
- Document removal of TIM_CACHE_DIR (breaking change)
- Update Code Engine deployment guide with rate limiting configuration
- Update deployment command examples with new env vars
- Revise Next Steps section to reflect implemented features
- Change mcp.app.add_middleware to mcp.add_middleware
- FastMCP has an add_middleware method directly on the object
- Fixes AttributeError during HTTP mode initialization
Replace custom caching and rate limiting code with established libraries to reduce maintenance burden and improve reliability.

Changes:
- Use the 'limits' library (the same engine slowapi uses) for rate limiting
- Simplify InMemoryCache to a thin wrapper around cachetools
- Add slowapi and limits dependencies with pinned versions
- Reduce code from 599 to 221 lines (63% reduction)
- Maintain all functionality, including stale cache fallback
- All 25 unit tests passing

Dependencies added:
- limits==5.6.0 (battle-tested rate limiting)
- slowapi==0.1.9 (FastAPI rate limiting framework)
- cachetools==5.5.0 (pinned to an exact version)

Benefits:
- Less custom code to maintain
- Battle-tested algorithms and implementations
- Improved reliability from well-known libraries
- No breaking changes to the existing API
Reorganize environment variables into two clear sections to reduce friction for new users:
- Basic Configuration: only GITHUB_TOKEN and TIM_LOG_LEVEL
- Advanced Configuration: production/hosting settings

This makes it clear that simple stdio users only need the GitHub token, while production deployments can tune caching and rate limiting separately.
Fix middleware registration to use FastMCP's http_app() method, which returns the underlying Starlette app and allows proper registration of Starlette middleware. The add_middleware() method on FastMCP itself expects a different middleware format, so we need to get the Starlette app first and attach our BaseHTTPMiddleware to that.
Add a detailed docstring explaining why this module exists and why we can't use an existing library:
- cachetools lacks stale cache support
- No actively maintained alternatives for general-purpose caching:
  - expirecache (unmaintained since 2015)
  - requests-cache (HTTP-specific, not general-purpose)

Clarifies that 90% of the functionality comes from cachetools and only the stale cache feature (10%) is custom code, making this a minimal wrapper justified by the lack of alternatives.
Replace bullet-point justification with concise paragraph explaining what this module provides beyond cachetools: stale cache support for serving expired entries during rate limiting.
Add concise paragraph explaining what this module provides beyond the limits library: decorator integration with stale cache fallback for graceful degradation when rate limits are exceeded.
Change requires-python from >=3.11 to ==3.12.* to match PR #58 (IBM Code Engine deployment) which uses UBI8 Python 3.12. Also update ruff target-version to py312 for consistency. All 25 unit tests continue to pass with Python 3.12.
Update lock file to reflect the requires-python = ==3.12.* constraint.

Changes:
- Remove async-timeout (not needed on Python 3.12+)
- Remove backports-tarfile (not needed on Python 3.12+)
- Remove Python 3.11/3.13/3.14 wheel URLs
- Simplify version markers
- Remove slowapi from dependencies (only the limits library is used)
- Add a floor of 1 to the Retry-After header to prevent negative values
- Update the rate_limiter docstring to remove the slowapi reference
Replace the unbounded dict used for the stale cache with a TTLCache to prevent unbounded memory growth. The stale cache uses a 24x longer TTL and a 2x larger maxsize than the primary cache for graceful degradation.
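The two-tier layout described here can be sketched with the standard library alone. The PR's version uses two bounded cachetools.TTLCache instances; plain dicts with explicit expiry timestamps stand in for them below, and all class and parameter names are illustrative.

```python
import time


class StaleAwareCache:
    """Sketch: primary entries expire after `ttl`; a stale copy survives
    24x longer in a second map so it can still be served while rate
    limited. (The real implementation bounds both tiers with
    cachetools.TTLCache; these dicts are unbounded for brevity.)"""

    STALE_TTL_MULTIPLIER = 24  # stale tier lives 24x longer than fresh

    def __init__(self, ttl=3600.0):
        self.ttl = ttl
        self._fresh = {}  # key -> (expires_at, value)
        self._stale = {}  # key -> (expires_at, value)

    def set(self, key, value, now=None):
        now = time.time() if now is None else now
        self._fresh[key] = (now + self.ttl, value)
        self._stale[key] = (now + self.ttl * self.STALE_TTL_MULTIPLIER, value)

    def get(self, key, allow_stale=False, now=None):
        now = time.time() if now is None else now
        entry = self._fresh.get(key)
        if entry and entry[0] > now:
            return entry[1]
        if allow_stale:
            entry = self._stale.get(key)
            if entry and entry[0] > now:
                return entry[1]
        return None
```

The `now` parameter exists only so expiry can be exercised deterministically in tests, mirroring why the PR's tests later adopt freezegun.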
- Fix critical bug: middleware now calls record_request() to actually enforce per-IP rate limiting
- Fix test assertion: stale_size key matches the implementation
- Add 12 unit tests for PerIPRateLimitMiddleware covering:
  - Rate limit enforcement and bypass paths
  - IP extraction from X-Forwarded-For and X-Real-IP headers
  - Rate limit headers in responses
  - Per-IP isolation
- Document the IP header trust assumption for production deployments
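The header-based IP extraction tested here can be sketched as a small helper. The function name and fallback value are illustrative, not the middleware's actual API; the header precedence matches the commit message.

```python
def extract_client_ip(headers, fallback="unknown"):
    """Sketch: resolve the client IP as described above, preferring
    X-Forwarded-For (first hop), then X-Real-IP. These headers are
    client-controlled, so they are only trustworthy when set by a
    reverse proxy you control -- the trust assumption the commit
    documents."""
    forwarded = headers.get("x-forwarded-for")
    if forwarded:
        # X-Forwarded-For may be a comma-separated proxy chain; the
        # first entry is the originating client.
        return forwarded.split(",")[0].strip()
    real_ip = headers.get("x-real-ip")
    if real_ip:
        return real_ip.strip()
    return fallback
```

Per-IP isolation then follows naturally: the extracted address is used as the rate limiter's bucket key, so each client consumes its own quota.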
- Add atomic try_acquire() method to RateLimiter to prevent race conditions
- Add missing @retry decorator to get_repository_tree()
- Centralize cache key generation with helper functions to avoid duplication
- Use underscore for unused lambda parameters to fix linter warnings
- Reduce test sleep from 11s to 1.1s for faster test execution
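The race the atomic try_acquire() fixes is a separate check-then-record pair: two threads can both pass the check before either records. A minimal sliding-window sketch with the check and record under one lock (class and parameter names are illustrative):

```python
import threading
import time
from collections import deque


class SlidingWindowRateLimiter:
    """Sketch of a sliding-window limiter with an atomic acquire."""

    def __init__(self, max_requests=30, window_seconds=60.0):
        self.max_requests = max_requests
        self.window = window_seconds
        self._timestamps = deque()  # request times inside the window
        self._lock = threading.Lock()

    def try_acquire(self, now=None):
        """Atomically check the window AND record the request, so no
        other thread can slip in between the check and the record.
        `now` is injectable purely for deterministic tests."""
        now = time.monotonic() if now is None else now
        with self._lock:
            # Drop timestamps that have slid out of the window.
            while self._timestamps and now - self._timestamps[0] >= self.window:
                self._timestamps.popleft()
            if len(self._timestamps) >= self.max_requests:
                return False
            self._timestamps.append(now)
            return True
```

A split `is_allowed()` / `record_request()` API would need the caller to hold the lock across both calls; folding them into one method removes that burden.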
- Fix decorator order: @with_rate_limit now wraps @retry to prevent retrying rate-limited requests
- Remove duplicate cache key generation by having the decorator handle both the fresh cache lookup and caching of results
- Add @retry decorator to resolve_version for consistency
- Refactor from a global rate limiter to dependency injection via a shared context module, eliminating module-level state
- Add configurable stale cache TTL and size multipliers
- Add /stats endpoint in HTTP mode for observability
- Add logging for stale cache fallback exceptions
- Update tests to use freezegun and fix test assertions
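Why decorator order matters here: the outermost decorator runs first, so putting the rate-limit check outside the retry loop means a rate-limited call fails immediately instead of being retried. A toy demonstration with stand-in decorators (both decorators are simplified stand-ins, not the PR's implementations):

```python
import functools

calls = {"n": 0}  # counts how often the wrapped function body actually runs


class RateLimited(Exception):
    pass


def retry(times):
    """Toy retry decorator: re-invokes func on RuntimeError, up to `times`."""
    def deco(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            for attempt in range(times):
                try:
                    return func(*args, **kwargs)
                except RuntimeError:
                    if attempt == times - 1:
                        raise
        return wrapper
    return deco


def with_rate_limit(allowed):
    """Toy limiter decorator: raises RateLimited when `allowed()` is False."""
    def deco(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            if not allowed():
                raise RateLimited()
            return func(*args, **kwargs)
        return wrapper
    return deco


# Correct order: rate limit OUTSIDE retry. When the limiter refuses,
# the retry loop (and the function body) is never entered, so a
# rate-limited request is not hammered three more times.
@with_rate_limit(lambda: False)
@retry(times=3)
def fetch():
    calls["n"] += 1
    raise RuntimeError("upstream error")
```

With the order inverted (@retry outermost), each retry attempt would re-enter the rate-limit check and, worse, a 429-style failure raised inside the loop could be retried against an already-saturated upstream.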
- Replace dual-cache design with a single TTLCache + timestamp tracking
- Extract common decorator and cache key logic into clients/base.py
- Reduce GitHubClient from 822 to 294 lines (64% reduction)
- Reduce TerraformClient from 583 to 208 lines (64% reduction)
- Remove stale cache multiplier config options (use sensible defaults)
- Update tests to match the simplified cache stats structure
- Include kwarg keys in cache key generation to prevent collisions
- Fix timestamp memory leak by cleaning up orphaned entries on cache miss
- Make rate_limit_key configurable in the with_rate_limit decorator
- Extract a shared check_rate_limit_response helper to reduce client duplication
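The kwarg-collision fix can be illustrated in a few lines: if only kwarg *values* go into the key, `f(q="x")` and `f(ref="x")` collide. Including the sorted kwarg *names* fixes this. The helper name is hypothetical:

```python
def make_cache_key(method, *args, **kwargs):
    """Sketch of a collision-resistant cache key. Kwarg names are
    included (and sorted, so call-site ordering doesn't matter), which
    is what prevents f(a=1) and f(b=1) from sharing a key."""
    parts = [method, *map(repr, args)]
    parts += [f"{name}={kwargs[name]!r}" for name in sorted(kwargs)]
    return ":".join(parts)
```

Sorting also keeps keys stable across Python versions and call sites, so a cached entry written by one caller is found by another.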
- Use Starlette's Middleware class with http_app()'s middleware parameter
- Run uvicorn directly instead of mcp.run() to use the configured app
- Fixes middleware not being applied to MCP endpoints
- Track hits and last_accessed per cache key
- Add hit_rate to the /stats summary
- Add /stats/cache endpoint with per-key details (?top=N parameter)
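The per-key bookkeeping behind these endpoints can be sketched as a small stats object. Field names mirror the commit message (hits, last_accessed, hit_rate, top) but the class itself is illustrative:

```python
import time


class CacheStats:
    """Sketch: aggregate hit/miss counters plus per-key detail,
    enough to back a /stats summary and a ?top=N per-key view."""

    def __init__(self):
        self.hits = 0
        self.misses = 0
        self.per_key = {}  # key -> {"hits": int, "last_accessed": float}

    def record(self, key, hit, now=None):
        now = time.time() if now is None else now
        if hit:
            self.hits += 1
            entry = self.per_key.setdefault(key, {"hits": 0, "last_accessed": now})
            entry["hits"] += 1
            entry["last_accessed"] = now
        else:
            self.misses += 1

    @property
    def hit_rate(self):
        total = self.hits + self.misses
        return self.hits / total if total else 0.0

    def top(self, n):
        """Most-hit keys first, for a ?top=N style query."""
        ranked = sorted(self.per_key.items(),
                        key=lambda kv: kv[1]["hits"], reverse=True)
        return ranked[:n]
```

An HTTP handler would then just serialize `hit_rate` for the summary and `top(n)` for the per-key endpoint.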
- Fix line length violations in client and cache modules
- Remove unused BaseAPIClient class from base.py
- Remove unnecessary try/except blocks in cache.py
- Add documentation for the stale_ttl_multiplier magic number
- Use the datetime.UTC alias instead of timezone.utc
Add optional Redis as a persistent L2 cache layer with a daily sync job to pre-warm the cache.

Architecture:
- L1 (memory): 5 min TTL for fast local reads
- L2 (Redis): 48h TTL for persistence across restarts

New files:
- tim_mcp/utils/redis_cache.py: Async Redis cache backend
- tim_mcp/utils/tiered_cache.py: L1+L2 cache wrapper
- scripts/sync_cache.py: Daily sync script
- .github/workflows/cache-sync.yml: 3 AM UTC daily workflow

Configuration:
- TIM_REDIS_ENABLED: Enable Redis (default: false)
- TIM_REDIS_URL: Connection URL (supports rediss:// for TLS)
- TIM_L1_CACHE_TTL: Memory cache TTL when Redis is enabled
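The L1/L2 read-through flow can be sketched synchronously. The PR's wrapper is async and Redis-backed (later via aiocache); plain dict-likes stand in for both tiers here, and the class name echoes but does not reproduce the PR's code:

```python
class TieredCacheSketch:
    """Sync sketch of the read-through flow: check the small, short-TTL
    L1 first; on miss, check the persistent, long-TTL L2 and promote
    the value back into L1 so subsequent reads are local."""

    def __init__(self, l1, l2):
        self.l1 = l1  # dict-like, e.g. in-memory with ~5 min TTL
        self.l2 = l2  # dict-like, e.g. Redis with ~48h TTL

    def get(self, key):
        if key in self.l1:        # fast local hit
            return self.l1[key]
        if key in self.l2:        # L2 hit: promote into L1
            value = self.l2[key]
            self.l1[key] = value
            return value
        return None

    def set(self, key, value):
        # Write-through: populate both tiers so a restart (which wipes
        # L1) still finds the value in L2.
        self.l1[key] = value
        self.l2[key] = value
```

The promote-on-L2-hit step is what makes the daily sync job useful: the sync warms L2, and ordinary traffic then pulls hot keys up into each instance's L1.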
added 6 commits on January 27, 2026 at 23:45
- Replace single TTLCache + timestamp tracking with two TTLCache instances
- Remove unused stats methods from cache, rate limiter, and middleware
- Remove X-RateLimit-* headers from middleware responses
- Rename cache_ttl to cache_fresh_ttl; add cache_evict_ttl config
- Remove /stats endpoints from HTTP mode
- Update tests to match the simplified API
- Collapse advanced config in README
- Update hypothesis to 6.151.0
Integrate simplified caching/rate limiting from PR #60:
- Use the fresh_ttl/evict_ttl cache model
- Remove stats endpoints (per review feedback)
- Keep Redis L2 lifecycle management
- Switch to aiocache for the Redis backend (simpler, battle-tested)
- Wire up AsyncTieredCache properly so L2 actually works
- Remove dead code: unused TieredCache class, get_stats, get_sync/set_sync
- Remove unused redis_cache from the context module
- Parallelize the sync script with asyncio.gather (5x faster)
- Add a rate limiting throttle to the sync script
Summary
Adds optional Redis as a persistent L2 cache layer to reduce API calls and improve response times.
Architecture:
- L1 (memory): 5 min TTL for fast local reads
- L2 (Redis): 48h TTL for persistence across restarts
Changes
- Redis cache backend built on aiocache with fresh/stale TTL support
- AsyncTieredCache wrapper for the L1+L2 flow
- Parallelized sync script (asyncio.gather)
- Server uses AsyncTieredCache when Redis is enabled
- New config: TIM_REDIS_ENABLED, TIM_REDIS_URL, TIM_L1_CACHE_TTL

Configuration
| Variable | Default | Notes |
| --- | --- | --- |
| TIM_REDIS_ENABLED | false | Enable the Redis L2 cache |
| TIM_REDIS_URL | redis://localhost:6379 | Connection URL (rediss:// for IBM Cloud) |
| TIM_L1_CACHE_TTL | 300 | Memory cache TTL (seconds) when Redis is enabled |

Dependencies
Replaced redis with aiocache[redis] for simpler connection pooling and serialization.

Test plan
- uv run pytest
- docker run -d -p 6379:6379 redis:7-alpine
- python scripts/sync_cache.py