feat: add Redis L2 cache with daily sync #66

Draft
jor2 wants to merge 37 commits into main from feat/redis-l2-cache

Conversation

@jor2 (Member) commented Jan 27, 2026

Summary

Adds optional Redis as a persistent L2 cache layer to reduce API calls and improve response times.

Architecture:

Request → L1 Memory (5min) → L2 Redis (48h) → API
  • L1 hit: instant return
  • L1 miss, L2 hit: populate L1, return
  • Both miss: fetch from API, populate both
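The lookup path above can be sketched as follows. This is a minimal, dependency-free sketch of the tiered flow only; the real `AsyncTieredCache` in `tim_mcp/utils/tiered_cache.py` backs L2 with Redis via aiocache, whereas here L2 is an in-process dict so the example runs without a Redis server:

```python
import asyncio
import time


class AsyncTieredCache:
    """Sketch of the L1 (memory) -> L2 -> API flow described above."""

    def __init__(self, l1_ttl=300, l2_ttl=48 * 3600):
        self._l1 = {}  # key -> (value, expires_at)
        self._l2 = {}  # stand-in for Redis: key -> (value, expires_at)
        self.l1_ttl = l1_ttl
        self.l2_ttl = l2_ttl

    def _get(self, store, key):
        entry = store.get(key)
        if entry and entry[1] > time.monotonic():
            return entry[0]
        return None

    async def get(self, key, fetch):
        # L1 hit: instant return
        value = self._get(self._l1, key)
        if value is not None:
            return value
        # L1 miss, L2 hit: populate L1, return
        value = self._get(self._l2, key)
        if value is not None:
            self._l1[key] = (value, time.monotonic() + self.l1_ttl)
            return value
        # Both miss: fetch from API, populate both tiers
        value = await fetch()
        now = time.monotonic()
        self._l1[key] = (value, now + self.l1_ttl)
        self._l2[key] = (value, now + self.l2_ttl)
        return value


async def demo():
    calls = 0

    async def fetch():
        nonlocal calls
        calls += 1
        return "payload"

    cache = AsyncTieredCache()
    await cache.get("k", fetch)  # both miss -> one API call
    await cache.get("k", fetch)  # L1 hit -> no API call
    return calls


print(asyncio.run(demo()))  # -> 1
```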

Changes

  • redis_cache.py: Async Redis backend using aiocache with fresh/stale TTL support
  • tiered_cache.py: Minimal AsyncTieredCache wrapper for L1+L2 flow
  • sync_cache.py: Daily sync script with parallel fetching (asyncio.gather)
  • cache-sync.yml: GitHub Actions workflow (3 AM UTC daily)
  • server.py: Properly wires AsyncTieredCache when Redis is enabled
  • Config options: TIM_REDIS_ENABLED, TIM_REDIS_URL, TIM_L1_CACHE_TTL

Configuration

| Variable | Default | Description |
|---|---|---|
| `TIM_REDIS_ENABLED` | `false` | Enable Redis L2 cache |
| `TIM_REDIS_URL` | `redis://localhost:6379` | Connection URL (supports `rediss://` for IBM Cloud) |
| `TIM_L1_CACHE_TTL` | `300` | L1 memory cache TTL (5 min) |
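One way these variables might be consumed (a hypothetical helper for illustration — `load_redis_config` is not the actual parsing code in tim_mcp, but the variable names and defaults match the table above):

```python
import os


def load_redis_config(env=os.environ):
    """Read the TIM_* Redis settings with the documented defaults."""
    return {
        "redis_enabled": env.get("TIM_REDIS_ENABLED", "false").lower() == "true",
        "redis_url": env.get("TIM_REDIS_URL", "redis://localhost:6379"),
        "l1_cache_ttl": int(env.get("TIM_L1_CACHE_TTL", "300")),
    }


cfg = load_redis_config({"TIM_REDIS_ENABLED": "true"})
print(cfg["redis_enabled"])  # -> True
```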

Dependencies

  • Replaced redis with aiocache[redis] for simpler connection pooling and serialization

Test plan

  • Run tests: uv run pytest
  • Test with local Redis: docker run -d -p 6379:6379 redis:7-alpine
  • Verify sync script: python scripts/sync_cache.py

Jordan-Williams2 and others added 29 commits January 13, 2026 00:16
Implements in-memory caching with TTL/LRU eviction and two-tier rate
limiting (global + per-IP) to prevent upstream API throttling and
service abuse.

Core Implementation:
- InMemoryCache class using cachetools.TTLCache
  - LRU eviction when maxsize exceeded
  - TTL-based expiration (default: 3600s)
  - Stale cache fallback for graceful degradation
  - Thread-safe with threading.RLock

- RateLimiter class with sliding window algorithm
  - Global: 30 req/min across all clients
  - Per-IP: 10 req/min per client (HTTP mode only)
  - Returns 429 only if rate limited AND no cache available
  - Serves stale cache when rate limited
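The sliding-window behavior described above can be sketched like this (names are illustrative, and a later commit in this PR replaces the custom implementation with the `limits` library; the atomic `try_acquire` mirrors the method added further down):

```python
import time
from collections import deque


class SlidingWindowLimiter:
    """Sketch of a sliding-window rate limiter."""

    def __init__(self, max_requests=30, window=60.0):
        self.max_requests = max_requests
        self.window = window
        self._timestamps = deque()

    def try_acquire(self, now=None):
        """Atomically check and record a request; False means rate limited."""
        now = time.monotonic() if now is None else now
        # Drop timestamps that have fallen out of the window
        while self._timestamps and self._timestamps[0] <= now - self.window:
            self._timestamps.popleft()
        if len(self._timestamps) >= self.max_requests:
            return False  # caller serves stale cache, or 429 if none
        self._timestamps.append(now)
        return True
```

On a `False` result the caller first checks the stale cache and only returns 429 when no cached value exists, matching the behavior listed above.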

Client Integration:
- Applied @with_rate_limit decorator to all API methods
  - GitHubClient: 6 methods
  - TerraformClient: 5 methods
- Updated cache initialization to use InMemoryCache

Server Integration:
- Global rate limiter initialized on startup
- Rate limiters injected into both clients
- PerIPRateLimitMiddleware for HTTP mode
  - Extracts client IP from headers (X-Forwarded-For, X-Real-IP)
  - Adds rate limit headers to responses
  - Bypasses /health endpoint
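The header-based IP extraction might look like the following sketch (function name is illustrative; trusting `X-Forwarded-For` / `X-Real-IP` is only safe behind a proxy you control, which is the trust assumption documented later in this PR):

```python
def client_ip(headers, fallback="unknown"):
    """Resolve the client IP from proxy headers, per the middleware above."""
    forwarded = headers.get("x-forwarded-for")
    if forwarded:
        # First entry is the original client when set by a trusted proxy
        return forwarded.split(",")[0].strip()
    return headers.get("x-real-ip", fallback)
```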

Configuration:
- Added TIM_CACHE_MAXSIZE (default: 1000)
- Added TIM_GLOBAL_RATE_LIMIT (default: 30)
- Added TIM_PER_IP_RATE_LIMIT (default: 10)
- Added TIM_RATE_LIMIT_WINDOW (default: 60)
- BREAKING: Removed TIM_CACHE_DIR (file-based cache deprecated)

Testing:
- 25 unit tests (10 for cache, 15 for rate limiter)
- All tests passing
- Validated stale cache fallback
- Verified rate limiting behavior

Resolves #59
Related to epic https://github.ibm.com/GoldenEye/issues/issues/17013
- Add new environment variables to README (TIM_CACHE_MAXSIZE, TIM_GLOBAL_RATE_LIMIT, TIM_PER_IP_RATE_LIMIT, TIM_RATE_LIMIT_WINDOW)
- Document removal of TIM_CACHE_DIR (breaking change)
- Update Code Engine deployment guide with rate limiting configuration
- Update deployment command examples with new env vars
- Revise Next Steps section to reflect implemented features
- Change mcp.app.add_middleware to mcp.add_middleware
- FastMCP has add_middleware method directly on the object
- Fixes AttributeError during HTTP mode initialization
Replace custom caching and rate limiting code with established libraries
to reduce maintenance burden and improve reliability.

Changes:
- Use 'limits' library (same as slowapi) for rate limiting
- Simplify InMemoryCache to thin wrapper around cachetools
- Add slowapi and limits dependencies with pinned versions
- Reduce code from 599 to 221 lines (63% reduction)
- Maintain all functionality including stale cache fallback
- All 25 unit tests passing

Dependencies added:
- limits==5.6.0 (battle-tested rate limiting)
- slowapi==0.1.9 (FastAPI rate limiting framework)
- cachetools==5.5.0 (updated to exact version)

Benefits:
- Less custom code to maintain
- Battle-tested algorithms and implementations
- Improved reliability with well-known libraries
- No breaking changes to existing API
Reorganize environment variables into two clear sections to reduce
friction for new users:

- Basic Configuration: Only GITHUB_TOKEN and TIM_LOG_LEVEL
- Advanced Configuration: Production/hosting settings

This makes it clear that simple stdio users only need to worry about
the GitHub token, while production deployments can tune caching and
rate limiting separately.
Fix middleware registration to use FastMCP's http_app() method
which returns the underlying Starlette app. This allows proper
registration of Starlette middleware.

The add_middleware() method on FastMCP itself expects a different
middleware format, so we need to get the Starlette app first and
add our BaseHTTPMiddleware to that.
Add detailed docstring explaining why this module exists and why we
can't use an existing library:

- cachetools lacks stale cache support
- No actively maintained alternatives for general-purpose caching
- expirecache (unmaintained since 2015)
- requests-cache (HTTP-specific, not general-purpose)

Clarifies that 90% of functionality comes from cachetools and only
the stale cache feature (10%) is custom code, making this a minimal
wrapper justified by lack of alternatives.
Replace bullet-point justification with concise paragraph explaining
what this module provides beyond cachetools: stale cache support for
serving expired entries during rate limiting.
Add concise paragraph explaining what this module provides beyond
the limits library: decorator integration with stale cache fallback
for graceful degradation when rate limits are exceeded.
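The decorator-plus-stale-fallback integration described in that docstring could be sketched as below. All names and signatures here are illustrative, not the actual code in the module; the point is the order of operations — acquire, fall back to stale on denial, cache on success:

```python
import functools


def with_rate_limit(limiter, cache, key_fn):
    """Sketch: rate-limit a call, serving stale cache when denied."""

    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            key = key_fn(*args, **kwargs)
            if not limiter.try_acquire():
                # Graceful degradation: prefer an expired entry over failing
                stale = cache.get(key, allow_stale=True)
                if stale is not None:
                    return stale
                raise RuntimeError("rate limited and no cached value")
            result = fn(*args, **kwargs)
            cache.set(key, result)
            return result

        return wrapper

    return decorator
```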
Change requires-python from >=3.11 to ==3.12.* to match PR #58
(IBM Code Engine deployment) which uses UBI8 Python 3.12.

Also update ruff target-version to py312 for consistency.

All 25 unit tests continue to pass with Python 3.12.
Update lock file to reflect requires-python = ==3.12.* constraint.

Changes:
- Remove async-timeout (not needed in Python 3.12+)
- Remove backports-tarfile (not needed in Python 3.12+)
- Remove Python 3.11/3.13/3.14 wheel URLs
- Simplify version markers
- Remove slowapi from dependencies (only limits library is used)
- Add floor of 1 to Retry-After header to prevent negative values
- Update rate_limiter docstring to remove slowapi reference
Replace unbounded dict with TTLCache for stale cache to prevent
unbounded memory growth. Stale cache uses 24x longer TTL and 2x
larger maxsize than primary cache for graceful degradation.
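The fresh/stale split with the 24x TTL multiplier could be sketched as follows (a dependency-free stand-in: plain dicts with expiry replace `cachetools.TTLCache`, and the 2x size bound is omitted for brevity):

```python
import time


class StaleAwareCache:
    """Sketch of the fresh + stale two-tier design described above."""

    STALE_TTL_MULTIPLIER = 24  # stale entries live 24x longer

    def __init__(self, ttl=3600):
        self.ttl = ttl
        self._fresh = {}  # key -> (value, expires_at)
        self._stale = {}  # key -> (value, expires_at), bounded in the real code

    def set(self, key, value, now=None):
        now = time.monotonic() if now is None else now
        self._fresh[key] = (value, now + self.ttl)
        self._stale[key] = (value, now + self.ttl * self.STALE_TTL_MULTIPLIER)

    def get(self, key, allow_stale=False, now=None):
        now = time.monotonic() if now is None else now
        store = self._stale if allow_stale else self._fresh
        entry = store.get(key)
        if entry and entry[1] > now:
            return entry[0]
        return None
```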
- Fix critical bug: middleware now calls record_request() to actually
  enforce per-IP rate limiting
- Fix test assertion: stale_size key matches implementation
- Add 12 unit tests for PerIPRateLimitMiddleware covering:
  - Rate limit enforcement and bypass paths
  - IP extraction from X-Forwarded-For and X-Real-IP headers
  - Rate limit headers in responses
  - Per-IP isolation
- Document IP header trust assumption for production deployments
- Add atomic try_acquire() method to RateLimiter to prevent race conditions
- Add missing @Retry decorator to get_repository_tree()
- Centralize cache key generation with helper functions to avoid duplication
- Use underscore for unused lambda parameters to fix linter warnings
- Reduce test sleep from 11s to 1.1s for faster test execution
- Fix decorator order: @with_rate_limit now wraps @Retry to prevent
  retrying rate-limited requests
- Remove duplicate cache key generation by having the decorator handle
  both fresh cache lookup and caching results
- Add @Retry decorator to resolve_version for consistency
- Refactor from global rate limiter to dependency injection via shared
  context module, eliminating module-level state
- Add configurable stale cache TTL and size multipliers
- Add /stats endpoint in HTTP mode for observability
- Add logging for stale cache fallback exceptions
- Update tests to use freezegun and fix test assertions
- Replace dual-cache design with single TTLCache + timestamp tracking
- Extract common decorator and cache key logic into clients/base.py
- Reduce GitHubClient from 822 to 294 lines (64% reduction)
- Reduce TerraformClient from 583 to 208 lines (64% reduction)
- Remove stale cache multiplier config options (use sensible defaults)
- Update tests to match simplified cache stats structure
- Include kwarg keys in cache key generation to prevent collisions
- Fix timestamp memory leak by cleaning up orphaned entries on cache miss
- Make rate_limit_key configurable in with_rate_limit decorator
- Extract shared check_rate_limit_response helper to reduce client duplication
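The kwarg-aware key generation from the first bullet above can be sketched like this (the helper name is illustrative, not the one in `clients/base.py`); including the kwarg names prevents `f(a=1)` and `f(b=1)` from colliding on the same key:

```python
def make_cache_key(prefix, *args, **kwargs):
    """Build a cache key that includes kwarg names, not just values."""
    parts = [prefix, *map(str, args)]
    # Sort kwargs so keyword order does not change the key
    parts += [f"{k}={kwargs[k]}" for k in sorted(kwargs)]
    return ":".join(parts)
```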
- Use Starlette Middleware class with http_app() middleware parameter
- Run uvicorn directly instead of mcp.run() to use configured app
- Fixes middleware not being applied to MCP endpoints
- Track hits, last_accessed per cache key
- Add hit_rate to /stats summary
- Add /stats/cache endpoint with per-key details (?top=N parameter)
- Fix line length violations in client and cache modules
- Remove unused BaseAPIClient class from base.py
- Remove unnecessary try/except blocks in cache.py
- Add documentation for stale_ttl_multiplier magic number
- Use datetime.UTC alias instead of timezone.utc
Add optional Redis as a persistent L2 cache layer with a daily sync job
to pre-warm the cache.

Architecture:
- L1 (memory): 5 min TTL for fast local reads
- L2 (Redis): 48h TTL for persistence across restarts

New files:
- tim_mcp/utils/redis_cache.py: Async Redis cache backend
- tim_mcp/utils/tiered_cache.py: L1+L2 cache wrapper
- scripts/sync_cache.py: Daily sync script
- .github/workflows/cache-sync.yml: 3 AM UTC daily workflow

Configuration:
- TIM_REDIS_ENABLED: Enable Redis (default: false)
- TIM_REDIS_URL: Connection URL (supports rediss:// for TLS)
- TIM_L1_CACHE_TTL: Memory cache TTL when Redis enabled
@jor2 jor2 requested a review from vburckhardt as a code owner January 27, 2026 12:08
@jor2 jor2 self-assigned this Jan 27, 2026
Jordan-Williams2 added 6 commits January 27, 2026 23:45
- Replace single TTLCache + timestamp tracking with two TTLCache instances
- Remove unused stats methods from cache, rate limiter, and middleware
- Remove X-RateLimit-* headers from middleware responses
- Rename cache_ttl to cache_fresh_ttl, add cache_evict_ttl config
- Remove /stats endpoints from HTTP mode
- Update tests to match simplified API
- Collapse advanced config in README
- Update hypothesis to 6.151.0
Integrate simplified caching/rate limiting from PR #60:
- Use fresh_ttl/evict_ttl cache model
- Remove stats endpoints (per review feedback)
- Keep Redis L2 lifecycle management
- Switch to aiocache for Redis backend (simpler, battle-tested)
- Wire up AsyncTieredCache properly so L2 actually works
- Remove dead code: unused TieredCache class, get_stats, get_sync/set_sync
- Remove unused redis_cache from context module
- Parallelize sync script with asyncio.gather (5x faster)
- Add rate limiting throttle to sync script
@jor2 jor2 marked this pull request as draft January 28, 2026 10:50
Base automatically changed from feat/caching-rate-limiting to main February 5, 2026 18:13