## Bug Summary

The MCP tools caching mechanism in `ToolCallAgent` has a race condition: multiple concurrent requests can corrupt the cache and leak memory.

**Severity:** MEDIUM-HIGH
## Bug Details

### Location

- File: `/spoon_ai/agents/toolcall.py`
- Lines: 42-86 (MCP tools caching implementation)
- Method: `_get_cached_mcp_tools()`
### Issue Description

The current caching implementation has several critical issues:

```python
async def _get_cached_mcp_tools(self) -> List[MCPTool]:
    """Get MCP tools with caching to avoid repeated server calls."""
    current_time = time.time()
    # Thread-safe cache check
    async with asyncio.Lock() if not hasattr(self, '_cache_lock') else asyncio.Lock():  # BUG: creates a new lock each time
        if not hasattr(self, '_cache_lock'):
            self._cache_lock = asyncio.Lock()  # BUG: race condition here
```
### Specific Issues Identified

1. **Lock Creation Race Condition**
   - Multiple coroutines can pass the `hasattr` check simultaneously.
   - Each then creates its own lock, defeating the purpose of synchronization.
   - This can corrupt the cache when multiple agents access it simultaneously.

2. **Memory Leak in Cache Management**
   - Cache lock objects are created but never cleaned up.
   - Agent instances accumulate locks over time.
   - There is no mechanism to clean up expired cache locks.

3. **Inefficient Lock Usage**
   - A new `asyncio.Lock()` is created in both branches of the conditional expression.
   - The existing `_cache_lock` should be reused when available.
   - As written, the `async with` always acquires a freshly created temporary lock, so it never actually serializes callers.
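The third point can be demonstrated directly: both branches of the conditional expression construct a brand-new `asyncio.Lock()`, so two evaluations can never yield the same lock object (the `Dummy` class below is an illustrative stand-in, not from the source):

```python
import asyncio

class Dummy:
    """Stand-in object that, like a fresh agent, has no _cache_lock attribute."""
    pass

obj = Dummy()

# Mirrors the buggy guard: whichever branch is taken, a fresh Lock is built.
lock1 = asyncio.Lock() if not hasattr(obj, '_cache_lock') else asyncio.Lock()
lock2 = asyncio.Lock() if not hasattr(obj, '_cache_lock') else asyncio.Lock()
print(lock1 is lock2)  # False: each caller "synchronizes" on its own private lock
```

Since each caller holds a private lock, the critical section provides no mutual exclusion at all.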
## Impact

- **Concurrency Issues:** Multiple agents fetching tools simultaneously can corrupt the cache
- **Memory Leaks:** Accumulation of lock objects over time
- **Performance:** Inefficient lock creation and cache misses
- **Reliability:** Potential deadlocks in high-concurrency scenarios
## Steps to Reproduce

1. Create multiple `ToolCallAgent` instances.
2. Call `_get_cached_mcp_tools()` concurrently from multiple coroutines.
3. Observe race conditions in cache lock creation.
4. Monitor memory usage over time to see lock accumulation.
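The race in steps 2-3 can be reproduced with a small self-contained script; `SketchAgent` below is a simplified stand-in for the real `ToolCallAgent`, not the actual implementation:

```python
import asyncio

class SketchAgent:
    """Minimal stand-in reproducing the lazy lock-creation race."""

    async def _get_lock(self):
        if not hasattr(self, '_cache_lock'):
            # Yield control between the check and the assignment, widening
            # the window in which other coroutines also see no lock.
            await asyncio.sleep(0)
            self._cache_lock = asyncio.Lock()
        return self._cache_lock

async def main():
    agent = SketchAgent()
    locks = await asyncio.gather(*(agent._get_lock() for _ in range(5)))
    # Every coroutine passed the hasattr check before any assignment ran,
    # so each created and returned its own Lock object.
    return len({id(lock) for lock in locks})

distinct = asyncio.run(main())
print("distinct locks created:", distinct)
```

In a real agent the window between check and assignment is any `await` point, so the explicit `sleep(0)` only makes the interleaving deterministic for the demo.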
## Expected Behavior

- Thread-safe cache access with proper lock management
- No memory leaks from accumulated lock objects
- Consistent cache behavior under high concurrency
- Proper cache invalidation and cleanup

## Current Behavior

- Race conditions during lock initialization
- Memory leaks from improper lock cleanup
- Potential cache corruption under concurrent access
- Inconsistent cache behavior
## Fix Required

```python
def __init__(self, **kwargs):
    super().__init__(**kwargs)
    self._cache_lock = asyncio.Lock()  # Initialize lock in constructor

async def _get_cached_mcp_tools(self) -> List[MCPTool]:
    """Get MCP tools with caching to avoid repeated server calls."""
    current_time = time.time()
    async with self._cache_lock:  # Use pre-initialized lock
        # Rest of caching logic...
```
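Assuming the elided caching logic is a standard TTL check, the fixed pattern could be sketched end-to-end as follows; `ToolCacheSketch`, `_fetch_mcp_tools`, and the TTL value are illustrative names and defaults, not taken from the actual source:

```python
import asyncio
import time
from typing import List, Optional

class ToolCacheSketch:
    """Hedged sketch of the fixed pattern: one lock per instance,
    created in __init__ before any concurrent access is possible."""

    def __init__(self, cache_ttl: float = 300.0):
        self._cache_lock = asyncio.Lock()   # single shared lock, never recreated
        self._cache_ttl = cache_ttl
        self._cached_tools: Optional[List[str]] = None
        self._cache_time = 0.0
        self.fetch_count = 0                # instrumentation for the demo

    async def _fetch_mcp_tools(self) -> List[str]:
        """Placeholder for the real (expensive) MCP server round-trip."""
        self.fetch_count += 1
        await asyncio.sleep(0)              # simulate I/O
        return ["search", "browse"]

    async def get_cached_mcp_tools(self) -> List[str]:
        async with self._cache_lock:        # every caller waits on the same lock
            now = time.time()
            if self._cached_tools is None or now - self._cache_time > self._cache_ttl:
                self._cached_tools = await self._fetch_mcp_tools()
                self._cache_time = now
            return self._cached_tools

async def demo() -> int:
    agent = ToolCacheSketch()
    await asyncio.gather(*(agent.get_cached_mcp_tools() for _ in range(10)))
    return agent.fetch_count

fetches = asyncio.run(demo())
print("server fetches for 10 concurrent calls:", fetches)
```

Because the lock exists before the event loop ever schedules a caller, no coroutine can observe a half-initialized guard, and exactly one lock object lives for the lifetime of the agent, which also addresses the leak.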
## Business Impact

- **Reliability:** Cache corruption can cause tool execution failures
- **Performance:** Memory leaks affect long-running agent processes
- **Scalability:** Race conditions limit concurrent agent capacity
- **User Experience:** Inconsistent tool availability and execution