Issue: Multiple Critical Failures in AI Town - Timeouts, Generation Mismatches, and Conversation Loops
Summary
The AI Town system is experiencing multiple critical failures that prevent normal operation, including function timeouts, generation number mismatches, dead engine restarts, and agents stuck in infinite conversation exit loops.
Issues Identified
1. Function Execution Timeouts
- Multiple `agentGenerateMessage` operations are timing out at the 600-second maximum duration
- Example: `Function execution timed out (maximum duration: 600s)`
- Affects conversation flow and agent interactions
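A minimal sketch of capping a slow call with a deadline well below the 600-second limit, so a stuck generation fails fast instead of using up the whole function budget. `withTimeout` and `TimeoutError` are hypothetical helpers written for this issue, not part of AI Town or Convex. Note that `Promise.race` only stops waiting; it does not cancel the underlying request.

```typescript
// Hypothetical error type used to distinguish deadline failures from other errors.
class TimeoutError extends Error {
  readonly limitMs: number;
  constructor(limitMs: number) {
    super(`Operation exceeded ${limitMs} ms`);
    this.name = "TimeoutError";
    this.limitMs = limitMs;
  }
}

// Resolves with the result of `work`, or rejects with TimeoutError after `ms`.
function withTimeout<T>(work: Promise<T>, ms: number): Promise<T> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const deadline = new Promise<never>((_resolve, reject) => {
    timer = setTimeout(() => reject(new TimeoutError(ms)), ms);
  });
  // Whichever settles first wins; the timer is cleared either way.
  return Promise.race([work, deadline]).finally(() => {
    if (timer !== undefined) clearTimeout(timer);
  });
}
```

The call that produces the agent's message would be wrapped as `withTimeout(generate(), 60_000)`, where `generate` and the 60-second budget are placeholders.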
2. Generation Number Mismatch Errors
- Repeated `ConvexError: {"kind":"generationNumber","message":"Generation number mismatch"}` errors
- Occurs during `loadWorld` and `saveWorld` operations
- Suggests synchronization issues between different parts of the system
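For context, a sketch of how a generation-number check typically behaves. This is an illustration under assumptions, not AI Town's actual code: `EngineRecord`, `saveIfCurrent`, and `restart` are hypothetical, and the idea that a restart bumps the generation is an assumption that the logs here do not confirm. If it holds, a step that takes longer than the restart interval would always fail its save.

```typescript
// Hypothetical in-memory stand-in for an engine record.
interface EngineRecord {
  generationNumber: number;
  state: string;
}

class GenerationMismatchError extends Error {
  constructor(expected: number, actual: number) {
    super(`Generation number mismatch: expected ${expected}, found ${actual}`);
    this.name = "GenerationMismatchError";
  }
}

// Writes only if the caller still holds the current generation.
function saveIfCurrent(engine: EngineRecord, expectedGeneration: number, newState: string): void {
  if (engine.generationNumber !== expectedGeneration) {
    throw new GenerationMismatchError(expectedGeneration, engine.generationNumber);
  }
  engine.state = newState;
}

// Assumed behavior: a restart bumps the generation, invalidating in-flight work.
function restart(engine: EngineRecord): number {
  engine.generationNumber += 1;
  return engine.generationNumber;
}
```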
3. Dead Engine Continuous Restarts
- Engine `ks78teh5gnmjpbf1cd60m3wehx7hwm8b` is repeatedly dying and being restarted
- Log shows `'Restarting dead engine ks78teh5gnmjpbf1cd60m3wehx7hwm8b...'` appearing every minute
- Indicates underlying stability issues with the game engine
4. Conversation Exit Loop
- AI agents (Alice/Pete, Bob/Kurt, Lucky/Stella) are stuck in infinite loops trying to leave conversations
- Agents repeatedly exchange farewell messages but never successfully exit
- Example: The conversation history shows 20+ farewell messages between the same agents
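One possible guard against this loop lives outside the model: count the trailing farewell messages and force the exit once a threshold is reached. The sketch below is an assumption-heavy illustration. The `ChatMessage` shape, the farewell regex, and the default threshold of 2 are all made up for this example and are not AI Town's conversation logic.

```typescript
// Hypothetical message shape for illustration.
interface ChatMessage {
  author: string;
  text: string;
}

// Crude farewell detector; a real system would need something more robust.
const FAREWELL_PATTERN = /\b(bye|goodbye|farewell|see you|take care)\b/i;

// Counts how many of the most recent messages in a row are farewells.
function trailingFarewells(messages: ChatMessage[]): number {
  let count = 0;
  for (const message of [...messages].reverse()) {
    if (!FAREWELL_PATTERN.test(message.text)) break;
    count++;
  }
  return count;
}

// True once the agents have said goodbye enough times that the exit should be forced.
function shouldForceLeave(messages: ChatMessage[], maxFarewells = 2): boolean {
  return trailingFarewells(messages) >= maxFarewells;
}
```

With a check like this, the decision to leave no longer depends on the model choosing to stop replying.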
5. Unawaited Async Operations
- Warning: `1 unawaited operation: [fetch]. Async operations should be awaited or they might not run`
- Could lead to incomplete operations and data inconsistency
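To illustrate what the warning is about, here is a small sketch contrasting a fire-and-forget call with an awaited one. `Send`, `recordFireAndForget`, and `recordAwaited` are hypothetical names; `send` stands in for whatever `fetch`-based call is left unawaited in the real code, which this issue does not identify.

```typescript
// Hypothetical async sender standing in for a fetch-based call.
type Send = (payload: string) => Promise<void>;

// Problematic: returns before `send` has finished, so the runtime may
// tear down the function while the request is still in flight.
async function recordFireAndForget(send: Send, payload: string): Promise<void> {
  void send(payload);
}

// Safer: the function does not return until the request has completed,
// and any rejection propagates to the caller.
async function recordAwaited(send: Send, payload: string): Promise<void> {
  await send(payload);
}
```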
6. Performance Issues
- Even successful operations take 500-580 seconds to complete
- Example: `Function execution took a long time. (maximum duration: 600s, actual duration: 578.114142708s)`
Expected Behavior
- Agents should be able to complete conversations and exit gracefully
- Function executions should complete within reasonable time limits
- The engine should remain stable without constant restarts
- Generation numbers should remain synchronized
Actual Behavior
- Agents get stuck in conversation exit loops
- Functions time out after 600 seconds
- Engine crashes and restarts continuously
- Generation number mismatches prevent proper world state saves
Environment
- Convex backend
- Model: `dolphin3:latest`
- Multiple concurrent agent operations
Potential Root Causes
- Inefficient message generation causing timeouts
- State synchronization issues between different system components
- Memory leaks or resource exhaustion causing engine crashes
- Improper async operation handling
- Conversation state management not properly tracking exit conditions
Suggested Fixes
- Implement proper conversation state management to prevent exit loops
- Optimize agent message generation to reduce execution time
- Fix async operation handling to ensure all operations are properly awaited
- Investigate and fix the root cause of engine crashes
- Implement better synchronization for generation numbers
- Add timeout handling and retry logic for long-running operations
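As a rough sketch of the retry suggestion, the helper below retries a failing async operation with exponential backoff. `retry` and `sleep` are hypothetical helpers, and the attempt count and delays are placeholders. Retrying only makes sense for operations that are safe to repeat.

```typescript
// Resolves after `ms` milliseconds.
function sleep(ms: number): Promise<void> {
  return new Promise((resolve) => setTimeout(resolve, ms));
}

// Runs `op` up to `maxAttempts` times, waiting baseDelayMs, 2x, 4x, ... between
// attempts. Rethrows the last error if every attempt fails.
async function retry<T>(
  op: () => Promise<T>,
  maxAttempts: number,
  baseDelayMs: number,
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      return await op();
    } catch (err) {
      lastError = err;
      if (attempt < maxAttempts) {
        await sleep(baseDelayMs * 2 ** (attempt - 1));
      }
    }
  }
  throw lastError;
}
```

Each attempt should also have its own deadline, otherwise a single hung call can still consume the full 600 seconds.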
Severity
Critical - The system is essentially non-functional with agents unable to properly interact and the engine constantly crashing.