feat: Make all potentially blocking initialization async#152
Conversation
…ion operations Added elapsed time logging for controller client subscription operations in CorePlusSessionFactory and CombinedSessionRegistry to help diagnose performance issues. Changed several debug-level logs to info-level for better visibility of subscription lifecycle events. This improves observability when troubleshooting slow startup or initialization issues with persistent query state subscriptions.
…n registry operations Added InitializationPhase enum and RegistrySnapshot dataclass to track registry initialization lifecycle and surface errors to clients. Updated sessions_list and enterprise_systems_status to return initialization status when discovery is in progress or completed with errors. Improved error handling by replacing KeyError with RegistryItemNotFoundError for missing sessions. Added error logging before raising
…t constants Renamed timeout parameters to `timeout_seconds` across all client methods for consistency. Added `_constants.py` module with centralized timeout constants organized by operation type (connection, authentication, PQ operations, etc.). Updated all async operations to use `asyncio.wait_for()` with configurable timeouts and proper timeout error handling. Changed default timeout behavior from `None` to explicit constant
Replaced `asyncio.TimeoutError` with `TimeoutError` across all async timeout handlers for consistency with Python 3.11+ best practices. Reorganized imports to follow project conventions (stdlib, third-party, local with constants before other modules). Removed unused imports from test files and resource manager module.
…nd correct examples Replaced references to `CorePlusSessionManager` with `CorePlusSessionFactory` in docstrings and examples across authentication and session factory modules. Updated code examples to use async factory creation pattern (`await CorePlusSessionFactory.from_url()`), correct authentication flow, and async table operations (`await session.empty_table(10)`). Added comprehensive timeout constant documentation explaining
… implementations Updated tool listings in DEVELOPER_GUIDE.md to match actual implementations: - Added `enterprise_systems_status` to session_enterprise tools - Renamed `catalog_table_data` to `catalog_table_sample` and added `catalog_namespaces_list` to catalog tools - Updated pq tools list: added `pq_name_to_id` and `pq_details`, renamed `pq_get` to `pq_details`, `pq_update` to `pq_modify`, and removed obsolete `pq_get`
Added timeout tests for `delete_query()`, `add_query()`, and `modify_query()` methods to verify they raise `DeephavenConnectionError` with appropriate timeout messages when operations exceed the specified timeout threshold.
Applied Black code formatter across multiple modules to ensure consistent code style. Changes include: - Multi-line function parameter formatting with trailing commas - Consistent line breaking for long function calls and definitions - Import statement reordering (moved RegistryItemNotFoundError imports before other local imports) - Extracted auth token and password resolution logic into static helper methods (_resolve_auth_token in CoreSession
…move unnecessary cast Added `# type: ignore[arg-type]` comments for `time_table()` and `input_table()` parameters where pydeephaven type hints incorrectly omit `None` as a valid type. Removed unnecessary `cast()` call in `_discover_enterprise_sessions()` since `_enterprise_registry` is already typed correctly. Added `# type: ignore[unreachable]` for early return in `close()` that mypy incorrectly flags as unreachable due to concurrent
There was a problem hiding this comment.
Pull request overview
This pull request introduces comprehensive improvements to the Deephaven MCP client architecture, focusing on three main areas: (1) standardizing timeout handling across all potentially blocking operations with a new constants module, (2) implementing non-blocking enterprise session discovery with progressive availability, and (3) enhancing error reporting with detailed initialization status tracking. The changes transform the registry initialization from a blocking operation that waits for all enterprise systems to a fast two-phase approach where community sessions are immediately available while enterprise sessions are discovered in parallel in the background.
Changes:
- Introduced
_constants.pymodule with standardized timeout constants (60s connection, 30s quick ops, 120s SAML/wait operations) - Implemented non-blocking initialization pattern using
InitializationPhaseenum andRegistrySnapshotdataclass to track discovery state - Added timeout parameters with
asyncio.wait_for()wrappers to all blocking client methods (ping, subscribe, authentication, worker creation, persistent query operations) - Enhanced error messages with initialization context (discovery in-progress warnings, targeted factory errors) for better developer experience
- Updated all registry consumers to use
RegistrySnapshot.itemsinstead of raw dict returns
Reviewed changes
Copilot reviewed 26 out of 26 changed files in this pull request and generated no comments.
Show a summary per file
| File | Description |
|---|---|
src/deephaven_mcp/client/_constants.py |
New module defining timeout constants for all network operations |
src/deephaven_mcp/client/_controller_client.py |
Added timeout parameters to ping, subscribe, add_query, delete_query, modify_query methods |
src/deephaven_mcp/client/_session_factory.py |
Added timeout handling to from_url, from_config, authentication methods, worker creation; refactored password resolution |
src/deephaven_mcp/client/_session.py |
Added timeout to from_config; refactored auth token resolution |
src/deephaven_mcp/client/_auth_client.py |
Renamed timeout parameter to timeout_seconds for consistency |
src/deephaven_mcp/client/_base.py |
Added error logging before raising InternalError for enterprise unavailability |
src/deephaven_mcp/resource_manager/_registry.py |
Added InitializationPhase enum and RegistrySnapshot dataclass; changed get_all() return type |
src/deephaven_mcp/resource_manager/_registry_combined.py |
Implemented non-blocking initialization with background discovery task; added initialization status tracking |
src/deephaven_mcp/resource_manager/__init__.py |
Exported InitializationPhase and RegistrySnapshot |
src/deephaven_mcp/mcp_systems_server/_tools/shared.py |
Added _format_initialization_status() helper for consistent status formatting |
src/deephaven_mcp/mcp_systems_server/_tools/session_enterprise.py |
Changed KeyError to RegistryItemNotFoundError; added initialization status to enterprise_systems_status |
src/deephaven_mcp/mcp_systems_server/_tools/session_community.py |
Changed KeyError to RegistryItemNotFoundError; fixed session existence check |
src/deephaven_mcp/mcp_systems_server/_tools/session.py |
Added initialization status to sessions_list response |
tests/resource_manager/test_init.py |
Updated imports for InitializationPhase and RegistrySnapshot |
tests/resource_manager/test__registry_combined.py |
Extensive test additions for non-blocking initialization, discovery phases, error handling |
tests/resource_manager/test__registry.py |
Added RegistrySnapshot tests; updated get_all() tests |
tests/mcp_systems_server/_tools/test_*.py |
Updated mocks to use RegistrySnapshot.simple(); changed KeyError to RegistryItemNotFoundError |
tests/client/test__*.py |
Added comprehensive timeout test coverage for all async operations |
tests/client/test__constants.py |
New test file validating timeout constant types and relationships |
docs/DEVELOPER_GUIDE.md |
Updated tool listings to match current implementation |
Changed `# type: ignore[arg-type]` to `# type: ignore[arg-type,unused-ignore]` for `time_table()` and `input_table()` parameters to suppress mypy warnings about unused ignores when pydeephaven type hints are fixed. Removed redundant inline comments explaining the type ignore since the issue is documented elsewhere.
Reduced sleep durations in timeout test mocks from 0.5 seconds to 0.05 seconds to make tests run faster while still reliably triggering timeout conditions. This change affects timeout tests across controller client, session, and session factory test modules.
…stead of test classes Converted test classes `TestRemainingEdgeCases` and `TestSessionCommunityCreateComplete` to standalone test functions with descriptive names following pytest conventions. Renamed test methods to use `test_session_community_` prefix for clarity (e.g., `test_create_with_pip_and_process_id` → `test_session_community_create_with_pip_and_process_id`). Added section headers to organize edge case tests. No functional
…ed registry Removed duplicate `subscribe()` call from `_get_or_create_controller_client()` since controller clients are already subscribed during factory creation. Updated test mocks and comments to reflect that subscription happens at factory level, not during client retrieval. This eliminates unnecessary subscription overhead when accessing cached controller clients.
Added test coverage for defensive checks that verify internal consistency when `_initialized` flag is True but `_enterprise_registry` is None. Tests include: - `test_enterprise_registry_none_after_initialized_flag()` - validates InternalError when accessing enterprise_registry() with inconsistent state - `test_update_enterprise_sessions_single_factory_registry_none()` - validates InternalError when updating single factory with incons
…sh documentation Added programming practice rule requiring American English spelling throughout codebase. Enhanced documentation for enterprise refresh operations with detailed locking contract, four-phase algorithm description, and module-level pure I/O function documentation. Added validation for `wait_for_change_from_version()` timeout parameter to prevent undefined behavior with zero values. Refactored logging statements to use f
Applied Black code formatter to registry modules for consistent multi-line string formatting. Changes include: - Reformatted long f-string logging statements with trailing parentheses - Improved docstring parameter formatting with type annotations on separate lines - Removed trailing blank lines in test files - Consolidated multi-line conditional expressions with consistent indentation No functional changes - purely formatting improvements
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
…y operations Reduced default timeout values for faster operation failures: - SUBSCRIBE_TIMEOUT_SECONDS: 60s → 30s - QUICK_OPERATION_TIMEOUT_SECONDS: 30s → 5s Added detailed timing instrumentation to registry operations: - `_fetch_factory_pqs()`: Added timing logs for client creation, ping checks, client recreation, and map() calls - `session_details()`: Added timing logs for session manager retrieval and liveness checks - `_
…enterprise registry sync Added comprehensive debug logging throughout PQ tool operations to track: - Enterprise factory connection timing (before/after `factory_manager.get()`) - Controller operations (PQ map fetches, serial lookups, start/stop/restart calls) - PQ counts returned from controller map() calls - Batch operation timeouts and completion Added parallel query timing instrumentation to `_sync_enterprise_sessions()`: - Log
…contract documentation Enhanced `_format_initialization_status()` to provide more specific status messages for each initialization phase: - FAILED: Reports critical failure with partial/no data instead of in-progress message - NOT_STARTED/PARTIAL: Reports discovery has not yet started - LOADING: Reports discovery is actively running - SIMPLE/COMPLETED: Only reports status when errors are present Updated locking contract documentation in
Removed trailing blank line after imports section to maintain consistent formatting with project style guidelines.
This pull request introduces several improvements and refactorings to the Deephaven MCP client, with a focus on standardizing timeout handling, improving error reporting, and updating documentation and examples for clarity. The changes enhance consistency and robustness in client-server interactions, especially around timeouts and error conditions.
Timeout Standardization and Usage
_constants.pymodule to define all timeout values in seconds for various operations (connection, authentication, persistent queries, etc.), improving consistency and maintainability across the codebase.CorePlusControllerClientmethods (ping,subscribe,get_serial_for_name,get) to use these standardized timeout constants and added timeout handling viaasyncio.wait_for, raising clear errors on timeout. (F1ed4eeeL1R1, [1] [2] [3] [4] [5] [6] [7] [8]Error Reporting Improvements
map,map_and_version, orget_serial_for_namebeforesubscribe), helping developers identify and fix programming errors more easily. [1] [2] [3]Documentation and Example Updates
_auth_client.pyto referenceCorePlusSessionFactoryinstead of the deprecatedCorePlusSessionManager, and improved parameter naming for clarity. [1] [2] [3] [4] [5] [6] [7] [8]Developer Guide and Tooling Updates