Skip to content

Comments

feat: Make all potentially blocking initialization async#152

Merged
chipkent merged 23 commits intomainfrom
async_init
Feb 23, 2026
Merged

feat: Make all potentially blocking initialization async#152
chipkent merged 23 commits intomainfrom
async_init

Conversation

@chipkent
Copy link
Member

This pull request introduces several improvements and refactorings to the Deephaven MCP client, with a focus on standardizing timeout handling, improving error reporting, and updating documentation and examples for clarity. The changes enhance consistency and robustness in client-server interactions, especially around timeouts and error conditions.

Timeout Standardization and Usage

  • Introduced a new _constants.py module to define all timeout values in seconds for various operations (connection, authentication, persistent queries, etc.), improving consistency and maintainability across the codebase.
  • Updated CorePlusControllerClient methods (ping, subscribe, get_serial_for_name, get) to use these standardized timeout constants and added timeout handling via asyncio.wait_for, raising clear errors on timeout. (F1ed4eeeL1R1, [1] [2] [3] [4] [5] [6] [7] [8]

Error Reporting Improvements

  • Added explicit logging and error messages when required preconditions are not met (e.g., calling map, map_and_version, or get_serial_for_name before subscribe), helping developers identify and fix programming errors more easily. [1] [2] [3]
  • Enhanced the base client object wrapper to log an error before raising if enterprise features are requested but unavailable.

Documentation and Example Updates

  • Updated documentation and usage examples in _auth_client.py to reference CorePlusSessionFactory instead of the deprecated CorePlusSessionManager, and improved parameter naming for clarity. [1] [2] [3] [4] [5] [6] [7] [8]

Developer Guide and Tooling Updates

  • Updated the developer guide to reflect new and renamed tools and endpoints, ensuring documentation matches the current implementation.

…ion operations

Added elapsed time logging for controller client subscription operations in CorePlusSessionFactory and CombinedSessionRegistry to help diagnose performance issues. Changed several debug-level logs to info-level for better visibility of subscription lifecycle events. This improves observability when troubleshooting slow startup or initialization issues with persistent query state subscriptions.
…n registry operations

Added InitializationPhase enum and RegistrySnapshot dataclass to track registry initialization lifecycle and surface errors to clients. Updated sessions_list and enterprise_systems_status to return initialization status when discovery is in progress or completed with errors. Improved error handling by replacing KeyError with RegistryItemNotFoundError for missing sessions. Added error logging before raising
…t constants

Renamed timeout parameters to `timeout_seconds` across all client methods for consistency. Added `_constants.py` module with centralized timeout constants organized by operation type (connection, authentication, PQ operations, etc.). Updated all async operations to use `asyncio.wait_for()` with configurable timeouts and proper timeout error handling. Changed default timeout behavior from `None` to explicit constant
Replaced `asyncio.TimeoutError` with `TimeoutError` across all async timeout handlers for consistency with Python 3.11+ best practices. Reorganized imports to follow project conventions (stdlib, third-party, local with constants before other modules). Removed unused imports from test files and resource manager module.
…nd correct examples

Replaced references to `CorePlusSessionManager` with `CorePlusSessionFactory` in docstrings and examples across authentication and session factory modules. Updated code examples to use async factory creation pattern (`await CorePlusSessionFactory.from_url()`), correct authentication flow, and async table operations (`await session.empty_table(10)`). Added comprehensive timeout constant documentation explaining
… implementations

Updated tool listings in DEVELOPER_GUIDE.md to match actual implementations:
- Added `enterprise_systems_status` to session_enterprise tools
- Renamed `catalog_table_data` to `catalog_table_sample` and added `catalog_namespaces_list` to catalog tools
- Updated pq tools list: added `pq_name_to_id` and `pq_details`, renamed `pq_get` to `pq_details`, `pq_update` to `pq_modify`, and removed obsolete `pq_get`
Added timeout tests for `delete_query()`, `add_query()`, and `modify_query()` methods to verify they raise `DeephavenConnectionError` with appropriate timeout messages when operations exceed the specified timeout threshold.
Applied Black code formatter across multiple modules to ensure consistent code style. Changes include:
- Multi-line function parameter formatting with trailing commas
- Consistent line breaking for long function calls and definitions
- Import statement reordering (moved RegistryItemNotFoundError imports before other local imports)
- Extracted auth token and password resolution logic into static helper methods (_resolve_auth_token in CoreSession
…move unnecessary cast

Added `# type: ignore[arg-type]` comments for `time_table()` and `input_table()` parameters where pydeephaven type hints incorrectly omit `None` as a valid type. Removed unnecessary `cast()` call in `_discover_enterprise_sessions()` since `_enterprise_registry` is already typed correctly. Added `# type: ignore[unreachable]` for early return in `close()` that mypy incorrectly flags as unreachable due to concurrent
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This pull request introduces comprehensive improvements to the Deephaven MCP client architecture, focusing on three main areas: (1) standardizing timeout handling across all potentially blocking operations with a new constants module, (2) implementing non-blocking enterprise session discovery with progressive availability, and (3) enhancing error reporting with detailed initialization status tracking. The changes transform the registry initialization from a blocking operation that waits for all enterprise systems to a fast two-phase approach where community sessions are immediately available while enterprise sessions are discovered in parallel in the background.

Changes:

  • Introduced _constants.py module with standardized timeout constants (60s connection, 30s quick ops, 120s SAML/wait operations)
  • Implemented non-blocking initialization pattern using InitializationPhase enum and RegistrySnapshot dataclass to track discovery state
  • Added timeout parameters with asyncio.wait_for() wrappers to all blocking client methods (ping, subscribe, authentication, worker creation, persistent query operations)
  • Enhanced error messages with initialization context (discovery in-progress warnings, targeted factory errors) for better developer experience
  • Updated all registry consumers to use RegistrySnapshot.items instead of raw dict returns

Reviewed changes

Copilot reviewed 26 out of 26 changed files in this pull request and generated no comments.

Show a summary per file
File Description
src/deephaven_mcp/client/_constants.py New module defining timeout constants for all network operations
src/deephaven_mcp/client/_controller_client.py Added timeout parameters to ping, subscribe, add_query, delete_query, modify_query methods
src/deephaven_mcp/client/_session_factory.py Added timeout handling to from_url, from_config, authentication methods, worker creation; refactored password resolution
src/deephaven_mcp/client/_session.py Added timeout to from_config; refactored auth token resolution
src/deephaven_mcp/client/_auth_client.py Renamed timeout parameter to timeout_seconds for consistency
src/deephaven_mcp/client/_base.py Added error logging before raising InternalError for enterprise unavailability
src/deephaven_mcp/resource_manager/_registry.py Added InitializationPhase enum and RegistrySnapshot dataclass; changed get_all() return type
src/deephaven_mcp/resource_manager/_registry_combined.py Implemented non-blocking initialization with background discovery task; added initialization status tracking
src/deephaven_mcp/resource_manager/__init__.py Exported InitializationPhase and RegistrySnapshot
src/deephaven_mcp/mcp_systems_server/_tools/shared.py Added _format_initialization_status() helper for consistent status formatting
src/deephaven_mcp/mcp_systems_server/_tools/session_enterprise.py Changed KeyError to RegistryItemNotFoundError; added initialization status to enterprise_systems_status
src/deephaven_mcp/mcp_systems_server/_tools/session_community.py Changed KeyError to RegistryItemNotFoundError; fixed session existence check
src/deephaven_mcp/mcp_systems_server/_tools/session.py Added initialization status to sessions_list response
tests/resource_manager/test_init.py Updated imports for InitializationPhase and RegistrySnapshot
tests/resource_manager/test__registry_combined.py Extensive test additions for non-blocking initialization, discovery phases, error handling
tests/resource_manager/test__registry.py Added RegistrySnapshot tests; updated get_all() tests
tests/mcp_systems_server/_tools/test_*.py Updated mocks to use RegistrySnapshot.simple(); changed KeyError to RegistryItemNotFoundError
tests/client/test__*.py Added comprehensive timeout test coverage for all async operations
tests/client/test__constants.py New test file validating timeout constant types and relationships
docs/DEVELOPER_GUIDE.md Updated tool listings to match current implementation

Changed `# type: ignore[arg-type]` to `# type: ignore[arg-type,unused-ignore]` for `time_table()` and `input_table()` parameters to suppress mypy warnings about unused ignores when pydeephaven type hints are fixed. Removed redundant inline comments explaining the type ignore since the issue is documented elsewhere.
Reduced sleep durations in timeout test mocks from 0.5 seconds to 0.05 seconds to make tests run faster while still reliably triggering timeout conditions. This change affects timeout tests across controller client, session, and session factory test modules.
…stead of test classes

Converted test classes `TestRemainingEdgeCases` and `TestSessionCommunityCreateComplete` to standalone test functions with descriptive names following pytest conventions. Renamed test methods to use `test_session_community_` prefix for clarity (e.g., `test_create_with_pip_and_process_id` → `test_session_community_create_with_pip_and_process_id`). Added section headers to organize edge case tests. No functional
…ed registry

Removed duplicate `subscribe()` call from `_get_or_create_controller_client()` since controller clients are already subscribed during factory creation. Updated test mocks and comments to reflect that subscription happens at factory level, not during client retrieval. This eliminates unnecessary subscription overhead when accessing cached controller clients.
Added test coverage for defensive checks that verify internal consistency when `_initialized` flag is True but `_enterprise_registry` is None. Tests include:
- `test_enterprise_registry_none_after_initialized_flag()` - validates InternalError when accessing enterprise_registry() with inconsistent state
- `test_update_enterprise_sessions_single_factory_registry_none()` - validates InternalError when updating single factory with incons
…sh documentation

Added programming practice rule requiring American English spelling throughout codebase. Enhanced documentation for enterprise refresh operations with detailed locking contract, four-phase algorithm description, and module-level pure I/O function documentation. Added validation for `wait_for_change_from_version()` timeout parameter to prevent undefined behavior with zero values. Refactored logging statements to use f
Applied Black code formatter to registry modules for consistent multi-line string formatting. Changes include:
- Reformatted long f-string logging statements with trailing parentheses
- Improved docstring parameter formatting with type annotations on separate lines
- Removed trailing blank lines in test files
- Consolidated multi-line conditional expressions with consistent indentation

No functional changes - purely formatting improvements
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 26 out of 27 changed files in this pull request and generated 2 comments.

chipkent and others added 3 commits February 20, 2026 15:35
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
…y operations

Reduced default timeout values for faster operation failures:
- SUBSCRIBE_TIMEOUT_SECONDS: 60s → 30s
- QUICK_OPERATION_TIMEOUT_SECONDS: 30s → 5s

Added detailed timing instrumentation to registry operations:
- `_fetch_factory_pqs()`: Added timing logs for client creation, ping checks, client recreation, and map() calls
- `session_details()`: Added timing logs for session manager retrieval and liveness checks
- `_
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 28 out of 29 changed files in this pull request and generated 1 comment.

…enterprise registry sync

Added comprehensive debug logging throughout PQ tool operations to track:
- Enterprise factory connection timing (before/after `factory_manager.get()`)
- Controller operations (PQ map fetches, serial lookups, start/stop/restart calls)
- PQ counts returned from controller map() calls
- Batch operation timeouts and completion

Added parallel query timing instrumentation to `_sync_enterprise_sessions()`:
- Log
…contract documentation

Enhanced `_format_initialization_status()` to provide more specific status messages for each initialization phase:
- FAILED: Reports critical failure with partial/no data instead of in-progress message
- NOT_STARTED/PARTIAL: Reports discovery has not yet started
- LOADING: Reports discovery is actively running
- SIMPLE/COMPLETED: Only reports status when errors are present

Updated locking contract documentation in
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 28 out of 30 changed files in this pull request and generated no new comments.

Removed trailing blank line after imports section to maintain consistent formatting with project style guidelines.
@chipkent chipkent merged commit 1bfeadb into main Feb 23, 2026
20 checks passed
@chipkent chipkent deleted the async_init branch February 23, 2026 21:45
@github-actions github-actions bot locked and limited conversation to collaborators Feb 23, 2026
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant