This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
```bash
# Install with development dependencies
uv pip install -e .[dev]

# Run all tests
uv run pytest

# Run specific test file
uv run pytest tests/unit/core/test_config.py

# Run single test
uv run pytest tests/unit/core/test_config.py::TestPydanticConfig::test_config_update

# Linting and formatting
uv run ruff check        # Lint check
uv run ruff check --fix  # Auto-fix linting issues
uv run black chuck_data tests  # Format code
uv run pyright           # Type checking

# Run application locally
python -m chuck_data     # Or: uv run python -m chuck_data
chuck-data --no-color    # Disable colors for testing
```

Tests are organized with pytest markers:
- Default: Unit tests only (fast)
- pytest -m integration: Integration tests (requires Databricks access)
- pytest -m data_test: Tests that create Databricks resources
- pytest -m e2e: End-to-end tests (slow, comprehensive)
```
tests/
├── unit/
│   ├── commands/     # Command handler tests
│   ├── clients/      # API client tests
│   ├── ui/           # TUI/display tests
│   └── core/         # Core functionality tests
├── integration/      # Integration tests
└── fixtures/         # Test stubs and fixtures
```
- TUI (ui/tui.py) receives user input
- Command Registry (command_registry.py) maps commands to handlers
- Service Layer (service.py) orchestrates business logic
- Command Handlers (commands/) execute specific operations
- API Clients (clients/) interact with external services
ChuckService - Main service facade that:
- Initializes Databricks API client from config
- Routes commands through the command registry
- Handles error reporting and metrics collection
- Acts as bridge between TUI and business logic
Command Registry - Unified registry where each command is defined with:
- Handler function, parameters, and validation rules
- Visibility flags (user vs agent accessible)
- Display preferences (condensed vs full output)
- Interactive input support flags (a hypothetical registry entry is sketched below)
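As an illustration only, a registry entry along these lines might look roughly like this. The CommandDefinition type and its field names are hypothetical, not the project's actual API:

```python
# Hypothetical sketch of a registry entry; names are illustrative only.
from dataclasses import dataclass
from typing import Any, Callable


@dataclass
class CommandDefinition:
    handler: Callable[..., Any]       # function that executes the command
    parameters: dict[str, Any]        # parameter names and validation rules
    visible_to_user: bool             # shown in the TUI
    visible_to_agent: bool            # callable as an agent tool
    condensed_display: bool           # condensed vs full output
    supports_interactive_input: bool  # multi-step/interactive flows


COMMANDS: dict[str, CommandDefinition] = {
    "/list_catalogs": CommandDefinition(
        handler=lambda client, **kwargs: ...,
        parameters={},
        visible_to_user=True,
        visible_to_agent=True,
        condensed_display=False,
        supports_interactive_input=False,
    ),
}
```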
Configuration System - Pydantic-based config that:
- Supports both file storage (~/.chuck_config.json) and environment variables
- Environment variables use the CHUCK_ prefix (e.g., CHUCK_WORKSPACE_URL); see the sketch after this list
- Handles workspace URLs, tokens, active catalog/schema/model settings
- Includes usage tracking consent management
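A minimal sketch of the two configuration sources, based on the ConfigManager calls shown in the test patterns later in this file; the exact precedence logic may differ:

```python
# Sketch only: CHUCK_-prefixed environment variables are intended to override
# values stored in ~/.chuck_config.json. Exact precedence rules may differ.
import os

from chuck_data.config import ConfigManager

os.environ["CHUCK_WORKSPACE_URL"] = "https://env-override.databricks.com"

config_manager = ConfigManager(os.path.expanduser("~/.chuck_config.json"))
config = config_manager.get_config()
# config.workspace_url would reflect the CHUCK_WORKSPACE_URL override if set
```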
Agent System - AI-powered assistant that:
- Uses LLM clients (OpenAI-compatible) with configurable models
- Has specialized modes: general queries, PII detection, bulk PII scanning, Stitch setup
- Executes commands through the same registry as the TUI (see the dispatch sketch after this list)
- Maintains conversation history and context
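As a rough sketch of that shared dispatch path: only execute_tool appears in the end-to-end test later in this document; the surrounding loop, the run_agent_tool_call helper, and the tool_call structure are illustrative assumptions.

```python
# Illustrative only: the agent routes tool calls through the same command
# registry as the TUI, via execute_tool (shown in the end-to-end test below).
from chuck_data.agent.tool_executor import execute_tool


def run_agent_tool_call(api_client, tool_call: dict):
    # tool_call structure is hypothetical; LLM-side plumbing is omitted.
    return execute_tool(
        api_client=api_client,
        tool_name=tool_call["name"],
        tool_args=tool_call["arguments"],
    )
```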
Interactive Context - Session state management for:
- Multi-step command workflows (like setup wizards)
- Command-specific context data
- Cross-command state sharing (a hypothetical sketch follows this list)
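A purely hypothetical sketch of the idea (not the actual class or API): per-command context data kept for the session so a multi-step workflow can resume where it left off and other commands can read shared state.

```python
# Hypothetical sketch only: session-scoped, per-command context data.
class InteractiveContext:
    def __init__(self) -> None:
        self._data: dict[str, dict] = {}

    def set(self, command: str, key: str, value) -> None:
        self._data.setdefault(command, {})[key] = value

    def get(self, command: str, key: str, default=None):
        return self._data.get(command, {}).get(key, default)


# Usage: a setup wizard records which step it is on between user inputs.
context = InteractiveContext()
context.set("/setup-stitch", "current_step", 2)
```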
Databricks Integration - Primary platform integration:
- Unity Catalog operations (catalogs, schemas, tables, volumes)
- SQL Warehouse management and query execution
- Model serving endpoints for LLM access
- Job management and cluster operations
- Authentication via personal access tokens
Amperity Integration - Data platform operations:
- Authentication flow with browser-based OAuth
- Bug reporting and metrics submission
- Stitch integration for data pipeline setup
Core Principle
Mock external boundaries only. Use real objects for all internal business logic to catch integration bugs.
✅ ALWAYS Mock These (External Boundaries)
HTTP/Network Calls
```python
@patch('databricks.sdk.WorkspaceClient')
@patch('requests.get')
@patch('requests.post')
@patch('openai.OpenAI')
```
File System Operations
```python
@patch('builtins.open')
@patch('os.path.exists')
@patch('os.makedirs')
@patch('tempfile.TemporaryDirectory')
@patch('chuck_data.logger.setup_file_logging')
```
System/Environment
```python
@patch.dict('os.environ', {'CHUCK_TOKEN': 'test'})
@patch('subprocess.run')
@patch('datetime.datetime.now')  # for deterministic timestamps
```
User Input/Terminal
```python
@patch('prompt_toolkit.prompt')
@patch('readchar.readkey')
@patch('sys.stdout.write')  # when testing specific output
```
❌ NEVER Mock These (Internal Logic)
Configuration Objects
```python
# ❌ Don't do this
@patch('chuck_data.config.ConfigManager')

# ✅ Do this instead: use a real ConfigManager backed by a temp file
config_manager = ConfigManager('/tmp/test_config.json')
```
Business Logic Classes
```python
# ❌ Don't do this
@patch('chuck_data.service.ChuckService')

# ✅ Do this instead: use the real service with a stubbed external client
service = ChuckService(client=mocked_databricks_client)
```
Data Objects
```python
# ❌ Don't do this
@patch('chuck_data.commands.base.CommandResult')

# ✅ Do this instead: construct the real data object
result = CommandResult(success=True, data="test")
```
Utility Functions
```python
# ❌ Don't do this
@patch('chuck_data.utils.normalize_workspace_url')

# ✅ Do this instead: call the real function
from chuck_data.utils import normalize_workspace_url
normalized = normalize_workspace_url("https://test.databricks.com")
```
Command Registry/Routing
```python
# ❌ Don't do this
@patch('chuck_data.command_registry.get_command')

# ✅ Do this instead: exercise the real routing
from chuck_data.command_registry import get_command
command_def = get_command('/status')  # Test real routing
```
Amperity Client
```python
# ❌ Don't do this
@patch('chuck_data.clients.amperity.AmperityClient')
```
Use the fixture AmperityClientStub to stub only the external API calls, while using the real command logic.
Databricks Client
```python
# ❌ Don't do this
@patch('chuck_data.clients.databricks.DatabricksClient')
```
Use the fixture DatabricksClientStub to stub only the external API calls, while using the real command logic. Fixtures for stubbing external API clients follow the naming convention <ClientName>Stub.
LLM Client
```python
# ❌ Don't do this
@patch('chuck_data.clients.llm.LLMClient')
```
Use the fixture LLMClientStub to stub only the external API calls, while using the real command logic.
🎯 Approved Test Patterns
Pattern 1: External Client + Real Internal Logic
```python
def test_list_catalogs_command():
    mock_client = DatabricksClientStub()
    mock_client.add_catalog("test_catalog")

    service = ChuckService(client=mock_client)
    result = service.execute_command("/list_catalogs")

    assert result.success
    assert "test_catalog" in result.data
```
Pattern 2: Real Config with Temporary Files
```python
def test_config_update():
    with tempfile.NamedTemporaryFile() as tmp:
        # Use real config manager
        config_manager = ConfigManager(tmp.name)

        # Test real config logic
        config_manager.update(workspace_url="https://test.databricks.com")

        # Verify real file operations
        reloaded = ConfigManager(tmp.name)
        assert reloaded.get_config().workspace_url == "https://test.databricks.com"
```
Pattern 3: Stub Only External APIs
```python
def test_auth_flow():
    amperity_stub = AmperityClientStub()
    amperity_stub.set_auth_completion_failure(True)

    result = handle_amperity_login(amperity_stub)

    assert not result.success
    assert "Authentication failed" in result.message
```
🚫 Red Flags (Stop and Reconsider)
- `@patch('chuck_data.config.*')`
- `@patch('chuck_data.commands.*.handle_*')`
- `@patch('chuck_data.service.*')`
- `@patch('chuck_data.utils.*')`
- `@patch('chuck_data.models.*')`
- Any patch of internal business logic functions
✅ Quick Decision Tree
Before mocking anything, ask:
- Does this cross a process boundary? (network, file, subprocess) → Mock it
- Is this user input or system interaction? → Mock it
- Is this internal business logic? → Use real object
- Is this a data transformation? → Use real function
- When in doubt → Use real object
Exception: Only mock internal logic when testing error conditions that are impossible to trigger naturally.
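For example (names below are generic placeholders in the style of the patterns in this guide, not a specific real test), forcing an internal failure that has no natural trigger:

```python
# The one sanctioned exception: patch internal logic only to force an error
# path that cannot be triggered naturally. Names are illustrative placeholders.
from unittest.mock import patch


def test_config_write_failure_is_reported(client_stub):
    with patch(
        "chuck_data.config.ConfigManager.update",
        side_effect=RuntimeError("disk full"),
    ):
        result = handle_command(client_stub, parameter="anything")

    assert not result.success
```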
Instruction Set: Writing Behavioral Tests with Agent Coverage
Overview
This guide explains how to write comprehensive behavioral tests for command handlers that support both direct execution and agent interaction via tool_output_callback. The goal is to test user-visible behavior rather than implementation details.
1. Analyze the Command Handler First
Step 1.1: Identify Key Behavioral Patterns
Look for these patterns in the command handler:
```python
def _report_step(message: str, tool_output_callback=None):
    if tool_output_callback:
        tool_output_callback("command-name", {"step": message})


def handle_command(client, **kwargs):
    tool_output_callback = kwargs.get("tool_output_callback")

    try:
        # Direct path - no callback
        direct_result = some_direct_lookup()
        if direct_result:
            return success_without_callback()
    except Exception:
        # Search path - with callback
        _report_step("Looking for...", tool_output_callback)
        search_result = search_function()
        _report_step("Selecting...", tool_output_callback)
```
Step 1.2: Map User-Visible Behaviors
Document what users actually see:
Direct Command (no progress):
```
chuck > /select-catalog production
Success: Active catalog is now set to 'production'
```

Agent Exact Match (no progress):
```
chuck > select production catalog
→ Setting catalog: (Catalog set - Name: production)
```

Agent Fuzzy Match (with progress):
```
chuck > select prod catalog
→ Setting catalog: (Looking for catalog matching 'prod')
→ Setting catalog: (Selecting 'production_data')
→ Setting catalog: (Catalog set - Name: production_data)
```

Agent Failure (with progress):
```
chuck > select nonexistent catalog
→ Setting catalog: (Looking for catalog matching 'nonexistent')
Error: No catalog found matching 'nonexistent'. Available catalogs: production, dev
```
2. Test Structure and Naming
Step 2.1: Test Organization
Organize tests into clear sections:
```python
# Parameter validation
def test_missing_parameter_returns_error(): ...
def test_invalid_parameter_returns_error(): ...

# Direct command execution
def test_direct_command_success_case(): ...
def test_direct_command_failure_case(): ...
def test_direct_command_edge_case(): ...

# Agent execution via tool_output_callback
def test_agent_exact_match_behavior(): ...
def test_agent_search_behavior(): ...
def test_agent_failure_behavior(): ...
def test_agent_error_handling(): ...
def test_agent_tool_executor_integration(): ...
```
Step 2.2: Test Naming Convention
DO:
- test_direct_command_selects_existing_catalog - Clear execution path
- test_agent_fuzzy_match_shows_multiple_progress_steps - Behavioral outcome
- test_missing_catalog_parameter_returns_error - Expected behavior
DON'T:
- test_user_gets_success_message - Avoid "user" concept
- test_catalog_selection_works - Too vague
- test_handle_command_with_callback - Implementation focused
3. Direct Command Tests
Step 3.1: Basic Test Pattern
```python
def test_direct_command_success_case(client_stub, temp_config):
    """Direct command description of expected behavior."""
    with patch("module.config._config_manager", temp_config):
        # Setup test data
        client_stub.add_resource("test_resource")

        # Execute command (no tool_output_callback)
        result = handle_command(client_stub, parameter="test_resource")

        # Verify behavioral outcome
        assert result.success
        assert "expected success message" in result.message
        assert get_active_resource() == "test_resource"  # State change
```
Step 3.2: Failure Cases
```python
def test_direct_command_failure_shows_helpful_error(client_stub, temp_config):
    """Direct command failure shows error with available options."""
    with patch("module.config._config_manager", temp_config):
        client_stub.add_resource("available_resource")

        result = handle_command(client_stub, parameter="missing_resource")

        # Verify helpful error behavior
        assert not result.success
        assert "No resource found matching 'missing_resource'" in result.message
        assert "Available resources: available_resource" in result.message
```
4. Agent Tests with tool_output_callback
Step 4.1: Progress Capture Pattern
Always use this exact pattern for capturing agent progress:
```python
def test_agent_behavior_description(client_stub, temp_config):
    """Agent execution shows expected progress behavior."""
    with patch("module.config._config_manager", temp_config):
        # Setup test data
        client_stub.add_resource("target_resource")

        # Capture progress during agent execution
        progress_steps = []

        def capture_progress(tool_name, data):
            progress_steps.append(f"→ Setting resource: ({data['step']})")

        # Execute with tool_output_callback
        result = handle_command(
            client_stub,
            parameter="target_resource",
            tool_output_callback=capture_progress,
        )

        # Verify command success
        assert result.success
        assert get_active_resource() == "target_resource"

        # Verify progress behavior
        assert len(progress_steps) == expected_count
        assert "expected progress message" in progress_steps[0]
```
Step 4.2: Different Agent Scenarios
Exact Match (No Search):
```python
def test_agent_exact_match_shows_no_progress_steps():
    # Should have 0 progress steps (direct lookup succeeds)
    assert len(progress_steps) == 0
```

Search Required:
```python
def test_agent_search_shows_multiple_progress_steps():
    # Force search path
    client_stub.get_resource = lambda name: None

    # Should have 2+ progress steps
    assert len(progress_steps) >= 2
    assert any("Looking for resource matching" in step for step in progress_steps)
    assert any("Selecting 'found_resource'" in step for step in progress_steps)
```

Search Failure:
```python
def test_agent_shows_progress_before_failure():
    # Should show search attempt before failure
    assert not result.success
    assert len(progress_steps) == 1
    assert "Looking for resource matching 'nonexistent'" in progress_steps[0]
```
Step 4.3: Agent Error Handling
```python
def test_agent_callback_errors_bubble_up_as_command_errors():
    """Agent callback failures bubble up as command errors (current behavior)."""
    def failing_callback(tool_name, data):
        raise Exception("Display system crashed")

    result = handle_command(
        client_stub,
        parameter="trigger_search",  # Force callback usage
        tool_output_callback=failing_callback,
    )

    # Document current behavior
    assert not result.success
    assert "Display system crashed" in result.message
```
Step 4.4: End-to-End Integration
```python
def test_agent_tool_executor_end_to_end_integration():
    """Agent tool_executor integration works end-to-end."""
    from chuck_data.agent.tool_executor import execute_tool

    result = execute_tool(
        api_client=client_stub,
        tool_name="command-name",
        tool_args={"parameter": "test_value"},
    )

    # Verify agent gets proper result format
    assert "expected_field" in result
    assert result["expected_field"] == "test_value"

    # Verify state actually changed
    assert get_active_resource() == "test_value"
```
5. Test Data Setup Patterns
Step 5.1: Client Stub Setup
```python
# Build test data on the stub
client_stub.add_catalog("production", catalog_type="MANAGED")
client_stub.add_schema("test_schema", catalog="production")
client_stub.add_table("test_table", schema="test_schema")

# Force the search path by making direct lookup fail
original_method = client_stub.get_catalog      # Save for restoration
client_stub.get_catalog = lambda name: None    # Force search path
```
Step 5.2: Config Management
Always use this pattern for config tests:
```python
def test_function(client_stub, temp_config):
    with patch("module.config._config_manager", temp_config):
        # Test code here
        pass
```
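For reference, the client_stub and temp_config fixtures used throughout these examples might be defined roughly as follows. This is a sketch only; the real fixtures live under tests/fixtures/, and the import path and details below are assumptions:

```python
# conftest.py sketch; the DatabricksClientStub import path is assumed.
import tempfile

import pytest

from chuck_data.config import ConfigManager
from tests.fixtures import DatabricksClientStub  # assumed location


@pytest.fixture
def temp_config():
    # Real ConfigManager backed by a throwaway file, as in Pattern 2 above.
    with tempfile.NamedTemporaryFile(suffix=".json") as tmp:
        yield ConfigManager(tmp.name)


@pytest.fixture
def client_stub():
    # Stubs only the external Databricks API surface.
    return DatabricksClientStub()
```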
6. Assertion Patterns
Step 6.1: What TO Test (Behavioral)
```python
# Command outcome
assert result.success
assert not result.success

# User-facing messages
assert "Active catalog is now set to 'production'" in result.message
assert "No catalog found matching 'missing'" in result.message

# State changes
assert get_active_catalog() == "production"
assert get_active_schema() == "test_schema"

# Agent progress steps
assert len(progress_steps) == 2
assert "Looking for catalog matching 'prod'" in progress_steps[0]
assert "Selecting 'production'" in progress_steps[1]

# Tool executor result fields
assert "catalog_name" in result
assert result["catalog_name"] == "production"
```
Step 6.2: What NOT to Test (Implementation)
```python
assert result.data["internal_field"] == "value"  # ❌ internal data structure
mock_method.assert_called_with("arg")            # ❌ mock call verification
assert len(result.data.keys()) == 5              # ❌ internal structure details
assert result.message == "Exact message text"    # ❌ too brittle
```
7. Checklist for Complete Coverage
Step 7.1: Core Scenarios
- Missing/invalid parameters
- Successful execution (direct command)
- Failure with helpful error (direct command)
- Fuzzy matching (if applicable)
- API error handling
Step 7.2: Agent Scenarios
- Agent exact match (no progress steps)
- Agent search required (multiple progress steps)
- Agent failure (progress before error)
- Agent callback error handling
- Agent tool_executor integration
Step 7.3: Test Quality
- Test names describe behavior, not implementation
- Clear delineation between direct_command_* and agent_* tests
- Comments explain expected behavior, not code mechanics
- No "user" language in test names or descriptions
- Tests focus on observable outcomes
8. Example Complete Test File Structure
""" Tests for [command] handler.
Behavioral tests focused on command execution patterns rather than implementation details. """
from unittest.mock import patch from command_module import handle_command from config_module import get_active_resource
def test_missing_parameter_returns_error(): def test_invalid_parameter_returns_error():
def test_direct_command_success_case(): def test_direct_command_failure_case(): def test_direct_command_fuzzy_matching(): def test_databricks_api_errors_handled_gracefully():
def test_agent_exact_match_shows_no_progress_steps(): def test_agent_fuzzy_match_shows_multiple_progress_steps(): def test_agent_shows_progress_before_failure(): def test_agent_callback_errors_bubble_up_as_command_errors(): def test_agent_tool_executor_end_to_end_integration():
Final Tips
- Run tests frequently during development to catch behavioral regressions
- Use descriptive test data like "production_catalog" instead of "test1"
- Test both success and failure paths for every major code branch
- Document current behavior even if it seems suboptimal (it helps with future changes)
- Focus on what users see rather than how the code works internally
This approach ensures comprehensive, maintainable tests that catch behavioral regressions while being resilient to internal implementation changes.