WebQA Agent Custom Tool Development - LLM Context Document


DOCUMENT TYPE: LLM Context / System Prompt
AUDIENCE: Large Language Models (Claude, GPT-4, Gemini, etc.)
PURPOSE: Provide complete project context for AI-assisted custom tool development
VERSION: 0.1.0
LAST_UPDATED: 2025-12-31

PROJECT CONTEXT

What is WebQA Agent?

WebQA Agent is an autonomous web browser testing framework using AI-powered agents.

  • Architecture: LangGraph-based workflow orchestration
  • Browser Automation: Playwright async API
  • AI Models: OpenAI, Anthropic, Google Gemini support
  • Custom Tools: Extensible testing capabilities via tool registry

System Architecture Overview

┌─────────────────────────────────────────────────────────────────┐
│                        Custom Tool Class                        │
│                  (YourTool extends WebQABaseTool)               │
└────────────────────────────────┬────────────────────────────────┘
                                 │ inherits
                                 ▼
┌─────────────────────────────────────────────────────────────────┐
│                         WebQABaseTool                           │
│  - format_success(), format_failure(), format_critical_error()  │
│  - update_action_context(), get_execution_context()             │
└────────────────────────────────┬────────────────────────────────┘
                                 │ uses
                                 ▼
┌─────────────────────────────────────────────────────────────────┐
│                          ResponseTags                           │
│    [SUCCESS], [FAILURE], [CRITICAL_ERROR:TYPE], [WARNING]       │
└─────────────────────────────────────────────────────────────────┘

┌─────────────────────────────────────────────────────────────────┐
│                    @register_tool decorator                     │
└────────────────────────────────┬────────────────────────────────┘
                                 │ registers to
                                 ▼
┌─────────────────────────────────────────────────────────────────┐
│                    ToolRegistry (Singleton)                     │
└────────────────────────────────┬────────────────────────────────┘
                                 │ provides tools to
                                 ▼
┌─────────────────────────────────────────────────────────────────┐
│                       LangGraph Workflow                        │
│                    (graph.py orchestration)                     │
└────────────────────────────────┬────────────────────────────────┘
                                 │ executes
                                 ▼
┌─────────────────────────────────────────────────────────────────┐
│                       Tool._arun(async)                         │
└────────────────────────────────┬────────────────────────────────┘
                                 │ updates
                                 ▼
┌─────────────────────────────────────────────────────────────────┐
│                 ui_tester.last_action_context                   │
└────────────────────────────────┬────────────────────────────────┘
                                 │ consumed by
                                 ▼
┌─────────────────────────────────────────────────────────────────┐
│                        Assertion Tools                          │
└─────────────────────────────────────────────────────────────────┘

                ┌─────────────────────────────────────────┐
                │        case_recorder.add_step()         │
                │             (HTML Report)               │
                └─────────────────────────────────────────┘

Critical Files

  • webqa_agent/tools/base.py - Base classes, response tags, metadata
  • webqa_agent/tools/registry.py - Singleton registry, tool discovery
  • webqa_agent/executor/gen/graph.py - LangGraph workflow orchestration
  • webqa_agent/executor/gen/agents/execute_agent.py - Tool execution and control flow
  • webqa_agent/tools/action_tool.py - Browser interaction reference
  • webqa_agent/tools/custom/link_check_tool.py - Custom tool example

Project Directory Structure

webqa_agent/
├── testers/
│   └── case_gen/
│       ├── tools/
│       │   ├── base.py              # WebQABaseTool, WebQAToolMetadata, ResponseTags
│       │   ├── registry.py          # ToolRegistry singleton, @register_tool
│       │   ├── action_tool.py       # Browser interaction patterns (UITool)
│       │   ├── ux_tool.py           # UX testing tools
│       │   ├── custom/              # ← YOUR CUSTOM TOOLS HERE
│       │   │   ├── __init__.py
│       │   │   ├── link_check_tool.py  # Example custom tool
│       │   │   └── {{your_tool}}.py        # Place your tool here
│       │   └── __init__.py
│       ├── graph.py                # LangGraph workflow orchestration
│       ├── agents/
│       │   └── execute_agent.py    # Tool execution and control flow
│       └── state/
│           └── schemas.py          # State management schemas
├── browser/
│   └── session.py                  # Browser session pool management
├── llm/
│   └── llm_api.py                  # Multi-provider LLM client
└── actions/
    └── action_handler.py           # Browser action execution

tests/
└── custom_tools/                   # ← YOUR TESTS HERE
    ├── __init__.py
    └── test_{{your_tool}}.py       # Place your tests here

config/
└── config.yaml                     # Main configuration file

Key Locations:

  • Custom Tools: webqa_agent/tools/custom/
  • Tests: tests/custom_tools/
  • Config: config/config.yaml

MANDATORY CONSTRAINTS

Hard Requirements (MUST follow)

  1. Inheritance: All tools MUST inherit from WebQABaseTool
  2. Decorator: All tools MUST use @register_tool decorator
  3. Async Execution: All tools MUST implement async def _arun() (NOT sync def _run())
  4. Response Tags: All returns MUST include one of:
    • [SUCCESS] - Continue to next step
    • [FAILURE] - Trigger adaptive recovery
    • [CRITICAL_ERROR:TYPE] - Abort test immediately
    • [WARNING] - Non-blocking issue
    • [CANNOT_VERIFY] - Verification prerequisite failed
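As a self-contained sketch of this contract (mirroring the tag formats documented in the base-class reference later in this document, not the actual WebQABaseTool code), the formatting helpers reduce to simple prefixing:

```python
# Illustrative sketch of the response-tag contract. The real helpers live
# on WebQABaseTool; these standalone functions only mirror the documented
# output shapes so the contract is easy to see at a glance.

def format_success(message: str) -> str:
    return f"[SUCCESS] {message}"

def format_failure(message: str) -> str:
    return f"[FAILURE] {message}"

def format_critical_error(error_type: str, message: str) -> str:
    return f"[CRITICAL_ERROR:{error_type}] {message}"

format_success("Title verified")                   # -> "[SUCCESS] Title verified"
format_critical_error("NETWORK_ERROR", "timeout")  # -> "[CRITICAL_ERROR:NETWORK_ERROR] timeout"
```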

Response Tag Types

CRITICAL_ERROR Types (cause immediate abort):

  • ELEMENT_NOT_FOUND - Element not found/inaccessible
  • NAVIGATION_FAILED - Page navigation failed
  • PERMISSION_DENIED - Access denied
  • PAGE_CRASHED - Browser crashed
  • NETWORK_ERROR - Network issues
  • SESSION_EXPIRED - Authentication expired
  • UNSUPPORTED_PAGE - PDF/plugin pages
  • VALIDATION_ERROR - Form validation failed

File Location Rules

  • Custom tools: webqa_agent/tools/custom/your_tool.py
  • Tests: tests/custom_tools/test_your_tool.py
  • Config: config/config.yaml (for test configuration)

Naming Conventions (STRICT)

  • Class Name: PascalCaseTool (e.g., TitleCheckerTool)
  • File Name: snake_case_tool.py (e.g., title_checker_tool.py)
  • Tool Name (in metadata): snake_case (e.g., check_page_title)
  • Step Type: snake_case or custom_xxx (e.g., custom_api_test)

CODE TEMPLATES

Minimal Working Tool Template

"""
File: webqa_agent/tools/custom/{{TOOL_NAME_SNAKE}}.py

{{TOOL_DESCRIPTION}}
"""
from typing import Any, Type
from pydantic import BaseModel, Field
from webqa_agent.tools.base import (
    WebQABaseTool,
    WebQAToolMetadata,
)
from webqa_agent.tools.registry import register_tool

# Step 1: Define parameter schema
class {{TOOL_NAME_PASCAL}}Schema(BaseModel):
    {{PARAM_NAME}}: {{PARAM_TYPE}} = Field(
        description="{{PARAM_DESCRIPTION}}"
    )
    # Add more parameters as needed

# Step 2: Register and define tool class
@register_tool
class {{TOOL_NAME_PASCAL}}Tool(WebQABaseTool):
    """{{TOOL_DESCRIPTION}}"""

    name: str = "{{TOOL_NAME_SNAKE}}"
    description: str = "{{BRIEF_DESCRIPTION}}"
    args_schema: Type[BaseModel] = {{TOOL_NAME_PASCAL}}Schema

    # Required for browser access
    ui_tester_instance: Any = Field(...)

    # Step 3: Define metadata for registration
    @classmethod
    def get_metadata(cls) -> WebQAToolMetadata:
        return WebQAToolMetadata(
            name="{{TOOL_NAME_SNAKE}}",
            category="custom",  # Options: action, assertion, ux, custom
            step_type="{{TOOL_NAME_SNAKE}}",
            description_short="{{ONE_LINE_DESCRIPTION}}",
            description_long="{{DETAILED_DESCRIPTION}}",
            examples=[
                '{{"action": "{{TOOL_NAME_SNAKE}}", "params": {{"{{PARAM_NAME}}": "value"}}}}',
            ],
            use_when=[
                "{{SCENARIO_1}}",
                "{{SCENARIO_2}}",
            ],
            dont_use_when=[
                "{{ANTI_PATTERN_1}}",
            ],
            priority=55,  # 1-100, core tools: 70-90, custom: 30-60
            dependencies=[],  # e.g., ["aiohttp", "beautifulsoup4"]
        )

    # Step 4: Implement async execution logic
    async def _arun(
        self,
        {{PARAM_NAME}}: {{PARAM_TYPE}},
        # Add more parameters matching schema
    ) -> str:
        """Execute tool logic and return response with tag."""
        try:
            # Get browser page if needed
            page = await self.ui_tester_instance.get_current_page()

            # Implement your logic here
            result = await self._execute_logic({{PARAM_NAME}})

            # Update context for downstream tools (RECOMMENDED for action tools)
            self.update_action_context(
                self.ui_tester_instance,
                {
                    'description': f'Executed {{TOOL_NAME_SNAKE}}',
                    'action_type': '{{TOOL_NAME_PASCAL}}',
                    'status': 'success',
                    'result': result,
                    'timestamp': __import__('datetime').datetime.now().isoformat(),
                }
            )

            # Return success with tag
            return self.format_success(f"Operation completed: {result}")

        except Exception as e:
            # For recoverable errors
            return self.format_failure(
                f"Operation failed: {str(e)}",
                recovery_hints=[
                    "Try alternative approach",
                    "Check prerequisites",
                ]
            )

Advanced Features Template

Context Management

async def _arun(self, param: str) -> str:
    # Execute action
    result = await self._perform_action(param)

    # Update context for subsequent tools
    self.update_action_context(
        self.ui_tester_instance,
        {
            'description': 'Action description',
            'action_type': 'MyAction',
            'status': 'success',
            'result': {
                'message': 'Success',
                'data': result  # Key data for next tools
            },
            'timestamp': __import__('datetime').datetime.now().isoformat(),
        }
    )

    return self.format_success("Done")

# In subsequent assertion tools:
async def _arun(self, ...):
    context = self.get_execution_context(self.ui_tester_instance)
    if context:
        previous_data = context['last_action']['result']['data']
        # Use previous_data for validation

case_recorder Integration (for HTML reports)

async def _arun(self, param: str) -> str:
    import json

    result = await self._execute(param)

    # Record step in HTML report
    if self.case_recorder:
        self.case_recorder.add_step(
            description=f"Custom operation: {param}",
            screenshots=[],  # Optional
            model_io=json.dumps({
                'input': param,
                'output': result,
                'metadata': {},  # optional extra info; must be JSON-serializable
            }, ensure_ascii=False),
            actions=[],
            status='passed',
            step_type='action',
        )

    return self.format_success(f"Result: {result}")

Accessing LLM Config and Other Context

@classmethod
def get_required_params(cls) -> Dict[str, str]:
    """Declare required initialization parameters."""
    return {
        'ui_tester_instance': 'ui_tester_instance',
        'llm_config': 'llm_config',  # Access LLM configuration
        'case_recorder': 'case_recorder',  # Access case recorder
    }

async def _arun(self, param: str) -> str:
    # Now can access self.llm_config
    model_name = self.llm_config.get('model', 'gpt-4')
    # Adjust behavior based on model
    return self.format_success(f"Using model: {model_name}")

ERROR PATTERNS TO AVOID

❌ WRONG: Missing Response Tag

async def _arun(self, param: str):
    return "Operation completed"  # Missing [SUCCESS] tag

Symptom: Test hangs; execute_agent can't determine success/failure.
Fix: Use self.format_success("Operation completed")

❌ WRONG: Using Sync Method

def _run(self, param: str):  # Sync method
    return self.format_success("Done")

Symptom: NotImplementedError: Sync execution not supported
Fix: Use async def _arun(self, param: str):

❌ WRONG: Not Using @register_tool

class MyTool(WebQABaseTool):  # No decorator
    ...

Symptom: Tool not discovered, not available in workflow.
Fix: Add @register_tool before class definition.

❌ WRONG: JSON Serialization Error

if self.case_recorder:
    self.case_recorder.add_step(
        model_io={'exception': some_exception_obj}  # Not JSON serializable
    )

Symptom: TypeError: Object of type Exception is not JSON serializable
Fix: Convert to string: model_io=json.dumps({'error': str(exception)}, ensure_ascii=False)

❌ WRONG: Undeclared Dependencies

import requests  # Used but not declared

@classmethod
def get_metadata(cls):
    return WebQAToolMetadata(
        name="my_tool",
        dependencies=[]  # Missing 'requests'
    )

Symptom: ModuleNotFoundError for users.
Fix: Declare in metadata: dependencies=["requests"]

❌ WRONG: Not Updating Context (for action tools)

async def _arun(self, param: str):
    result = self._process(param)
    return self.format_success("Done")
    # Missing update_action_context() call

Symptom: Subsequent assertion tools show "Status: UNKNOWN" and can't verify the previous action.
Fix: Call self.update_action_context() after successful execution.


TESTING REQUIREMENTS

Verification Checklist

  1. Syntax Check:

    python -m py_compile webqa_agent/tools/custom/my_tool.py
  2. Registration Check:

    from webqa_agent.tools.registry import get_registry
    assert 'my_tool' in get_registry().get_tool_names()
  3. Unit Tests (pytest):

    # tests/custom_tools/test_my_tool.py
    import pytest
    from unittest.mock import MagicMock
    from webqa_agent.tools.custom.my_tool import MyTool
    
    mock_tester = MagicMock()  # stand-in for a real ui_tester instance
    
    @pytest.mark.asyncio
    async def test_my_tool_success():
        tool = MyTool(ui_tester_instance=mock_tester)
        result = await tool._arun(param="test")
        assert "[SUCCESS]" in result
    
    @pytest.mark.asyncio
    async def test_my_tool_failure():
        tool = MyTool(ui_tester_instance=mock_tester)
        result = await tool._arun(param="invalid")
        assert "[FAILURE]" in result
  4. Integration Test (config.yaml):

    Important: In AI mode (type: ai), test steps are NOT defined in YAML. The LLM automatically generates test steps based on business_objectives.

    test_config:
      business_objectives: "Test functionality using my_tool"
      dynamic_step_generation:
        enabled: true
        max_dynamic_steps: 8
      custom_tools:
        enabled: []  # Your custom tool will be auto-discovered

    Run: webqa-agent run -c config.yaml

Code Quality Standards

# Format code
black webqa_agent/tools/custom/my_tool.py

# Sort imports
isort webqa_agent/tools/custom/my_tool.py

# Lint (must pass)
flake8 webqa_agent/tools/custom/my_tool.py

EXECUTION FLOW

How Tools Are Executed

  1. Registration: @register_tool decorator registers tool in ToolRegistry singleton
  2. Discovery: LangGraph workflow queries registry for available tools
  3. Selection: LLM selects tool based on metadata descriptions
  4. Instantiation: Registry creates tool instance with required params
  5. Execution: Workflow calls tool._arun(**params)
  6. Response Parsing: execute_agent parses response tag
  7. Control Flow:
    • [SUCCESS] → Continue to next step
    • [FAILURE] → Trigger adaptive recovery (if enabled)
    • [CRITICAL_ERROR:TYPE] → Abort test, save results
    • [WARNING] → Log and continue
    • [CANNOT_VERIFY] → Skip verification, continue
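The branching above can be sketched as a small tag parser (a hypothetical illustration; the actual execute_agent parsing may differ in detail):

```python
import re

# Map the leading response tag to a control-flow decision (illustrative;
# the real mapping lives in execute_agent).
DECISIONS = {
    "SUCCESS": "continue",
    "FAILURE": "recover",
    "CRITICAL_ERROR": "abort",
    "WARNING": "log_and_continue",
    "CANNOT_VERIFY": "skip_verification",
}

def decide(response: str) -> str:
    # Tags look like "[SUCCESS] ..." or "[CRITICAL_ERROR:NETWORK_ERROR] ..."
    match = re.match(r"\[([A-Z_]+)(?::([A-Z_]+))?\]", response)
    if not match:
        return "unknown"  # an untagged response stalls the workflow
    return DECISIONS.get(match.group(1), "unknown")

decide("[SUCCESS] Done")                          # -> "continue"
decide("[CRITICAL_ERROR:NETWORK_ERROR] timeout")  # -> "abort"
```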

Adaptive Recovery (when dynamic_step_generation enabled)

  • ELEMENT_NOT_FOUND: Two-layer recovery (retry + LLM replanning)
  • Other FAILURE: LLM-driven recovery (GoBack, timeout adjustment, alternative action)
  • Loop Detection: Aborts if same error pattern repeats 2+ times
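Loop detection can be pictured as follows (a minimal sketch under the stated "repeats 2+ times" rule; the framework's actual detector is internal and may track richer state):

```python
from collections import deque

class LoopDetector:
    """Abort when the same error pattern repeats max_repeats times in a row."""

    def __init__(self, max_repeats: int = 2):
        self.recent = deque(maxlen=max_repeats)
        self.max_repeats = max_repeats

    def record(self, error_pattern: str) -> bool:
        """Record an error; return True when the test should abort."""
        self.recent.append(error_pattern)
        return (len(self.recent) == self.max_repeats
                and len(set(self.recent)) == 1)

detector = LoopDetector()
detector.record("ELEMENT_NOT_FOUND")  # -> False (first occurrence)
detector.record("ELEMENT_NOT_FOUND")  # -> True  (same pattern repeated, abort)
```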

COMMON USE CASES

1. API Testing Tool

import aiohttp
import jsonschema
from typing import Dict, Type
from pydantic import BaseModel, Field

class APIValidatorSchema(BaseModel):
    endpoint: str = Field(description="API endpoint to validate")
    expected_schema: Dict = Field(description="JSON schema to validate against")

@register_tool
class APIValidatorTool(WebQABaseTool):
    """Validates API responses against JSON schema."""

    name: str = "validate_api_response"
    description: str = "Validates API responses against JSON schema"
    args_schema: Type[BaseModel] = APIValidatorSchema

    ui_tester_instance: Any = Field(...)

    @classmethod
    def get_metadata(cls):
        return WebQAToolMetadata(
            name="validate_api_response",
            category="custom",
            step_type="validate_api",
            description_short="Validates API responses against JSON schema",
            priority=60,
            dependencies=["aiohttp", "jsonschema"]
        )

    async def _arun(self, endpoint: str, expected_schema: Dict) -> str:
        try:
            async with aiohttp.ClientSession() as session:
                async with session.get(endpoint) as resp:
                    if resp.status != 200:
                        return self.format_failure(
                            f"API returned {resp.status}",
                            recovery_hints=["Check endpoint URL", "Verify authentication"]
                        )
                    data = await resp.json()

            jsonschema.validate(instance=data, schema=expected_schema)
            return self.format_success(f"API validation passed for {endpoint}")

        except jsonschema.ValidationError as e:
            return self.format_failure(f"Schema validation failed: {e.message}")
        except Exception as e:
            return self.format_critical_error("NETWORK_ERROR", str(e))

2. Screenshot Comparison Tool

from PIL import Image
import imagehash
import io

@register_tool
class ScreenshotCompareTool(WebQABaseTool):
    """Compares current page screenshot with baseline."""

    async def _arun(self, baseline_path: str, threshold: float = 0.95) -> str:
        page = await self.ui_tester_instance.get_current_page()

        current_screenshot = await page.screenshot()
        current_image = Image.open(io.BytesIO(current_screenshot))
        baseline_image = Image.open(baseline_path)

        current_hash = imagehash.average_hash(current_image)
        baseline_hash = imagehash.average_hash(baseline_image)
        similarity = 1 - (current_hash - baseline_hash) / len(current_hash.hash) ** 2

        if similarity >= threshold:
            return self.format_success(f"Screenshot match: {similarity:.2%}")
        else:
            return self.format_failure(
                f"Screenshot mismatch: {similarity:.2%} (threshold: {threshold:.2%})",
                recovery_hints=["Update baseline if UI changed intentionally"]
            )
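The similarity expression above works because imagehash.average_hash yields an 8x8 bit matrix and subtracting two hashes returns their Hamming distance, so the score is simply 1 - distance/64. A pure-Python equivalent of that arithmetic:

```python
def hash_similarity(bits_a: int, bits_b: int, n_bits: int = 64) -> float:
    """Similarity of two n-bit hashes: 1 - (Hamming distance / n_bits)."""
    hamming = bin(bits_a ^ bits_b).count("1")  # count differing bits
    return 1 - hamming / n_bits

hash_similarity(0b1010, 0b1010)    # identical hashes -> 1.0
hash_similarity((1 << 64) - 1, 0)  # all 64 bits differ -> 0.0
```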

3. Page Title Checker

import re

@register_tool
class TitleCheckerTool(WebQABaseTool):
    """Validates page title against expected pattern."""

    async def _arun(self, expected_title: str, case_sensitive: bool = False) -> str:
        try:
            page = await self.ui_tester_instance.get_current_page()
            actual_title = await page.title()

            flags = 0 if case_sensitive else re.IGNORECASE
            if re.search(expected_title, actual_title, flags):
                return self.format_success(f"Title matches: '{actual_title}'")
            else:
                return self.format_failure(
                    f"Title mismatch. Expected: '{expected_title}', Actual: '{actual_title}'",
                    recovery_hints=["Check pattern", "Wait for dynamic title"]
                )
        except Exception as e:
            return self.format_critical_error("PAGE_CRASHED", str(e))

4. Form Auto-Fill Tool

from typing import Dict

class FormAutoFillSchema(BaseModel):
    form_data: Dict[str, str] = Field(description="Field name to value mapping")

@register_tool
class FormAutoFillTool(WebQABaseTool):
    """Auto-fills form fields from structured data."""

    name: str = "auto_fill_form"
    description: str = "Automatically fills form fields with provided data"
    args_schema: Type[BaseModel] = FormAutoFillSchema

    ui_tester_instance: Any = Field(...)

    @classmethod
    def get_metadata(cls):
        return WebQAToolMetadata(
            name="auto_fill_form",
            category="custom",
            step_type="auto_fill_form",
            description_short="Auto-fills form fields from structured data",
            examples=[
                '{"action": "auto_fill_form", "params": {"form_data": {"username": "test", "email": "test@example.com"}}}'
            ],
            use_when=["Testing registration forms", "Filling multi-field forms", "E2E testing with form submissions"],
            priority=60,
        )

    async def _arun(self, form_data: Dict[str, str]) -> str:
        """Fill form fields based on field names or IDs."""
        try:
            page = await self.ui_tester_instance.get_current_page()
            filled_fields = []

            for field_name, value in form_data.items():
                # Try multiple selectors: name, id, placeholder
                selectors = [
                    f'[name="{field_name}"]',
                    f'#{field_name}',
                    f'[placeholder*="{field_name}" i]',
                    f'input[type="text"]:has-text("{field_name}")',
                ]

                field_filled = False
                for selector in selectors:
                    try:
                        # locator() is synchronous; .first is a property (no await)
                        field = page.locator(selector).first
                        if await field.is_visible():
                            await field.fill(value)
                            filled_fields.append(field_name)
                            field_filled = True
                            break
                    except Exception:
                        continue

                if not field_filled:
                    return self.format_failure(
                        f"Could not find field: {field_name}",
                        recovery_hints=[
                            "Check field name/ID spelling",
                            "Ensure form is visible on page",
                            f"Try manual selector for '{field_name}'"
                        ]
                    )

            # Update context for verification tools
            self.update_action_context(
                self.ui_tester_instance,
                {
                    'description': 'Auto-filled form fields',
                    'action_type': 'FormAutoFill',
                    'status': 'success',
                    'result': {
                        'fields_filled': filled_fields,
                        'total_fields': len(form_data)
                    },
                    'timestamp': __import__('datetime').datetime.now().isoformat(),
                }
            )

            return self.format_success(f"Successfully filled {len(filled_fields)} form fields: {', '.join(filled_fields)}")

        except Exception as e:
            return self.format_critical_error("VALIDATION_ERROR", f"Form fill failed: {str(e)}")

5. Database Query Validation Tool

import asyncpg  # PostgreSQL example
from typing import Any, Dict, List, Optional

class DBQuerySchema(BaseModel):
    query: str = Field(description="SQL query to execute")
    expected_row_count: Optional[int] = Field(default=None, description="Expected number of rows")
    expected_columns: Optional[List[str]] = Field(default=None, description="Expected column names")

@register_tool
class DBQueryValidatorTool(WebQABaseTool):
    """Validates database queries for integration testing."""

    name: str = "validate_db_query"
    description: str = "Executes and validates database queries"
    args_schema: Type[BaseModel] = DBQuerySchema

    ui_tester_instance: Any = Field(...)

    @classmethod
    def get_metadata(cls):
        return WebQAToolMetadata(
            name="validate_db_query",
            category="custom",
            step_type="validate_db_query",
            description_short="Validates database queries and results",
            examples=[
                '{"action": "validate_db_query", "params": {"query": "SELECT * FROM users WHERE active=true", "expected_row_count": 5}}'
            ],
            use_when=[
                "Verifying data persistence after form submission",
                "Testing database-backed features",
                "E2E testing with database state validation"
            ],
            dont_use_when=[
                "Production databases (use test databases only)",
                "Modifying data (SELECT queries only for safety)"
            ],
            priority=55,
            dependencies=["asyncpg"]  # Or psycopg2, mysql-connector, etc.
        )

    async def _arun(
        self,
        query: str,
        expected_row_count: int = None,
        expected_columns: List[str] = None
    ) -> str:
        """Execute query and validate results (read-only for safety)."""
        try:
            # Enforce read-only queries for safety
            if not query.strip().upper().startswith('SELECT'):
                return self.format_critical_error(
                    "VALIDATION_ERROR",
                    "Only SELECT queries are allowed for safety. Use database migration tools for modifications."
                )

            # Connect to test database (from config or env vars)
            db_url = self.ui_tester_instance.config.get('test_db_url')
            if not db_url:
                return self.format_cannot_verify(
                    "Database query validation",
                    "test_db_url not configured"
                )

            conn = await asyncpg.connect(db_url)
            try:
                rows = await conn.fetch(query)
                result = [dict(row) for row in rows]

                # Validate row count
                if expected_row_count is not None and len(result) != expected_row_count:
                    return self.format_failure(
                        f"Row count mismatch. Expected: {expected_row_count}, Actual: {len(result)}",
                        recovery_hints=[
                            "Check query WHERE conditions",
                            "Verify test data setup",
                            f"Current result: {result[:3]}..."  # Show first 3 rows
                        ]
                    )

                # Validate columns
                if expected_columns and result:
                    actual_columns = set(result[0].keys())
                    expected_set = set(expected_columns)
                    if actual_columns != expected_set:
                        missing = expected_set - actual_columns
                        extra = actual_columns - expected_set
                        return self.format_failure(
                            f"Column mismatch. Missing: {missing}, Extra: {extra}",
                            recovery_hints=["Check query SELECT clause", "Verify table schema"]
                        )

                return self.format_success(
                    f"Query validated: {len(result)} rows, columns: {list(result[0].keys()) if result else []}"
                )

            finally:
                await conn.close()

        except Exception as e:
            return self.format_critical_error("NETWORK_ERROR", f"Database query failed: {str(e)}")

METADATA BEST PRACTICES

Priority Guidelines

  • Core System Tools: 70-90 (e.g., Tap, Input, Scroll)
  • High-Value Custom Tools: 55-65 (e.g., API validators, specialized assertions)
  • General Custom Tools: 40-55 (e.g., utility tools, helpers)
  • Experimental Tools: 30-40 (e.g., beta features, edge cases)
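Priority only influences ordering hints; as an illustrative sketch (the actual ranking lives in the registry and planning prompts, and these tool entries are hypothetical), tools would surface to the LLM highest-priority first:

```python
# Hypothetical tool entries with priorities from the bands above.
tools = [
    {"name": "validate_api_response", "priority": 60},  # high-value custom
    {"name": "tap", "priority": 85},                    # core system tool
    {"name": "beta_helper", "priority": 35},            # experimental
]

# Higher priority surfaces first when presenting candidates to the LLM.
ranked = sorted(tools, key=lambda t: t["priority"], reverse=True)
[t["name"] for t in ranked]  # -> ["tap", "validate_api_response", "beta_helper"]
```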

Writing Good Descriptions

description_short (one line for LLM tool selection):

  • ✅ Good: "Validates API responses against JSON schema"
  • ❌ Bad: "A tool that can be used to validate APIs"

description_long (detailed explanation):

  • Include: What it does, when to use it, key parameters, output format
  • Example: "Checks if the current page title matches the expected pattern. Supports regex for flexible matching. Returns [SUCCESS] on match, [FAILURE] with recovery hints on mismatch."

examples (JSON format for LLM):

examples=[
    '{{"action": "check_page_title", "params": {{"expected_title": "Dashboard"}}}}',
    '{{"action": "check_page_title", "params": {{"expected_title": "Product.*", "case_sensitive": true}}}}'
]

use_when (positive hints for LLM):

use_when=[
    "After navigation to verify correct page loaded",
    "During form submission to check redirect success",
    "In SPAs to confirm route changes"
]

dont_use_when (negative hints to prevent misuse):

dont_use_when=[
    "For content verification (use assertions instead)",
    "When title is dynamic/unpredictable"
]

REFERENCE: Base Classes API

WebQABaseTool Methods

Response Formatting

format_success(message: str, **context) -> str

  • Returns: "[SUCCESS] {message}"
  • Optional context: dom_diff, page_state

format_failure(message: str, recovery_hints: List[str] = None) -> str

  • Returns: "[FAILURE] {message}"
  • Triggers adaptive recovery when enabled

format_critical_error(error_type: str, message: str) -> str

  • Returns: "[CRITICAL_ERROR:{error_type}] {message}"
  • Causes immediate test abort
  • Valid error_type values: ELEMENT_NOT_FOUND, NAVIGATION_FAILED, PERMISSION_DENIED, PAGE_CRASHED, NETWORK_ERROR, SESSION_EXPIRED, UNSUPPORTED_PAGE, VALIDATION_ERROR

format_warning(message: str) -> str

  • Returns: "[WARNING] {message}"
  • Non-blocking issue logging

format_cannot_verify(message: str, reason: str) -> str

  • Returns: "[CANNOT_VERIFY] {message}. Reason: {reason}"
  • Verification prerequisite failed

Context Management

update_action_context(ui_tester: Any, context: Dict[str, Any]) -> None

  • Updates ui_tester.last_action_context for downstream tools
  • Recommended for action category tools

get_execution_context(ui_tester: Any) -> Optional[Dict[str, Any]]

  • Returns context from previous actions
  • Used by assertion tools for context-aware verification
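The hand-off between these two methods can be sketched with a stand-in ui_tester object (a hypothetical stub; the real instance is injected by the framework, and the context shape follows the Context Management template earlier in this document):

```python
from types import SimpleNamespace

# Stand-in for the framework-provided ui_tester instance.
ui_tester = SimpleNamespace(last_action_context=None)

# Action tool side: what update_action_context effectively stores.
ui_tester.last_action_context = {
    "last_action": {
        "action_type": "FormAutoFill",
        "status": "success",
        "result": {"fields_filled": ["username", "email"]},
    }
}

# Assertion tool side: what get_execution_context effectively reads back.
context = ui_tester.last_action_context
if context:
    previous_result = context["last_action"]["result"]
    # previous_result["fields_filled"] -> ["username", "email"]
```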

WebQAToolMetadata Fields

WebQAToolMetadata(
    name="tool_name_snake",           # Required: Tool identifier
    category="custom",                # action, assertion, ux, custom
    step_type="custom_tool_name",    # For planning prompts
    description_short="One line",     # Brief description
    description_long="Detailed",      # Full description with examples
    examples=["JSON example"],        # Usage examples
    use_when=["scenario 1"],          # When to use hints
    dont_use_when=["anti-pattern"],   # When NOT to use
    priority=55,                      # 1-100 priority
    dependencies=["package"],         # Python package dependencies
)

CODEBASE REFERENCE

Key Files to Reference

Base Classes: webqa_agent/tools/base.py:1-519

  • WebQABaseTool, WebQAToolMetadata, ResponseTags, ActionTypes

Registry: webqa_agent/tools/registry.py

  • Singleton pattern, auto-discovery, dependency checking

Element Actions: webqa_agent/tools/action_tool.py

  • Reference for browser interaction patterns

Custom Tool Example: webqa_agent/tools/custom/link_check_tool.py

  • Real-world custom tool implementation

Import Patterns

# Standard imports for all tools
from typing import Any, Type, Dict, List, Optional
from pydantic import BaseModel, Field
from webqa_agent.tools.base import (
    WebQABaseTool,
    WebQAToolMetadata,
    ResponseTags,
)
from webqa_agent.tools.registry import register_tool

# Browser interactions
from playwright.async_api import Page, Error as PlaywrightError

# JSON handling
import json
from datetime import datetime

# Logging
import logging
logger = logging.getLogger(__name__)

FINAL CHECKLIST

Before completing tool development, verify:

  • Class inherits from WebQABaseTool
  • @register_tool decorator present
  • async def _arun() implemented (not sync _run)
  • All returns use response formatting helpers
  • get_metadata() returns complete WebQAToolMetadata
  • Parameter schema defined with Pydantic BaseModel
  • Dependencies declared in metadata
  • Action context updated (for action tools)
  • Docstrings on class and methods
  • Unit tests written
  • Integration test config created
  • Code formatted with black/isort
  • No flake8 warnings
  • Tool registered successfully (verification script passes)
  • HTML report includes tool outputs (if using case_recorder)

CONTEXT COMPLETENESS NOTE

This document provides comprehensive context for LLM-assisted custom tool development. When generating tools:

  1. Reference Templates: Use exact code templates provided above
  2. Follow Constraints: Adhere to ALL mandatory requirements
  3. Avoid Anti-Patterns: Check error patterns section before coding
  4. Test Thoroughly: Follow verification checklist
  5. Ask Questions: If requirements are unclear, ask user for clarification

Goal: Generate correctly implemented, production-ready custom tools that integrate seamlessly with WebQA Agent's LangGraph workflow without hallucinations or common errors.


END OF LLM CONTEXT DOCUMENT