
Custom Tool Development

Quick reference for creating custom WebQA Agent tools.

Quick Start

Minimal working example:

from typing import Any, Type
from pydantic import BaseModel, Field
from webqa_agent.tools.base import (
    WebQABaseTool,
    WebQAToolMetadata,
)
from webqa_agent.tools.registry import register_tool

class HelloToolSchema(BaseModel):
    message: str = Field(description="Message to display")

@register_tool
class HelloTool(WebQABaseTool):
    name: str = "hello_world"
    description: str = "Prints a greeting message"
    args_schema: Type[BaseModel] = HelloToolSchema

    ui_tester_instance: Any = Field(...)

    @classmethod
    def get_metadata(cls):
        return WebQAToolMetadata(
            name="hello_world",
            category="custom",
            description_short="Simple greeting tool",
        )

    async def _arun(self, message: str) -> str:
        return self.format_success(f"Hello, {message}!")

Test: webqa-agent gen -c config.yaml

Core Components

Base Classes:

  • WebQABaseTool - Base class for all tools
  • WebQAToolMetadata - Tool metadata
  • ResponseTags - Response tags (SUCCESS, FAILURE, CRITICAL_ERROR:TYPE)
  • ToolRegistry - Tool registration system

Required Methods:

  • get_metadata() - Tool metadata (classmethod)
  • _arun() - Async execution (MUST be async, NOT sync _run)

Response Helpers:

  • format_success(msg) - Return success
  • format_failure(msg, hints) - Recoverable error
  • format_critical_error(type, msg) - Abort test
  • format_warning(msg) - Non-blocking issue
  • format_cannot_verify(msg, reason) - Verification failed
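To make the helper contract concrete, here is a standalone sketch of what these helpers plausibly return, based on the ResponseTags tag formats documented later in this guide. The real implementations live on WebQABaseTool in webqa_agent.tools.base and may differ in detail; these local functions are illustrations only.

```python
from typing import List, Optional

# Standalone mimics of the response helpers, assuming each one simply
# prefixes the message with the corresponding ResponseTags tag.
def format_success(msg: str) -> str:
    return f"[SUCCESS] {msg}"

def format_failure(msg: str, hints: Optional[List[str]] = None) -> str:
    out = f"[FAILURE] {msg}"
    if hints:
        out += " | Recovery hints: " + "; ".join(hints)
    return out

def format_critical_error(error_type: str, msg: str) -> str:
    return f"[CRITICAL_ERROR:{error_type}] {msg}"

print(format_success("Title verified"))
print(format_failure("Button not visible", ["Scroll down", "Wait for render"]))
print(format_critical_error("NETWORK_ERROR", "Request timed out"))
```

The key point is that every return value starts with a tag the agent can route on; the message text after the tag is free-form.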

File Structure

webqa_agent/tools/
├── base.py              # Base classes
├── registry.py          # Registration system
├── custom/              # Your tools here
│   └── my_tool.py
└── __init__.py

tests/custom_tools/
└── test_my_tool.py      # Your tests
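A test for a custom tool is mostly a check on the string `_arun` returns. The sketch below is self-contained: StubBase is a stand-in for WebQABaseTool so the example runs without the webqa_agent package installed. In a real tests/custom_tools/test_my_tool.py you would import your actual tool instead.

```python
import asyncio

# StubBase stands in for WebQABaseTool so this example is runnable on its
# own; a real test would import the tool from webqa_agent.tools.custom.
class StubBase:
    def format_success(self, msg: str) -> str:
        return f"[SUCCESS] {msg}"

class HelloTool(StubBase):
    name = "hello_world"

    async def _arun(self, message: str) -> str:
        return self.format_success(f"Hello, {message}!")

def test_hello_tool_returns_success_tag():
    # _arun is async, so drive it with asyncio.run (pytest-asyncio also works)
    result = asyncio.run(HelloTool()._arun("world"))
    assert result == "[SUCCESS] Hello, world!"

test_hello_tool_returns_success_tag()
```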

API Reference

WebQABaseTool

from webqa_agent.tools.base import WebQABaseTool

class MyTool(WebQABaseTool):
    name: str = "my_tool"
    description: str = "Tool description"
    args_schema: Type[BaseModel] = MyToolSchema
    ui_tester_instance: Any = Field(...)

    @classmethod
    def get_metadata(cls) -> WebQAToolMetadata:
        return WebQAToolMetadata(...)

    async def _arun(self, **kwargs) -> str:
        return self.format_success("Done")

WebQAToolMetadata

Tool metadata controls how your tool appears in LLM prompts and how it's registered.

Field Reference

Field              Type       Required  Default   Description
name               str        Yes       -         Unique tool identifier used by LangChain
category           str        No        "custom"  Tool category: action, assertion, ux, custom
step_type          str        No        None      Step type for planning docs and logs
description_short  str        No        ""        One-line description shown in prompts
description_long   str        No        ""        Detailed description with parameters
examples           List[str]  No        []        JSON examples for LLM context
use_when           List[str]  No        []        Hints for when to use this tool
dont_use_when      List[str]  No        []        Hints for when NOT to use
priority           int        No        50        Priority 1-100 (higher = preferred)
dependencies       List[str]  No        []        Required Python packages

Field Details

name (Required)

  • Must be unique across all tools
  • Use snake_case: check_page_title, detect_dynamic_links
  • This is the function name LangChain uses

category

  • action: Browser interactions (click, input, scroll)
  • assertion: Verification and validation
  • ux: User experience testing
  • custom: User-defined tools (default)

step_type

  • Used in planning documentation and execution logs
  • For custom tools, use custom_xxx format
  • If None, tool appears by name only in planning prompts

description_short

  • One-line summary shown in LLM prompts
  • Keep under 80 characters
  • Example: "Validates page title against regex pattern"

description_long

  • Detailed description with feature list, parameter explanations, usage notes
  • Supports multi-line strings with \n

examples

  • JSON strings showing tool invocation
  • LLM uses these to understand correct syntax
  • Include 2-3 examples covering common use cases

use_when

  • List of scenarios where this tool is appropriate
  • Helps LLM decide when to select your tool
  • Be specific: "After clicking navigation menus"

dont_use_when

  • Scenarios where tool should NOT be used
  • Prevents misuse by LLM
  • Example: "For static pages without JavaScript"

priority

  • Range: 1-100 (higher = preferred by agent)
  • Core tools: 70-90
  • Custom tools: 30-60 recommended
  • Default: 50

dependencies

  • Python packages required by your tool
  • Used for dependency checking
  • Example: ["aiohttp", "beautifulsoup4"]

Complete Example

Based on link_check_tool.py:

@classmethod
def get_metadata(cls) -> WebQAToolMetadata:
    return WebQAToolMetadata(
        name='detect_dynamic_links',
        category='custom',
        step_type='detect_dynamic_links',
        description_short='Detects new links appearing after user interactions',
        description_long=(
            'Identifies and validates new links that appear dynamically after '
            'user interactions such as clicking navigation menus or forms.\n\n'
            'Features:\n'
            '  - Tracks link history to identify new links\n'
            '  - HTTPS certificate validation\n'
            '  - HTTP status code checking\n\n'
            'Parameters:\n'
            '  - check_https: Validate HTTPS (default: True)\n'
            '  - check_status: Check HTTP status (default: True)\n'
            '  - timeout: Request timeout in seconds (default: 10)'
        ),
        examples=[
            '{"action": "detect_dynamic_links", "params": {"check_https": true}}',
            '{"action": "detect_dynamic_links", "params": {}}',
        ],
        use_when=[
            'After clicking navigation menus or dropdowns',
            'In Single Page Applications (SPAs)',
            'When testing dynamic content loading',
        ],
        dont_use_when=[
            'On static pages without JavaScript',
            'When only checking visual elements',
        ],
        priority=45,
        dependencies=[],
    )

ResponseTags

Success/Failure:

  • [SUCCESS] - Continue to next step
  • [FAILURE] - Trigger adaptive recovery (if enabled)
  • [WARNING] - Non-blocking issue
  • [CANNOT_VERIFY] - Verification prerequisite failed

Critical Errors (abort test immediately):

  • [CRITICAL_ERROR:ELEMENT_NOT_FOUND]
  • [CRITICAL_ERROR:NAVIGATION_FAILED]
  • [CRITICAL_ERROR:PERMISSION_DENIED]
  • [CRITICAL_ERROR:PAGE_CRASHED]
  • [CRITICAL_ERROR:NETWORK_ERROR]
  • [CRITICAL_ERROR:SESSION_EXPIRED]
  • [CRITICAL_ERROR:UNSUPPORTED_PAGE]
  • [CRITICAL_ERROR:VALIDATION_ERROR]
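The agent presumably routes on these leading tags: success continues, failure triggers recovery, and any critical-error tag aborts. The small parser below is an illustration of that routing, not the agent's actual dispatch code.

```python
import re

# Illustration only: classify a tool's return string by its leading tag.
# Mirrors the tag formats listed above; the real dispatcher may differ.
CRITICAL = re.compile(r"^\[CRITICAL_ERROR:([A-Z_]+)\]")

def classify(response: str) -> str:
    m = CRITICAL.match(response)
    if m:
        return f"abort:{m.group(1)}"   # e.g. abort:PAGE_CRASHED
    if response.startswith("[FAILURE]"):
        return "recover"               # adaptive recovery, if enabled
    if response.startswith("[SUCCESS]"):
        return "continue"
    return "unknown"

print(classify("[CRITICAL_ERROR:PAGE_CRASHED] Tab closed"))  # abort:PAGE_CRASHED
```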

Common Mistakes

  1. Forgot ResponseTag: Must use format_success/failure/critical_error

    # Wrong
    return "Operation completed"
    
    # Correct
    return self.format_success("Operation completed")
  2. Sync method: Use async def _arun, NOT def _run

    # Wrong
    def _run(self, param: str):
        return self.format_success("Done")
    
    # Correct
    async def _arun(self, param: str):
        return self.format_success("Done")
  3. No @register_tool: Tool won't be discovered

    # Wrong
    class MyTool(WebQABaseTool):
        ...
    
    # Correct
    @register_tool
    class MyTool(WebQABaseTool):
        ...
  4. Missing dependencies: Declare in get_metadata().dependencies

    dependencies=["aiohttp", "beautifulsoup4"]
  5. JSON serialization: Convert exceptions to strings for case_recorder

    model_io=json.dumps({'error': str(exception)}, ensure_ascii=False)
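The reason for converting exceptions first: exception objects are not JSON-serializable, so passing one straight to json.dumps raises TypeError. The runnable snippet below demonstrates both the failure mode and the fix, including ensure_ascii=False for non-ASCII error messages.

```python
import json

# Exceptions are not JSON-serializable; json.dumps raises TypeError on them.
exc = ValueError("элемент не найден")  # non-ASCII error message
try:
    json.dumps({'error': exc})
except TypeError:
    pass  # this is the failure mode to avoid

# Converting to str first, with ensure_ascii=False, keeps the recorded
# output valid JSON and preserves non-ASCII text verbatim.
payload = json.dumps({'error': str(exc)}, ensure_ascii=False)
print(payload)
```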

Context Management

Update context (for action tools):

async def _arun(self, param: str) -> str:
    from datetime import datetime
    result = await self._execute(param)

    self.update_action_context(
        self.ui_tester_instance,
        {
            'description': 'Executed action',
            'action_type': 'MyAction',
            'status': 'success',
            'result': result,
            'timestamp': datetime.now().isoformat(),
        }
    )

    return self.format_success("Done")

Read context (for assertion tools):

async def _arun(self, ...) -> str:
    context = self.get_execution_context(self.ui_tester_instance)
    if context:
        previous_data = context['last_action']['result']
        # Verify against previous_data here
    return self.format_success("Verified")
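The two snippets above imply a shared context shaped like a dict with a 'last_action' entry. The standalone sketch below illustrates that handoff between an action tool and an assertion tool; the dict shape is an assumption, and the real update_action_context/get_execution_context live on WebQABaseTool.

```python
from datetime import datetime

# Hypothetical stand-ins for the context methods on WebQABaseTool,
# assuming the execution context is a dict with a 'last_action' entry.
context_store = {}

def update_action_context(store, entry):
    store['last_action'] = entry

def get_execution_context(store):
    return store or None

# Action tool writes what it did...
update_action_context(context_store, {
    'description': 'Executed action',
    'action_type': 'MyAction',
    'status': 'success',
    'result': {'clicked': 'submit'},
    'timestamp': datetime.now().isoformat(),
})

# ...and a later assertion tool reads it back to verify.
ctx = get_execution_context(context_store)
print(ctx['last_action']['result'])
```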

Advanced Features

Access LLM config:

@classmethod
def get_required_params(cls) -> Dict[str, str]:
    return {
        'ui_tester_instance': 'ui_tester_instance',
        'llm_config': 'llm_config',
        'case_recorder': 'case_recorder',
    }

async def _arun(self, param: str) -> str:
    model = self.llm_config.get('model', 'gpt-4')
    # Adjust behavior based on model

Record to HTML report:

import json

if self.case_recorder:
    self.case_recorder.add_step(
        description="Custom operation",
        model_io=json.dumps({'input': param, 'output': result}, ensure_ascii=False),
        status='passed',
        step_type='action',
    )

Verification

# Check registration
python -c "from webqa_agent.tools.registry import get_registry; print('my_tool' in get_registry().get_tool_names())"

# Run tests
pytest tests/custom_tools/test_my_tool.py -v

# Format & lint
black webqa_agent/ && isort webqa_agent/ && flake8 webqa_agent/tools/custom/my_tool.py

Configuration Example

To use your custom tool:

  1. Place your tool in webqa_agent/tools/custom/
  2. Decorate with @register_tool - the tool registry will automatically discover it
  3. Configure your test with business objectives

Important: In AI mode (type: ai), test steps are NOT defined in YAML. The LLM automatically generates test steps based on business_objectives and selects tools based on their descriptions and metadata.

# config/config.yaml
target:
  url: https://example.com
  description: Test custom functionality

# Test Configuration - NO test_steps in AI mode!
test_config:
  business_objectives: "Test custom functionality using my_tool"
  dynamic_step_generation:
    enabled: true  # Enable adaptive recovery
    max_dynamic_steps: 8
    min_elements_threshold: 2
  custom_tools:
    enabled: []  # Your custom tool will be auto-discovered

# LLM Configuration
llm_config:
  model: claude-sonnet-4-5-20250929  # Or gpt-4, gemini-2.5-flash-lite
  api_key: ${ANTHROPIC_API_KEY}  # Use environment variable
  temperature: 1.0  # Required for Claude Extended Thinking
  max_tokens: 20000  # Must be larger than reasoning.budget_tokens

# Browser Configuration
browser_config:
  headless: false  # Set to true for CI/CD
  viewport: {width: 1280, height: 720}
  language: en-US

How the LLM selects your tool:

  • LLM reads tool description and get_metadata() output
  • Chooses tools based on use_when hints and current page context
  • Your tool is invoked when LLM determines it's appropriate for the test objective

Configuration Tips:

  • Use environment variables for API keys (never commit credentials)
  • Adjust temperature based on your model (OpenAI: 0.1, Anthropic/Gemini: 1.0)
  • Set headless: true in Docker/CI environments

Real-World Example

Page title checker:

import re
from typing import Any, Type
from pydantic import BaseModel, Field
from webqa_agent.tools.base import (
    WebQABaseTool,
    WebQAToolMetadata,
)
from webqa_agent.tools.registry import register_tool

class TitleCheckerSchema(BaseModel):
    expected_title: str = Field(description="Expected page title pattern (regex)")
    case_sensitive: bool = Field(default=False, description="Case-sensitive matching")

@register_tool
class TitleCheckerTool(WebQABaseTool):
    name: str = "check_page_title"
    description: str = "Validates page title"
    args_schema: Type[BaseModel] = TitleCheckerSchema
    ui_tester_instance: Any = Field(...)

    @classmethod
    def get_metadata(cls):
        return WebQAToolMetadata(
            name="check_page_title",
            category="custom",
            step_type="check_page_title",
            description_short="Validates page title against pattern",
            examples=[
                '{"action": "check_page_title", "params": {"expected_title": "Dashboard"}}',
            ],
            use_when=["After navigation", "In SPAs"],
            priority=55,
        )

    async def _arun(self, expected_title: str, case_sensitive: bool = False) -> str:
        try:
            page = await self.ui_tester_instance.get_current_page()
            actual_title = await page.title()

            flags = 0 if case_sensitive else re.IGNORECASE
            if re.search(expected_title, actual_title, flags):
                return self.format_success(f"Title matches: '{actual_title}'")
            else:
                return self.format_failure(
                    f"Title mismatch. Expected: '{expected_title}', Actual: '{actual_title}'",
                    recovery_hints=["Check pattern", "Wait for dynamic title"]
                )
        except Exception as e:
            return self.format_critical_error("PAGE_CRASHED", str(e))
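The matching logic from the tool above, extracted so it can be tried in isolation: expected_title is treated as a regular expression, and re.IGNORECASE is applied unless case-sensitive matching is requested.

```python
import re

# The title-matching core of TitleCheckerTool: re.search treats
# expected_title as a regex; flags toggle case sensitivity.
def title_matches(expected_title: str, actual_title: str,
                  case_sensitive: bool = False) -> bool:
    flags = 0 if case_sensitive else re.IGNORECASE
    return re.search(expected_title, actual_title, flags) is not None

print(title_matches("dashboard", "Admin Dashboard"))          # True
print(title_matches("dashboard", "Admin Dashboard", True))    # False
print(title_matches(r"Dashboard \| v\d+", "Dashboard | v2"))  # True
```

Note that regex metacharacters in the expected title (like `|`) must be escaped, which is why the last example uses a raw string.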

See Also

  • webqa_agent/tools/base.py - Base classes
  • webqa_agent/tools/action_tool.py - UITool reference
  • webqa_agent/tools/custom/link_check_tool.py - Custom tool example
  • webqa_agent/tools/registry.py - Tool registry