Stop writing brittle CSS-selector tests by hand. Tag your elements once — let Claude generate, run, and analyse your entire test suite.
UI tests break constantly because they depend on selectors like div.container > form > input:nth-child(2). One refactor and 40 tests fail.
PhantomUI replaces fragile selectors with semantic HTML attributes (data-ai-id, data-ai-role, data-ai-label) that describe what an element is, not where it sits in the DOM. The MCP server exposes 9 tools that Claude uses to snapshot your UI, generate Playwright scenarios from natural language, execute them, and report results — with no test code to write or maintain.
1. Add data-ai-* attributes to your HTML (or let auto-tagging handle it)
2. SDK serialises your UI into a structured JSON snapshot
3. PhantomUI MCP server connects to Claude Desktop / Claude Code
4. Claude calls get_ui_snapshot → run_test → save_report automatically
5. You get passing/failing results, HTML dashboards, and JUnit XML
| Feature | Detail |
|---|---|
| Zero dependencies | One <script> tag, no build step, no bundler |
| Manual tagging | data-ai-id, data-ai-role, data-ai-label, data-ai-context, data-ai-state, data-ai-required |
| Auto-tagging | Heuristic discovery of inputs, buttons, links, headings — no annotations needed |
| Duplicate ID warnings | Scanner warns when two elements share the same data-ai-id |
| Framework adapters | React, Vue, Angular helpers included |
| Production self-disable | SDK is a no-op when NODE_ENV=production |
| Size | < 5 KB minified |
| Feature | Detail |
|---|---|
| 9 MCP tools | Usable from Claude Desktop, Claude Code, or any MCP client |
| Playwright execution | Real Chromium browser — click, fill, hover, keyboard, scroll, check, assert |
| Session injection | Inject cookies, localStorage, and Bearer auth tokens per test run |
| Network mocking | Intercept and stub any URL pattern with custom response body and status |
| State tracking | Reads data-ai-state before and after each interactive step |
| Coverage tracking | Tracks which snapshot elements were exercised; shown in reports |
| Screenshot on failure | Base64 PNG captured on every failed step, embedded in report |
| Parallel runner | Run up to 10 scenarios concurrently; configurable batch size |
| Retry failed steps | Re-run any previous run with optional per-step selector/value/timeout overrides |
| UI snapshot diff | Compare any two snapshots — shows added, removed, changed elements |
| SDK auto-injection | If a page doesn't load the SDK, the server injects the bundle automatically |
| Persistent result store | Disk-backed, LRU-evicted (100 runs max), optional webhook on every result |
| Multiple LLM backends | Anthropic, Ollama, OpenAI-compatible — auto-detected from env vars |
| Format | Output |
|---|---|
| JSON | Machine-readable, full step details, CI-friendly |
| HTML | Self-contained dashboard — coverage card, per-step table, inline failure screenshots, snapshot warnings |
| JUnit XML | Plug into GitHub Actions, Jenkins, GitLab CI; includes <property name="coverage"/> per suite |
<script src="https://unpkg.com/@phantomui/sdk/dist/ai-sdk.js"></script>npm install @phantomui/sdk# 1. Clone and build
git clone https://github.com/ibrahim-abi/phantomui
cd phantomui/server && npm install && npm run build
# 2. Connect to Claude Code
claude mcp add --transport stdio phantomui node /absolute/path/to/phantomui/server/dist/index.js
# 3. Or run as a standalone HTTP server
node dist/index.js --port 3100Step 1 — Tag your HTML
<input
id="email"
data-ai-id="login-email"
data-ai-role="input"
data-ai-label="Email Address"
data-ai-context="login-form"
data-ai-required="true"
/>
<button
data-ai-id="login-submit"
data-ai-role="action"
data-ai-action="submit"
data-ai-context="login-form"
>Sign In</button>
<script src="https://unpkg.com/@phantomui/sdk/dist/ai-sdk.js"></script>Don't want to annotate your HTML? Auto-tagging discovers inputs, buttons, and links automatically.
Step 2 — Build and connect the MCP server
cd phantomui/server && npm install && npm run build
claude mcp add --transport stdio phantomui node /absolute/path/to/phantomui/server/dist/index.jsThen start your frontend app (npm run dev) so it's reachable at a local URL.
Step 3 — Ask Claude
Test the login form at http://localhost:3000.
Cover the happy path, invalid email, and empty password cases.
Save an HTML report when done.
Claude calls get_ui_snapshot → run_test → save_report automatically. You get a full report with pass/fail per step, screenshots on failure, and a coverage summary.
| Tool | What it does |
|---|---|
get_ui_snapshot |
Navigates to a URL, injects the SDK if needed, returns a structured element map |
list_elements |
Lists snapshot elements with optional role or source filter |
generate_tests |
CI/headless mode — navigates to a URL, captures snapshot, calls configured LLM to auto-generate scenarios |
run_test |
Runs one scenario step-by-step with Playwright; returns a runId |
run_tests_parallel |
Runs multiple scenarios concurrently (1–10 at once); returns batch summary with all runIds |
get_results |
Returns full step-by-step details, errors, durations, and failure screenshots for a runId |
retry_failed |
Re-runs failed steps from a prior run; supports per-step target, value, timeout overrides |
diff_snapshots |
Compares two snapshots; returns added, removed, and changed elements with field-level diff |
save_report |
Writes results to disk as json, html, or junit; omit run_id to include all stored runs |
run_test supports: navigate · fill · click · select · assert · wait
run_tests_parallel supports all of the above plus: hover · keyboard · scroll · check
Pass a session object to run_test to inject auth state before the scenario runs:
{
"scenario": { "name": "...", "steps": [] },
"session": {
"cookies": [{ "name": "token", "value": "abc", "domain": "localhost" }],
"localStorage": { "theme": "dark" },
"auth_token": "Bearer eyJ..."
}
}Pass networkMocks to run_tests_parallel to intercept and stub API calls:
{
"scenarios": [{
"name": "checkout with mocked payment",
"steps": [{ "action": "navigate", "target": "http://localhost:3000/checkout" }],
"networkMocks": [
{ "urlPattern": "**/api/payment", "status": 200, "body": { "status": "approved" } }
]
}]
}| Environment variable | Default | Purpose |
|---|---|---|
LLM_PROVIDER |
auto-detect | anthropic · ollama · openai-compatible |
ANTHROPIC_API_KEY |
— | Required for Anthropic provider |
AI_MODEL |
claude-sonnet-4-6 |
Model override for Anthropic |
OLLAMA_BASE_URL |
http://localhost:11434 |
Ollama server URL |
OLLAMA_MODEL |
llama3.1 |
Ollama model name |
OPENAI_COMPATIBLE_BASE_URL |
— | Base URL for OpenAI-compatible endpoints |
OPENAI_COMPATIBLE_MODEL |
gpt-4o |
Model for OpenAI-compatible provider |
RESULT_STORE_PATH |
~/.phantomui/runs |
Directory for persisted test results |
WEBHOOK_URL |
— | POST each test result JSON here after every run |
HEADLESS |
true |
Set to false to see the browser during test runs |
HTML dashboard — open in any browser, no server needed
Summary: Total 4 · Passed 3 · Failed 1 · Duration 2.4s · Coverage 3/5 (60%)
⚠ Snapshot Quality Warnings
• [ai-sdk] Duplicate data-ai-id "submit-btn" — each ID must be unique.
Login flow [PASSED] 1.23s ✓ 3 ✗ 0 — 0
1 navigate https://... ✓ 412ms
2 fill #email ✓ 198ms
3 click #submit ✓ 624ms
Failed steps include an inline screenshot — no separate file needed.
JUnit XML — compatible with GitHub Actions test summary, Jenkins, GitLab CI
<testsuite name="Login flow" tests="3" failures="0" time="1.234">
<properties><property name="coverage" value="3/5"/></properties>
<testcase name="navigate http://localhost:3000/login" time="0.412"/>
<testcase name="fill #email" time="0.198"/>
<testcase name="click #submit" time="0.624"/>
</testsuite>phantomui/
├── sdk/ # @phantomui/sdk — zero-dependency browser SDK
│ └── src/
│ ├── scanner.js # Manual data-ai-* element discovery + duplicate warnings
│ ├── autotagger.js # Heuristic discovery for untagged elements + buildSelector
│ ├── serializer.js # Snapshot serialisation (SDK ↔ server contract)
│ ├── attributes.js # Canonical attribute name constants
│ └── adapters/ # React / Vue / Angular helpers
├── server/ # @phantomui/server — MCP server + HTTP API
│ └── src/
│ ├── tools/ # 9 MCP tool implementations
│ ├── runner/
│ │ ├── executor.ts # Playwright runner — all step actions, state/coverage tracking
│ │ ├── session.ts # Cookie, localStorage, auth header injection
│ │ └── store.ts # Disk-backed result store with LRU eviction + webhook
│ ├── ai/
│ │ ├── llm.ts # Unified LLM interface (Anthropic / Ollama / OpenAI-compatible)
│ │ └── generator.ts # Test scenario generation prompt + response parser
│ ├── reports/ # JSON / HTML / JUnit XML generators
│ ├── schema/ # Zod snapshot validation + SDK version compatibility check
│ └── tests/ # Test suite (Playwright integration + unit)
├── examples/ # HTML fixtures
│ ├── login-form.html # Manual data-ai-* tagging
│ ├── checkout-form.html # Multi-section form with payment fields
│ ├── auto-tag-demo.html # Auto-tagging with zero annotations
│ └── interactive.html # Hover, keyboard, scroll, checkbox fixture
└── docs/ # Guides and API reference
| Guide | Description |
|---|---|
| Getting Started | Full setup walkthrough — zero to first test run |
| SDK Reference | data-ai-* attributes, snapshot shape, auto-tagging |
| MCP Tools | All 9 tools — inputs, outputs, and examples |
| Reports | Report formats, CI integration, HTML dashboard |
See CONTRIBUTING.md for development setup, commit conventions, and contribution guidelines.
MIT — Muhammad Ibrahim, 2026