I build AI agent systems — from evaluation benchmarks to production pipelines that enrich, qualify, and convert leads autonomously.
🔭 Currently: Building domain-specific benchmarks and multi-tool agent orchestration systems
🧠 Focus: Agent evaluation, data pipelines, constrained tool use, and context engineering
💬 Ask me about: AI agent architectures, sales automation, document intelligence, data lineage
Natural language data analytics agent evaluated on DataAgentBench (54 queries, 12 datasets, 9 domains). Uses a 3-layer knowledge base injection architecture to achieve 35.2% pass@1 against a 54.3% SOTA ceiling.
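A minimal sketch of what a layered knowledge-base injection step could look like — the layer names, markers, and function signature here are illustrative, not the project's actual API:

```python
def build_context(query: str,
                  schema_hints: dict[str, list[str]],
                  domain_notes: list[str],
                  corrections: list[str]) -> str:
    """Assemble three knowledge layers into one prompt prefix:
    1) dataset schema hints, 2) domain notes, 3) corrections memory.
    (Hypothetical helper; layer markers are illustrative.)"""
    layers = []
    if schema_hints:
        cols = "; ".join(f"{t}: {', '.join(c)}" for t, c in schema_hints.items())
        layers.append(f"[SCHEMA] {cols}")
    if domain_notes:
        layers.append("[DOMAIN] " + " ".join(domain_notes))
    if corrections:
        layers.append("[CORRECTIONS] " + " ".join(corrections))
    return "\n".join(layers + [f"[QUERY] {query}"])
```

Keeping corrections as a separate layer lets the agent accumulate fixes across runs without retraining anything.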
Domain-specific benchmark for B2B sales agents — 250 tasks across signal grounding, tone consistency, resource honesty, and workflow correctness. Published on HuggingFace with a SimPO-trained judge model.
Automated lead generation system with 5-signal enrichment pipeline (Crunchbase, job posts, layoffs, leadership changes, AI maturity), ICP classification, multi-channel outreach, and CRM sync.
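A hedged sketch of 5-signal lead scoring — the signal names mirror the pipeline above, but the weights and tier thresholds are made up for illustration:

```python
# Illustrative weights; a real pipeline would tune these per ICP.
SIGNAL_WEIGHTS = {
    "funding": 0.3,       # e.g. Crunchbase round in the last 12 months
    "hiring": 0.2,        # relevant job posts
    "layoffs": 0.1,       # restructuring signal
    "leadership": 0.2,    # new exec in a buying role
    "ai_maturity": 0.2,   # inferred AI adoption level
}

def score_lead(signals: dict[str, float]) -> float:
    """Weighted sum of normalized (0..1) signal strengths."""
    return round(sum(SIGNAL_WEIGHTS[k] * signals.get(k, 0.0)
                     for k in SIGNAL_WEIGHTS), 3)

def icp_tier(score: float) -> str:
    """Hypothetical tier cutoffs for routing outreach priority."""
    if score >= 0.7:
        return "A"
    if score >= 0.4:
        return "B"
    return "C"
```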
Codebase intelligence system that transforms undocumented repos into queryable knowledge graphs — module dependency analysis, data lineage tracking, blast radius calculation, and LLM-powered semantic analysis.
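Blast radius over a dependency graph reduces to a reverse traversal — a minimal sketch, assuming the graph is stored as a reverse-dependency adjacency map (module → modules that import it):

```python
from collections import deque

def blast_radius(reverse_deps: dict[str, list[str]], changed: str) -> set[str]:
    """BFS over reverse dependencies: every module transitively
    affected by editing `changed` (the changed module itself excluded)."""
    seen: set[str] = set()
    queue = deque([changed])
    while queue:
        node = queue.popleft()
        for dependent in reverse_deps.get(node, []):
            if dependent not in seen:
                seen.add(dependent)
                queue.append(dependent)
    return seen
```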
Enterprise-grade agentic pipeline for unstructured document extraction. Multi-strategy routing (fast text → layout-aware → vision-augmented) with confidence-gated escalation and spatial provenance.
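A sketch of confidence-gated escalation, assuming each strategy returns `(result, confidence)` — the tier names echo the routing above, but the functions and threshold are placeholders:

```python
from typing import Callable

# A strategy takes raw document bytes, returns extracted fields + confidence.
Strategy = Callable[[bytes], tuple[dict, float]]

def extract(doc: bytes,
            strategies: list[tuple[str, Strategy]],
            threshold: float = 0.8) -> tuple[str, dict]:
    """Run cheap strategies first; escalate only while confidence
    stays below the gate. Returns (strategy_name, best_result)."""
    best_name, best_result, best_conf = "", {}, -1.0
    for name, strategy in strategies:
        result, conf = strategy(doc)
        if conf > best_conf:
            best_name, best_result, best_conf = name, result, conf
        if conf >= threshold:  # good enough: stop escalating
            break
    return best_name, best_result
```

Ordering strategies by cost means most documents never pay for the vision-augmented tier.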
Schema integrity and lineage attribution system — auto-generates Bitol-compatible contracts, validates data snapshots, traces violations to upstream git commits, and detects schema drift.
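A minimal drift check, assuming schema snapshots are column → dtype maps (a simplification of a full contract format such as Bitol's):

```python
def schema_drift(old: dict[str, str], new: dict[str, str]) -> dict[str, list]:
    """Diff two schema snapshots: columns added, removed, or retyped."""
    return {
        "added":   sorted(set(new) - set(old)),
        "removed": sorted(set(old) - set(new)),
        "retyped": sorted(c for c in set(old) & set(new) if old[c] != new[c]),
    }
```

Any non-empty bucket is a contract violation candidate to trace back to the upstream commit that introduced it.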
- Agent evaluation methodology — building benchmarks that catch real production failures
- Centralized orchestration patterns for multi-tool agent systems
- Context engineering for data agents (knowledge base injection, schema hints, corrections memory)