I build AI agent systems — from evaluation benchmarks to production pipelines that enrich, qualify, and convert leads autonomously.
🔭 Currently: Building domain-specific benchmarks and multi-tool agent orchestration systems
🧠 Focus: Agent evaluation, data pipelines, constrained tool use, and context engineering
💬 Ask me about: AI agent architectures, sales automation, document intelligence, data lineage
Natural language data analytics agent evaluated on DataAgentBench (54 queries, 12 datasets, 9 domains). Uses a 3-layer knowledge base injection architecture to achieve 35.2% pass@1 against a 54.3% SOTA ceiling.
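A minimal sketch of what a layered knowledge-base injection step could look like — the layer names, markers, and function signature here are illustrative, not the project's actual API:

```python
def build_context(query: str,
                  schema_hints: dict[str, list[str]],
                  domain_notes: list[str],
                  corrections: list[str]) -> str:
    """Assemble three knowledge layers into one prompt prefix:
    1) dataset schema hints, 2) domain notes, 3) corrections memory.
    (Hypothetical helper; layer markers are illustrative.)"""
    layers = []
    if schema_hints:
        cols = "; ".join(f"{t}: {', '.join(c)}" for t, c in schema_hints.items())
        layers.append(f"[SCHEMA] {cols}")
    if domain_notes:
        layers.append("[DOMAIN] " + " ".join(domain_notes))
    if corrections:
        layers.append("[CORRECTIONS] " + " ".join(corrections))
    return "\n".join(layers + [f"[QUERY] {query}"])
```

Keeping corrections as a separate layer lets the agent accumulate fixes across runs without retraining anything.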
Domain-specific benchmark for B2B sales agents — 250 tasks across signal grounding, tone consistency, resource honesty, and workflow correctness. Published on HuggingFace with a SimPO-trained judge model.
Automated lead generation system with 5-signal enrichment pipeline (Crunchbase, job posts, layoffs, leadership changes, AI maturity), ICP classification, multi-channel outreach, and CRM sync.
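A hedged sketch of 5-signal lead scoring — the signal names mirror the pipeline above, but the weights and tier thresholds are made up for illustration:

```python
# Illustrative weights; a real pipeline would tune these per ICP.
SIGNAL_WEIGHTS = {
    "funding": 0.3,       # e.g. Crunchbase round in the last 12 months
    "hiring": 0.2,        # relevant job posts
    "layoffs": 0.1,       # restructuring signal
    "leadership": 0.2,    # new exec in a buying role
    "ai_maturity": 0.2,   # inferred AI adoption level
}

def score_lead(signals: dict[str, float]) -> float:
    """Weighted sum of normalized (0..1) signal strengths."""
    return round(sum(SIGNAL_WEIGHTS[k] * signals.get(k, 0.0)
                     for k in SIGNAL_WEIGHTS), 3)

def icp_tier(score: float) -> str:
    """Hypothetical tier cutoffs for routing outreach priority."""
    if score >= 0.7:
        return "A"
    if score >= 0.4:
        return "B"
    return "C"
```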
Codebase intelligence system that transforms undocumented repos into queryable knowledge graphs — module dependency analysis, data lineage tracking, blast radius calculation, and LLM-powered semantic analysis.
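Blast radius over a dependency graph reduces to a reverse traversal — a minimal sketch, assuming the graph is stored as a reverse-dependency adjacency map (module → modules that import it):

```python
from collections import deque

def blast_radius(reverse_deps: dict[str, list[str]], changed: str) -> set[str]:
    """BFS over reverse dependencies: every module transitively
    affected by editing `changed` (the changed module itself excluded)."""
    seen: set[str] = set()
    queue = deque([changed])
    while queue:
        node = queue.popleft()
        for dependent in reverse_deps.get(node, []):
            if dependent not in seen:
                seen.add(dependent)
                queue.append(dependent)
    return seen
```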
Enterprise-grade agentic pipeline for unstructured document extraction. Multi-strategy routing (fast text → layout-aware → vision-augmented) with confidence-gated escalation and spatial provenance.
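A sketch of confidence-gated escalation, assuming each strategy returns `(result, confidence)` — the tier names echo the routing above, but the functions and threshold are placeholders:

```python
from typing import Callable

# A strategy takes raw document bytes, returns extracted fields + confidence.
Strategy = Callable[[bytes], tuple[dict, float]]

def extract(doc: bytes,
            strategies: list[tuple[str, Strategy]],
            threshold: float = 0.8) -> tuple[str, dict]:
    """Run cheap strategies first; escalate only while confidence
    stays below the gate. Returns (strategy_name, best_result)."""
    best_name, best_result, best_conf = "", {}, -1.0
    for name, strategy in strategies:
        result, conf = strategy(doc)
        if conf > best_conf:
            best_name, best_result, best_conf = name, result, conf
        if conf >= threshold:  # good enough: stop escalating
            break
    return best_name, best_result
```

Ordering strategies by cost means most documents never pay for the vision-augmented tier.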
Schema integrity and lineage attribution system — auto-generates Bitol-compatible contracts, validates data snapshots, traces violations to upstream git commits, and detects schema drift.
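A minimal drift check, assuming schema snapshots are column → dtype maps (a simplification of a full contract format such as Bitol's):

```python
def schema_drift(old: dict[str, str], new: dict[str, str]) -> dict[str, list]:
    """Diff two schema snapshots: columns added, removed, or retyped."""
    return {
        "added":   sorted(set(new) - set(old)),
        "removed": sorted(set(old) - set(new)),
        "retyped": sorted(c for c in set(old) & set(new) if old[c] != new[c]),
    }
```

Any non-empty bucket is a contract violation candidate to trace back to the upstream commit that introduced it.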
- Agent evaluation methodology — building benchmarks that catch real production failures
- Centralized orchestration patterns for multi-tool agent systems
- Context engineering for data agents (knowledge base injection, schema hints, corrections memory)