NVIDIA AI-Q Blueprint

⚠️ IMPORTANT – Active Development Branch

You are currently viewing the develop branch for the pre-release version of AI-Q v2.0.

This branch contains the latest features and experimental updates and may contain breaking changes.

For production use, switch to the v1.2.1 stable release on the main branch.

🏆 BENCHMARK NOTE 🏆

To obtain results consistent with the nvidia-aiq DeepResearch Bench and DeepResearch Bench II leaderboard results, please use the drb1 and drb2 branches, respectively.


Overview

The NVIDIA AI-Q Blueprint is an enterprise-grade research agent built on the NVIDIA NeMo Agent Toolkit and LangChain Deep Agents. It gives you both quick, cited answers and in-depth, report-style research in one system, with benchmarks and evaluation harnesses so you can measure quality and improve over time.

AI-Q Architecture

Key features:

  • Orchestration node — One node classifies intent (meta vs. research), produces meta responses (for example, greetings, capabilities), and sets research depth (shallow vs. deep).
  • Shallow research — Bounded, faster researcher with tool-calling and source citation.
  • Deep research — Long-running multi-step planning and research to generate a long-form citation-backed report.
  • Workflow configuration — YAML configs define agents, tools, LLMs, and routing behavior so you can tune workflows without code changes.
  • Modular workflows — All agents (orchestration node, shallow researcher, deep researcher, clarifier) are composable; each can run standalone or as part of the full pipeline.
  • Evaluation harnesses — Built-in benchmarks (for example, FreshQA, DeepResearch) and evaluation scripts to measure quality and iterate on prompts and agent architecture.
  • Frontend options — Run through the CLI, web UI, or async jobs; see the Getting Started and Ways to Run the Agents sections below.
  • Deployment options — Deployment assets for both Docker Compose and Helm deployments.

Software Components

The following are used by this project in the default configuration:

Target Audience

This project is for:

  • AI researchers and developers: People building or extending agentic research workflows
  • Enterprise teams: Organizations needing tool-augmented research with citation-backed reports
  • NeMo Agent Toolkit users: Developers looking to understand advanced multi-agent patterns

Prerequisites

  • Python 3.11–3.13
  • uv package manager
  • NVIDIA API key from NVIDIA AI (for NIM models)
  • Node.js 22+ and npm (optional, for web UI mode)

Dependency Note: This release is pinned to NeMo Agent Toolkit (NAT) v1.4.0 (nvidia-nat==1.4.0). NAT v1.5 or later is not yet supported by AI-Q and upgrading may introduce breaking changes. The pin will be lifted in a future AI-Q release once compatibility has been validated.

Optional requirements:

  • Tavily API key (for web search functionality)
  • Serper API key (for academic paper search functionality)

Note: Configure at least one data source (Tavily web search, Serper search tool, or knowledge layer) to enable research functionality.

If these optional API keys are not provided, the agent continues to operate without the corresponding search capabilities. Refer to Obtain API Keys for details.
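This graceful-degradation behavior can be sketched in plain Python. The helper below is hypothetical, not the blueprint's actual registration code; only the environment variable names come from the Obtain API Keys table:

```python
import os

def available_search_tools(env=os.environ):
    """Return the names of search tools whose API keys are configured.

    Tools with missing keys are simply skipped, so the agent keeps
    running without the corresponding search capability.
    """
    key_to_tool = {
        "TAVILY_API_KEY": "tavily_web_search",
        "SERPER_API_KEY": "serper_paper_search",
    }
    return [tool for key, tool in key_to_tool.items() if env.get(key)]

# Example: only the Tavily key is set, so only web search is enabled.
print(available_search_tools({"TAVILY_API_KEY": "tvly-xxx"}))
# ['tavily_web_search']
```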

Hardware Requirements

When using NVIDIA API Catalog (the default), inference runs on NVIDIA-hosted infrastructure and there are no local GPU requirements. The hardware references below apply only when self-hosting models via NVIDIA NIM.

| Component | Default Model | Self-Hosted Hardware Reference |
|---|---|---|
| LLM (research subagent) | nvidia/nemotron-3-super-120b-a12b | Nemotron 3 Super support matrix |
| LLM (intent classifier) | nvidia/nemotron-3-nano-30b-a3b | Nemotron 3 Nano support matrix |
| LLM (deep research orchestrator, planner) | openai/gpt-oss-120b | GPT OSS support matrix |
| Document summary (optional) | nvidia/nemotron-mini-4b-instruct | Nemotron Mini 4B |
| Text embedding | nvidia/llama-nemotron-embed-vl-1b-v2 | NeMo Retriever embedding support matrix |
| VLM (image/chart extraction, optional) | nvidia/nemotron-nano-12b-v2-vl | Vision language model support matrix |
| Knowledge layer (Foundational RAG, optional) | -- | RAG Blueprint support matrix |

For detailed installation instructions, refer to Installation -- Hardware Requirements.

Architecture

AI-Q uses a LangGraph-based state machine with the following key components:

  • Orchestration node: Classifies intent (meta vs. research), produces meta responses when needed, and sets depth (shallow vs. deep) in one step
  • Shallow research agent: Bounded tool-augmented research optimized for speed
  • Deep research agent: Multi-phase research with planning, iteration, and citation management

Each agent can be run individually or as part of the orchestrated workflow. For detailed architecture documentation, refer to Architecture.
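The routing done by the orchestration node can be sketched as a plain-Python state machine. This is illustrative only: the real implementation is a LangGraph graph, and the classifier functions here are trivial stand-ins for the LLM intent classifier (Nemotron 3 Nano in the default config):

```python
from dataclasses import dataclass

@dataclass
class ResearchState:
    query: str
    route: str = ""      # "meta", "shallow", or "deep"
    response: str = ""

def orchestration_node(state: ResearchState, is_meta, needs_deep) -> ResearchState:
    """Single node: classify intent, answer meta queries directly,
    and set research depth for everything else."""
    if is_meta(state.query):
        state.route = "meta"
        state.response = "I am a research assistant; ask me anything."
    elif needs_deep(state.query):
        state.route = "deep"      # long-running, report-style research
    else:
        state.route = "shallow"   # bounded, fast, cited answer
    return state

# Trivial stand-ins for the LLM classifiers.
is_meta = lambda q: q.lower().startswith(("hi", "hello", "what can you"))
needs_deep = lambda q: "report" in q.lower()

state = orchestration_node(ResearchState("Write a report on GPU trends"), is_meta, needs_deep)
print(state.route)  # deep
```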

Getting Started

Clone the Repository

git clone https://github.com/NVIDIA-AI-Blueprints/aiq.git && cd aiq

Automated Setup

Run the setup script to initialize the environment:

./scripts/setup.sh

This script:

  • Creates a Python virtual environment with uv
  • Installs all Python dependencies (core, frontends, benchmarks, data sources)
  • Installs UI dependencies (if Node.js is available)

Manual Installation

For selective installation, install packages individually:

# Create and activate virtual environment
uv venv --python 3.13 .venv
source .venv/bin/activate

# Install core with development dependencies
uv pip install -e ".[dev]"

# Install frontends (pick what you need)
uv pip install -e ./frontends/cli          # CLI frontend
uv pip install -e ./frontends/debug        # Debug console
uv pip install -e ./frontends/aiq_api      # Unified API (includes debug)

# Install benchmarks (pick what you need)
uv pip install -e ./frontends/benchmarks/deepresearch_bench
uv pip install -e ./frontends/benchmarks/freshqa

# Install data sources (pick what you need)
uv pip install -e ./sources/tavily_web_search
uv pip install -e ./sources/google_scholar_paper_search
uv pip install -e "./sources/knowledge_layer[llamaindex,foundational_rag]"

Obtain API Keys

| API | Environment Variable | Purpose | Required |
|---|---|---|---|
| NVIDIA API | NVIDIA_API_KEY | LLM inference through NIM | Yes |
| Tavily | TAVILY_API_KEY | Web search | No (if not specified, agent continues without web search) |
| Serper | SERPER_API_KEY | Academic paper search | No (if not specified, agent continues without paper search) |

Obtain an NVIDIA API Key

  1. Sign in to NVIDIA Build
  2. Click on any model, then select "Deploy" > "Get API Key" > "Generate Key"

Obtain a Tavily API Key

  1. Sign in to Tavily
  2. Navigate to your dashboard
  3. Generate an API key

Obtain a Serper API Key

  1. Sign in to Serper
  2. Generate an API key from your dashboard

Set Up Environment Variables

Create a .env file in the deploy/ directory:

cp deploy/.env.example deploy/.env

Then edit deploy/.env and fill in your API keys.
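The resulting deploy/.env should contain the variables from the Obtain API Keys table (placeholder values shown; the optional keys can be omitted):

```shell
# deploy/.env
NVIDIA_API_KEY=nvapi-...        # required, for NIM inference
TAVILY_API_KEY=tvly-...         # optional, enables web search
SERPER_API_KEY=...              # optional, enables academic paper search
```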

Note: Depending on your use case, deep research report quality can be improved by enabling search across academic research papers. We use Serper for this. If you want to use paper search, follow the steps in the Customization guide to enable it.

Configuration Files

The configs/ directory holds YAML workflow configs that define agents, tools, LLMs, and routing. Use the one that matches your run mode and data sources:

| Config | Models | Description |
|---|---|---|
| config_cli_default.yml | Nemotron 3 Nano 30B, GPT-OSS 120B, Nemotron 3 Super 120B | CLI default. Web search; optional paper search (requires SERPER_API_KEY); no knowledge retrieval. |
| config_web_default_llamaindex.yml | Nemotron 3 Nano 30B, GPT-OSS 120B, Nemotron 3 Super 120B, Nemotron Mini 4B | Web default. LlamaIndex knowledge retrieval; web search; optional paper search (requires SERPER_API_KEY). |
| config_web_frag.yml | Nemotron 3 Nano 30B, GPT-OSS 120B, Nemotron 3 Super 120B | Web + Foundational RAG (external RAG server). Helm default. See RAG Blueprint for an example RAG deployment. |
| config_frontier_models.yml | GPT-5.2 (orchestrator/planner), Nemotron 3 Nano 30B, Nemotron 3 Super 120B, Nemotron Mini 4B | Hybrid: frontier orchestrator/planner, open researcher. LlamaIndex; web search; optional paper search (requires SERPER_API_KEY). Requires OPENAI_API_KEY. |
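The general shape of such a config can be sketched as below. This is a hypothetical fragment, not one of the shipped files: the section layout, `_type` values, and workflow name are placeholders, so refer to the actual files under configs/ for the real schema.

```yaml
# Hypothetical sketch only -- see configs/config_cli_default.yml for the real schema.
llms:
  intent_classifier_llm:
    _type: nim
    model_name: nvidia/nemotron-3-nano-30b-a3b
  researcher_llm:
    _type: nim
    model_name: nvidia/nemotron-3-super-120b-a12b

functions:
  tavily_web_search:
    _type: tavily_web_search

workflow:
  _type: aiq_orchestrated_research   # placeholder name
  llm_name: researcher_llm
  tools: [tavily_web_search]
```

Because agents, tools, LLMs, and routing all live in these files, swapping a model or disabling a tool is a config edit rather than a code change.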

Ways to Run the Agents

The frontends/ directory contains different interfaces for interacting with the agents. You can also run agents directly through the NeMo Agent Toolkit CLI.

Command-line interface (CLI)

The CLI provides an interactive research assistant in your terminal:

# Activate the virtual environment
source .venv/bin/activate

# Run with the convenience script
./scripts/start_cli.sh

# Verbose logging
./scripts/start_cli.sh --verbose

# Or run directly with the NeMo Agent Toolkit CLI (dotenv loads deploy/.env into the environment)
dotenv -f deploy/.env run nat run --config_file configs/config_cli_default.yml --input "How do I install CUDA?"

The CLI frontend source is in frontends/cli/.

Web UI

For a full web-based experience:

./scripts/start_e2e.sh

This starts:

  • Backend API server at http://localhost:8000
  • Frontend UI at http://localhost:3000

The web UI source is in frontends/ui/. Refer to frontends/ui/README.md for more details.

Web UI with Docker Compose

You can also run the backend and UI with Docker Compose:

cd deploy/compose

# No-auth local setup (LlamaIndex default)
docker compose --env-file ../.env -f docker-compose.yaml up -d --build

# To select a different backend config, set BACKEND_CONFIG in deploy/.env, for example:
# BACKEND_CONFIG=/app/configs/config_web_frag.yml

For more details, refer to:

  • deploy/compose/README.md

Async Deep Research Jobs

Endpoints, SSE streaming, and debug console: refer to frontends/aiq_api/README.md.

Benchmarks

To run agents in evaluation mode, refer to the Evaluating the Workflow section.

Jupyter Notebooks

The docs/notebooks/ directory contains a three-part series that walks through the blueprint from first run to full customization. Run them in order:

| # | Notebook | What it covers | Prerequisites |
|---|---|---|---|
| 0 | Getting Started with AI-Q | Full blueprint overview — environment setup, orchestrated workflow (intent routing, shallow and deep research), and Docker Compose deployment | NVIDIA_API_KEY; optionally TAVILY_API_KEY, SERPER_API_KEY |
| 1 | Deep Researcher — Web Search | Deep researcher in depth — Python API, nat run, and end-to-end evaluation against the DeepResearch Bench with nat eval | Notebook 0 completed; NVIDIA_API_KEY, TAVILY_API_KEY, SERPER_API_KEY; OpenAI or Gemini key for the judge model |
| 2 | Deep Researcher — Customization | Extending the deep researcher — adding paper search, assigning different LLMs per agent role, editing prompts, and enabling the knowledge layer | Notebooks 0 and 1 completed; NVIDIA_API_KEY, TAVILY_API_KEY, SERPER_API_KEY |

Evaluating the Workflow

The frontends/benchmarks/ directory contains evaluation pipelines for assessing agent performance.

Available Benchmarks

| Benchmark | Description | Location |
|---|---|---|
| Deep Research Bench | RACE and FACT evaluation for research quality | frontends/benchmarks/deepresearch_bench/ |
| FreshQA | Factuality evaluation on time-sensitive questions | frontends/benchmarks/freshqa/ |

Running Evaluations

Step 1: Install the dataset

The dataset files are not included in the repository. A script is included to retrieve them from the Deep Research Bench GitHub repository and format them for the NeMo Agent Toolkit evaluator.

To download the dataset files, run the following script:

python frontends/benchmarks/deepresearch_bench/scripts/download_drb_dataset.py

Step 2: Generate reports using NAT evaluation harness

dotenv -f deploy/.env run nat eval --config_file frontends/benchmarks/deepresearch_bench/configs/config_deep_research_bench.yml

Step 3: Convert the output into a compatible format

python frontends/benchmarks/deepresearch_bench/scripts/export_drb_jsonl.py --input <path to your workflow_output.json> --output <path to the output file you want to create with .jsonl extension>

Step 4: Run evaluation

Follow the instructions in the Deep Research Bench GitHub repository to run the evaluation and obtain scores.

Optional: Phoenix Tracing

If your config enables Phoenix tracing, start the Phoenix server before running nat eval.

Start server (separate terminal):

source .venv/bin/activate
phoenix serve

For detailed benchmark documentation, refer to:

Development

For development, contribution, and documentation, refer to:

Roadmap

  • NeMo Guardrails Integration: Enhance safety and security guardrails.
  • NVIDIA Dynamo Integration: Reduce latency via priority scheduling at scale.
  • MCP Authentication: Implement secure login/auth for MCP connections.
  • Skills & Sandboxing: Support custom skills within isolated environments.
  • Dynamic Model Routing: Allow sub-agents to automatically select the optimal model per task.
  • Resource Management: Implement configurable token caps and tool-call budgets.
  • Expanded Web Search: Additional integration examples including Perplexity and You.com.
  • Collaborative Rewriting: Additional report rewriting agent and HITL Q&A.
  • Multimedia Output: Embed audio, video, and images directly into reports.
  • Voice-to-Text Input: Integrate NVIDIA Riva for hands-free accessibility.

Security Considerations

  • The AI-Q Blueprint is shared as a reference and is provided "as is". Security in a production environment is the responsibility of the end users deploying it. When deploying in production, have security experts review potential risks and threats; define trust boundaries, implement logging and monitoring, secure the communication channels, integrate AuthN and AuthZ with appropriate access controls, keep the deployment up to date, and ensure the containers and source code are secure and free of known vulnerabilities.
  • A robust frontend that handles AuthN and AuthZ is highly recommended. Missing AuthN and AuthZ results in ungated access to customer models if they are directly exposed, for example to the internet, leading to cost to the customer, resource exhaustion, or denial of service.
  • End users are encouraged to add NeMo Guardrails and additional prompt content filtering to the blueprint. Guardrails will be native in an upcoming release.
  • The AI-Q Blueprint doesn't require any privileged access to the system.
  • The AI-Q Blueprint doesn't currently generate any code that may require sandboxing. Future roadmap features (such as custom skills) will introduce sandboxed execution environments.
  • End users are responsible for ensuring the availability of their deployment.
  • End users are responsible for building and patching the container images to keep them up to date.
  • End users are responsible for ensuring that the OSS packages used by the blueprint are current.
  • Logs from the middleware, backend, and demo app are printed to standard out and can include input prompts and output completions for development purposes. End users are advised to handle logging securely and avoid information leakage in production use cases.
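As one mitigation for the logging point above, a standard-library logging filter can scrub prompt text before records are emitted. This is a generic sketch, not part of the blueprint, and the `prompt=`/`completion=` field names are assumptions about the log format:

```python
import logging
import re

class RedactPrompts(logging.Filter):
    """Scrub 'prompt=...' and 'completion=...' payloads from log records
    so raw user text is not emitted by attached handlers."""
    PATTERN = re.compile(r"(prompt|completion)=\S+")

    def filter(self, record: logging.LogRecord) -> bool:
        # Mutates the record in place; returning True keeps the record.
        record.msg = self.PATTERN.sub(r"\1=[REDACTED]", str(record.msg))
        return True

handler = logging.StreamHandler()
handler.addFilter(RedactPrompts())
logger = logging.getLogger("aiq.demo")  # hypothetical logger name
logger.addHandler(handler)
logger.warning("request prompt=top-secret-question completed")
# handler emits: request prompt=[REDACTED] completed
```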

License

This project downloads and installs additional third-party open source software projects. Review the license terms of these projects, found in LICENSE-THIRD-PARTY, before use.

GOVERNING TERMS: The AI-Q Blueprint software and materials are governed by the Apache License, Version 2.0.