feat: Add citations implementation for OpenAI, Anthropic, and Google … #3729

certainly-param · 2025-12-15T10:32:31Z

Summary

This PR implements structured citation support across multiple providers and surfaces them consistently on TextPart objects in ModelResponse. It builds on the design discussed in #3126 and the earlier implementation in #2657, rebased on the latest main.

What this changes

Core message model
- Introduces a generic Citation representation (e.g. URLCitation, ToolResultCitation, GroundingCitation).
- Extends TextPart to carry a list of citations, so downstream code can access provider-specific citation data in a uniform way.
- Wires citations into the existing ModelMessage / ModelResponse graph.

OpenAI (Chat Completions + Responses APIs)
- Parses annotations from ChatCompletionMessage and maps them into TextPart.citations.
- Supports URL-style citations exposed via the OpenAI Responses API, preserving URL, title, and index range metadata.
- Ensures citations are preserved even when content is split into multiple TextPart instances.

Anthropic
- Maps tool result citations from Anthropic responses into the shared Citation representation.
- Attaches these citations to the relevant TextPart instances.

Google / Gemini
- Parses grounding_metadata and citation_metadata from Gemini responses.
- Maps web and file search grounding information into GroundingCitation objects.
- Attaches citations to TextPart according to the character / byte ranges provided by Google.

Perplexity
- Adds a Perplexity provider and wires its citation output into the same Citation model, so downstream code can consume it alongside the other providers.

Docs & examples
- Adds API docs for pydantic_ai.messages describing the Citation types and how they appear on TextPart.citations.
- Adds examples showing how to access and display citations for OpenAI, Anthropic, Google/Gemini, and Perplexity.

Tests
- Adds unit tests for:
  - OpenAI Chat Completions citations
  - OpenAI Responses API citations
  - Anthropic tool result citations
  - Gemini grounding / web search citations
  - Perplexity citations
  - Edge cases (streaming, message history, OTEL integration, etc.)
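
The Google / Gemini item above mentions character / byte ranges. As a rough illustration only, and assuming the provider reports UTF-8 byte offsets (an assumption to verify against the provider's documentation), the offsets would need converting before they can slice a Python `str`. The helper name `byte_range_to_char_range` is hypothetical and not part of this PR:

```python
from __future__ import annotations


def byte_range_to_char_range(text: str, start_byte: int, end_byte: int) -> tuple[int, int]:
    """Convert UTF-8 byte offsets into Python string (code point) offsets.

    Assumes the offsets refer to the UTF-8 encoding of `text`. An offset that
    lands in the middle of a multi-byte character is rounded down, because the
    incomplete trailing bytes are ignored while decoding.
    """
    encoded = text.encode("utf-8")
    start_char = len(encoded[:start_byte].decode("utf-8", errors="ignore"))
    end_char = len(encoded[:end_byte].decode("utf-8", errors="ignore"))
    return start_char, end_char


# "é" takes two bytes in UTF-8, so byte offsets drift from character offsets.
sample_text = "café ok"
char_start, char_end = byte_range_to_char_range(sample_text, 6, 8)
cited_span = sample_text[char_start:char_end]
```

For pure ASCII text the two kinds of offsets coincide, which is why this class of bug tends to show up only with accented or non-Latin content.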

Linked issues / prior work

Closes Support structured citations #3126
Closes Support web search citation output (OpenAI Responses API) #2194
Closes Support Anthropic tool result citations #2128
Closes Make Gemini google_search tool grounding metadata available in model response #2890
Closes Perplexity AI Citations #996
Supersedes Add citations #2657

This PR picks up the work started in #2657, rebases it onto the latest main, and extends it to cover the full structured citation story across providers (including Chat Completions).

…models

DouweM

@certainly-param Thanks for working on this! Before we continue coding though, I'd like to spend a bit more time working out the design of our unified interface, as the goal of this feature is to let users deal with citations from all providers in one consistent way, instead of having to know that OpenAI returns X, Anthropic returns Y, Google returns Z etc.

So before we get to coding, we need to know all of the diverse data representations we're dealing with, and then come up with something that covers them all (with mostly fields we can fill in for each provider, and then some optional or provider_details fields for extra provider-specific stuff).

So as a first step, can you please gather the type definitions for all the annotations/citations/grounding data the various providers support and post it in the issue, with a proposal of how we could represent that in a unified way? Ideally we'd also have samples of real data, which you can likely find in existing Pydantic AI cassettes (e.g. look for cited_text to find Anthropic data; groundingMetadata to find Google data, etc).

I think it may be possible for the TextPart.citations list to only have a single type Citation, and then for a Citation to have a source field that could be Url or File or ToolResult or something else that's revealed by that research into all the shapes the provider data can take.
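
A minimal sketch of the shape described in this comment: one `Citation` type with a `source` field that varies by kind, plus an optional `provider_details` field for provider-specific extras. All class and field names below (`UrlSource`, `FileSource`, `ToolResultSource`, `describe`, and the individual fields) are hypothetical placeholders for discussion, not the API in this PR or in Pydantic AI:

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Any


@dataclass
class UrlSource:
    """Hypothetical source for web citations."""

    url: str
    title: str | None = None


@dataclass
class FileSource:
    """Hypothetical source for citations that point at an uploaded file."""

    file_id: str
    filename: str | None = None


@dataclass
class ToolResultSource:
    """Hypothetical source for citations that point at a tool result."""

    tool_call_id: str


@dataclass
class Citation:
    """Single citation type; only `source` differs between kinds."""

    source: UrlSource | FileSource | ToolResultSource
    # Optional because not every provider reports a range in the response text.
    start_index: int | None = None
    end_index: int | None = None
    # Catch-all for data that has no unified field (assumed name).
    provider_details: dict[str, Any] | None = None


def describe(citation: Citation) -> str:
    """Consumer code branches on the source kind, not on the provider."""
    source = citation.source
    if isinstance(source, UrlSource):
        return f"url: {source.url}"
    if isinstance(source, FileSource):
        return f"file: {source.file_id}"
    return f"tool result: {source.tool_call_id}"


example = Citation(source=UrlSource(url="https://example.com", title="Example"), start_index=0, end_index=5)
```

Which fields end up required versus optional would depend on the survey of provider data requested above.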

DouweM · 2025-12-16T17:39:04Z

docs/api/messages.md

    ModelResponse("ModelResponse(parts=list[...])") --- ModelMessage("ModelMessage<br>(Union)")
 ```

+## Citations


We don't need this here; for API docs purposes this context can be on docstrings.

DouweM · 2025-12-16T17:40:20Z

docs/citations/test_coverage.md

Unnecessary doc

DouweM · 2025-12-16T17:41:34Z

docs/citations/accessing_citations.md

@@ -0,0 +1,287 @@
+# Accessing Citations


This doc should just be citations.md, titled Citations, and the intro paragraph should introduce the feature and explain which types of citations are supported for which models.

In any case, we can come back to docs later and focus on the code first!

DouweM · 2025-12-16T17:43:09Z

docs/citations/accessing_citations.md

+                            print(f"Citation Data: {citation.citation_data}")
+```
+
+### GroundingCitation (Google)


Do the 3 providers really need their own types? Can't we unify to a single format, perhaps with some optional provider-specific fields?

DouweM · 2025-12-16T17:43:58Z

docs/examples/citations.md

+- **OpenAI** (Chat Completions and Responses APIs): `URLCitation` with URL, title, and character indices
+- **Anthropic**: `ToolResultCitation` from tool execution results
+- **Google/Gemini**: `GroundingCitation` from grounding metadata
+- **OpenRouter**: Uses OpenAI-compatible citation format


Note that OpenRouter supports file annotations as well, which are not part of the OpenAI spec:

pydantic-ai/pydantic_ai_slim/pydantic_ai/models/openrouter.py

Lines 349 to 360 in d2b08ad

    class _OpenRouterFileAnnotation(BaseModel, frozen=True):
        """File annotation from OpenRouter.

        OpenRouter can return file annotations when processing uploaded files like PDFs.
        The schema is flexible since OpenRouter doesn't document the exact fields.
        """

        type: Literal['file']
        file: dict[str, Any] | None = None


    _OpenRouterAnnotation: TypeAlias = _OpenAIAnnotation | _OpenRouterFileAnnotation

DouweM · 2025-12-16T17:46:58Z

pydantic_ai_slim/pydantic_ai/messages.py


+@dataclass(repr=False)
+class URLCitation:
+    """A citation with a URL, used by OpenAI and similar providers.


Can we also map Anthropic web search citations (

pydantic-ai/tests/models/cassettes/test_anthropic/test_anthropic_web_search_tool.yaml

Lines 142 to 149 in d2b08ad

    - citations:
      - cited_text: 'The air has reached a high level of pollution and is unhealthy for sensitive groups. '
        encrypted_index: Eo8BCioIBxgCIiQ0NGFlNjc2Yy05NThmLTRkNjgtOTEwOC1lYWU5ZGU3YjM2NmISDKBO3m5oU3zDP/M1lBoMBKa8Z3revdebJHWbIjCRSJ1/FdR/uZeWZy5x85sd7yfm0SW+4URT2sN/CN5Qf9fQpe/sppMjAby+dqZg6bcqE3MW5v2cyJybai3gEjOauAM3d+EYBA==
        title: San Francisco, CA Weather Forecast | AccuWeather
        type: web_search_result_location
        url: https://www.accuweather.com/en/us/san-francisco/94103/weather-forecast/347629
      text: Air quality is poor and unhealthy for sensitive groups
      type: text

) and Google web search annotations to the same data model?

Ultimately the goal of native citations support is to give users 1 way to deal with all citations from all providers; not to have them write provider-specific code. So we should try our hardest to unify the data model.
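
To make the mapping question concrete, here is a hedged sketch that converts an OpenAI-style `url_citation` annotation and an Anthropic-style `web_search_result_location` citation (field names taken from the samples quoted in this thread) into one shared type. `WebCitation` and both converter functions are hypothetical names, and the choice to park `encrypted_index` under `provider_details` is an assumption rather than a decision made in this PR:

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Any


@dataclass
class WebCitation:
    """Hypothetical unified shape for a web citation."""

    url: str
    title: str | None = None
    # Text quoted from the source page (present in the Anthropic sample above).
    cited_text: str | None = None
    # Range within the response text (present in OpenAI-style annotations).
    start_index: int | None = None
    end_index: int | None = None
    provider_details: dict[str, Any] | None = None


def from_openai_annotation(annotation: dict[str, Any]) -> WebCitation:
    """Map a dict shaped like an OpenAI `url_citation` annotation."""
    return WebCitation(
        url=annotation["url"],
        title=annotation.get("title"),
        start_index=annotation.get("start_index"),
        end_index=annotation.get("end_index"),
    )


def from_anthropic_citation(citation: dict[str, Any]) -> WebCitation:
    """Map a dict shaped like an Anthropic `web_search_result_location` citation."""
    return WebCitation(
        url=citation["url"],
        title=citation.get("title"),
        cited_text=citation.get("cited_text"),
        provider_details={"encrypted_index": citation.get("encrypted_index")},
    )


openai_citation = from_openai_annotation(
    {"type": "url_citation", "url": "https://example.com/o", "title": "O", "start_index": 1, "end_index": 4}
)
anthropic_citation = from_anthropic_citation(
    {
        "type": "web_search_result_location",
        "url": "https://example.com/a",
        "title": "A",
        "cited_text": "quoted text",
        "encrypted_index": "opaque-value",
    }
)
```

The sketch leaves fields as `None` where a provider does not supply them instead of deriving them, since a range in the response text and a quote from the source page are not the same thing.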

DouweM · 2025-12-16T17:48:26Z

pydantic_ai_slim/pydantic_ai/messages.py

+    citation_data: dict[str, Any] | None = None
+    """Extra citation data from the tool result.
+
+    Structure varies by provider.


Why would the structure vary by provider if this class is Anthropic-specific anyway?

DouweM · 2025-12-16T17:49:57Z

pydantic_ai_slim/pydantic_ai/messages.py

+    Has info about sources used for grounding.
+    """
+
+    citation_metadata: dict[str, Any] | None = None


As mentioned, I'd like to parse this into a unified data model equivalent to what we get for OpenAI and Anthropic (web search) results.

I think it may be possible for the TextPart.citations list to only have a single type Citation, and then for a Citation to have a source that could be Url or File or ToolResult or something else.

DouweM · 2025-12-16T17:50:27Z

pydantic_ai_slim/pydantic_ai/models/__init__.py

        'vercel',
-        'litellm',
-        'nebius',
-        'ovhcloud',


We lost a bunch of stuff here! Please review the entire PR line by line before passing it to me for review :)

DouweM · 2025-12-16T17:51:30Z

tests/test_citation_message_history.py

+    json_bytes = ModelMessagesTypeAdapter.dump_json(messages)
+    deserialized = ModelMessagesTypeAdapter.validate_python(ModelMessagesTypeAdapter.validate_json(json_bytes))
+
+    assert len(deserialized) == 1


Please use snapshots in tests wherever possible, so we can see the entire resulting structure rather than testing field by field (and possibly missing some).
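
A small illustration of the difference, using only the standard library. The project's actual snapshot tooling is not shown in this thread, so a plain dict literal stands in for the snapshot here, and `ExampleCitation` is a hypothetical stand-in type:

```python
from __future__ import annotations

from dataclasses import asdict, dataclass


@dataclass
class ExampleCitation:
    """Hypothetical stand-in for a citation object under test."""

    url: str
    title: str | None = None
    start_index: int | None = None
    end_index: int | None = None


def test_citations_whole_structure() -> None:
    citations = [ExampleCitation(url="https://example.com", title="Example", start_index=0, end_index=7)]

    # Field-by-field style: passes even if an unexpected field changes.
    assert len(citations) == 1
    assert citations[0].url == "https://example.com"

    # Whole-structure style: every field is visible in the test and compared.
    assert [asdict(c) for c in citations] == [
        {"url": "https://example.com", "title": "Example", "start_index": 0, "end_index": 7}
    ]
```

With a snapshot library the expected literal would be generated and updated by the tool instead of written by hand, but the assertion has the same whole-structure form.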

hayescode · 2025-12-18T00:16:12Z

Will Azure's version of OpenAI's responses API web_search be included here? See #3698

Microsoft doesn't list the URL sources in the web_search events and only adds them to annotations in the final response. I have no idea why Microsoft doesn't match OpenAI's perfectly good responses API, it is maddening.

DouweM · 2025-12-18T20:04:12Z

@hayescode Yep, fortunately Azure is using the same annotations/citations field that OpenAI uses, and those are exactly the fields we're looking to support here.

OpenAI doesn't include web search sources by default either; it requires the openai_include_web_search_sources setting to be enabled (https://ai.pydantic.dev/api/models/openai/#pydantic_ai.models.openai.OpenAIResponsesModelSettings.openai_include_web_search_sources). Does Azure not support that or something equivalent?

hayescode · 2025-12-19T00:51:42Z

@DouweM Azure OpenAI does match the shape of the Responses API, but they don't populate the "source" or anything and instead add the sources to the final annotations. This is because Azure doesn't use OpenAI's web search, they ruined the best web_search tool by making their own Bing version. Here's the documentation: https://learn.microsoft.com/en-us/azure/ai-foundry/openai/how-to/web-search?view=foundry-classic

  "response": {
    "id": "resp_035bd9a24ffa66a00069441485f7a48190aa254c61ac5a619b",
    "created_at": 1766069381.0,
    "error": null,
    "incomplete_details": null,
    "instructions": null,
    "metadata": {},
    "model": "gpt-5.2",
    "object": "response",
    "output": [
      {
        "id": "ws_035bd9a24ffa66a000694414869efc81909ea11f148f1d994a",
        "action": {
          "query": null,
          "type": "search",
          "sources": null
        },
        "status": "completed",
        "type": "web_search_call"
      },
      {
        "id": "ws_035bd9a24ffa66a0006944148724148190b8f665fff7e18d83",
        "action": {
          "query": "current weather St. Louis MO right now",
          "type": "search",
          "sources": null
        },
        "status": "completed",
        "type": "web_search_call"
      },
      {
        "id": "msg_035bd9a24ffa66a0006944148e09d88190b1330eadedcdf2c6",
        "content": [
          {
            "annotations": [
              {
                "end_index": 659,
                "start_index": 541,
                "title": "St. Louis, MO Current Weather | AccuWeather",
                "type": "url_citation",
                "url": "https://www.accuweather.com/en/us/st-louis/63102/current-weather/349084"
              },
              {
                "end_index": 789,
                "start_index": 659,
                "title": "Saint Louis, MO Current Weather - The Weather Network",
                "type": "url_citation",
                "url": "https://www.theweathernetwork.com/en/city/us/missouri/saint-louis/current"
              },
              {
                "end_index": 873,
                "start_index": 789,
                "title": "Weather today - St. Louis, MO",
                "type": "url_citation",
                "url": "https://www.weather-us.com/en/missouri-usa/st-louis"
              }
            ],
            "text": "I can’t reliably give “current” weather from these results because they’re inconsistent and appear to be from different dates/times. For example, AccuWeather shows **54°F and sunny at 1:05 PM** , while The Weather Network shows **-13°C (about 9°F) and clear** updated minutes ago , and Weather Channel’s hourly view is stamped for **Monday, Dec 15** .\n\nTell me your **ZIP code (or nearest neighborhood)** and whether you want it **right now** or **today’s forecast**, and I’ll pull the most consistent current conditions for your exact spot.[St. Louis, MO Current Weather | AccuWeather](https://www.accuweather.com/en/us/st-louis/63102/current-weather/349084)[Saint Louis, MO Current Weather - The Weather Network](https://www.theweathernetwork.com/en/city/us/missouri/saint-louis/current)[Weather today - St. Louis, MO](https://www.weather-us.com/en/missouri-usa/st-louis)",
            "type": "output_text",
            "logprobs": []
          }
        ],
        "role": "assistant",
        "status": "completed",
        "type": "message"
      }
    ],
    "parallel_tool_calls": true,
    "temperature": 1.0,
    "tool_choice": {
      "type": "web_search_preview"
    },
    "tools": [
      {
        "type": "web_search",
        "filters": null,
        "search_context_size": "medium",
        "user_location": {
          "city": "St Louis",
          "country": "US",
          "region": "MO",
          "timezone": null,
          "type": "approximate"
        }
      }
    ],
    "top_p": 0.98,
    "background": false,
    "conversation": null,
    "max_output_tokens": null,
    "max_tool_calls": null,
    "previous_response_id": null,
    "prompt": null,
    "prompt_cache_key": null,
    "prompt_cache_retention": null,
    "reasoning": {
      "effort": "none",
      "generate_summary": null,
      "summary": null
    },
    "safety_identifier": null,
    "service_tier": "default",
    "status": "completed",
    "text": {
      "format": {
        "type": "text"
      },
      "verbosity": "medium"
    },
    "top_logprobs": 0,
    "truncation": "disabled",
    "usage": {
      "input_tokens": 6232,
      "input_tokens_details": {
        "cached_tokens": 0
      },
      "output_tokens": 236,
      "output_tokens_details": {
        "reasoning_tokens": 65
      },
      "total_tokens": 6468
    },
    "user": null,
    "content_filters": null,
    "store": true
  },
  "sequence_number": 34,
  "type": "response.completed"
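
For a payload shaped like the one above, where `action.sources` is null and the URLs only appear in the final message annotations, the citations can still be recovered from `output[*].content[*].annotations`. The following is a sketch over plain dicts, not the PR's implementation; `extract_url_citations` is a hypothetical helper, and it assumes `start_index` / `end_index` can be used directly as Python string offsets:

```python
from __future__ import annotations

from typing import Any


def extract_url_citations(response: dict[str, Any]) -> list[dict[str, Any]]:
    """Collect `url_citation` annotations from message items in a response dict."""
    found: list[dict[str, Any]] = []
    for item in response.get("output", []):
        # Skip web_search_call items; in the payload above their sources are null.
        if item.get("type") != "message":
            continue
        for part in item.get("content", []):
            text = part.get("text", "")
            for annotation in part.get("annotations", []):
                if annotation.get("type") != "url_citation":
                    continue
                start, end = annotation["start_index"], annotation["end_index"]
                found.append(
                    {
                        "url": annotation["url"],
                        "title": annotation.get("title"),
                        # Span of the response text that the annotation covers.
                        "span": text[start:end],
                    }
                )
    return found


# Tiny stand-in payload with the same nesting as the response above.
answer = "It is sunny."
link = "[Src](https://example.com/a)"
sample_response = {
    "output": [
        {"type": "web_search_call", "action": {"type": "search", "sources": None}},
        {
            "type": "message",
            "content": [
                {
                    "type": "output_text",
                    "text": answer + link,
                    "annotations": [
                        {
                            "type": "url_citation",
                            "url": "https://example.com/a",
                            "title": "Src",
                            "start_index": len(answer),
                            "end_index": len(answer) + len(link),
                        }
                    ],
                }
            ],
        },
    ]
}
url_citations = extract_url_citations(sample_response)
```

In the payload above the annotated spans cover the markdown links appended at the end of the text, not the sentences they support, which is worth keeping in mind when deciding what a unified range field should mean.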

certainly-param added 2 commits December 15, 2025 15:49

feat: Add citations implementation for OpenAI, Anthropic, and Google …

26ab22e

…models

fix: clean up citations tests and OpenAI model typing

b595ca7

DouweM self-assigned this Dec 16, 2025

DouweM added the awaiting author revision label Dec 16, 2025

DouweM mentioned this pull request Dec 16, 2025

Add citations #2657

Closed

DouweM requested changes Dec 16, 2025


DouweM mentioned this pull request Dec 18, 2025

Support structured citations #3126

Open

3 tasks
