Add comprehensive OpenTelemetry tracing documentation for Agent Engine #1537

Open
perashanid wants to merge 1 commit into google:main from perashanid:docs/add-otel-tracing-agent-engine-1215
Conversation

@perashanid
Description

This PR adds comprehensive documentation for implementing OpenTelemetry tracing with Vertex AI Agent Engine, addressing issue #1215.

The issue reported that the existing documentation was insufficient to implement OpenTelemetry tracing for Agent Engine, leading to blank dashboards and missing traces. The root cause was that the initialization sequence was undocumented: tracing MUST be initialized within the AdkApp.set_up() method.

Changes Made

New Documentation

  • docs/observability/tracing-agent-engine.md - Comprehensive guide covering:
    • Why initialization order matters (critical section explaining the set_up() requirement)
    • Architecture diagram showing trace flow from agent to dashboard
    • Two implementation methods (environment variables and manual configuration)
    • Complete code examples with proper initialization patterns
    • Infrastructure configuration (Terraform, IAM)
    • Troubleshooting guide for common issues
    • Best practices for production deployments

Updated Documentation

  • docs/observability/index.md - Updated to link to the new tracing guide
  • mkdocs.yml - Added navigation entry for the new tracing page

Technical Implementation

The documentation addresses all points raised in issue #1215:

  1. Documenting the init sequence clearly

    • Explains why set_up() is required
    • Shows correct vs incorrect initialization patterns
    • Provides complete working examples
  2. Documenting canonical data flows

    • Architecture diagram showing flow from agent → Agent Engine → Cloud Trace → Vertex AI Dashboard
    • Explains span structure and hierarchy
    • Details span attributes for each operation type
  3. Architecture diagram of infrastructure configuration

    • Visual diagram of the complete tracing pipeline
    • Terraform configuration examples
    • IAM permission requirements
  4. Reference to adk-samples

    • Links to Agent Starter Pack in additional resources
    • References the starter pack as a reference implementation
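
The span hierarchy mentioned in point 2 can be pictured with a small, library-free model. This is only a sketch: the span names (`invocation`, `agent_run`, `call_llm`, `execute_tool`), the attribute keys, and the `Span` class itself are illustrative assumptions, not names confirmed by this PR or taken from the OpenTelemetry API.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Span:
    """Minimal stand-in for a trace span (not the OpenTelemetry class)."""

    name: str
    parent: Optional["Span"] = None
    attributes: dict = field(default_factory=dict)
    children: list = field(default_factory=list)

    def child(self, name: str, **attributes) -> "Span":
        """Create a child span and attach it to this span."""
        span = Span(name=name, parent=self, attributes=attributes)
        self.children.append(span)
        return span

    def depth(self) -> int:
        """Number of ancestors between this span and the root."""
        return 0 if self.parent is None else self.parent.depth() + 1


def render(span: Span) -> list:
    """Return the span tree as indented lines, parents before children."""
    lines = ["  " * span.depth() + span.name]
    for child_span in span.children:
        lines.extend(render(child_span))
    return lines


# Hypothetical request: one invocation, one agent run, an LLM call and a tool call.
root = Span("invocation")
agent = root.child("agent_run", agent_name="weather_agent")
agent.child("call_llm", model="example-model")
agent.child("execute_tool", tool_name="get_forecast")
print("\n".join(render(root)))
```

Each nested span keeps a reference to its parent, which is what lets a trace viewer reconstruct the tree from a flat stream of exported spans.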

Key Features

Critical Warning Section

Added a prominent warning box at the top explaining the initialization requirement:

OpenTelemetry tracing initialization MUST occur within the AdkApp.set_up() method when deploying to Agent Engine. Initializing tracing outside of this lifecycle method will result in blank dashboards and missing traces.
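
One way to picture why the lifecycle hook matters is sketched below. It assumes (this PR description does not spell it out) that the app object is built in one process and copied into the serving runtime before `set_up()` runs there. `FakeTracerProvider`, `TracedApp`, and `deploy` are hypothetical stand-ins, not the ADK or OpenTelemetry API, and `copy.deepcopy` stands in for whatever serialization the platform actually uses.

```python
import copy
import threading


class FakeTracerProvider:
    """Hypothetical tracer provider that owns process-local state."""

    def __init__(self, project_id: str) -> None:
        self.project_id = project_id
        self._lock = threading.Lock()  # cannot be copied or serialized


class TracedApp:
    """Hypothetical app wrapper that defers tracing setup to set_up()."""

    def __init__(self, project_id: str) -> None:
        # Store plain configuration only; no providers or exporters yet.
        self.project_id = project_id
        self.tracer_provider = None

    def set_up(self) -> None:
        # Runs in the serving process, so the provider is created where
        # requests are actually handled.
        if self.tracer_provider is None:
            self.tracer_provider = FakeTracerProvider(self.project_id)


def deploy(app: TracedApp) -> TracedApp:
    """Simulate deployment: copy the app, then run set_up() on the copy."""
    shipped = copy.deepcopy(app)
    shipped.set_up()
    return shipped


remote_app = deploy(TracedApp("my-project"))
print(type(remote_app.tracer_provider).__name__)
```

In this model, calling `set_up()` before `deploy()` makes the copy step fail, because the provider's lock cannot be copied. A real platform may fail differently, or silently, which would be consistent with the blank dashboards described above.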

Two Implementation Paths

  1. Environment Variables (Recommended): the simple approach, using the --trace_to_cloud flag or enable_tracing=True
  2. Manual Configuration: the advanced approach, with full control over the OpenTelemetry setup

Comprehensive Troubleshooting

Covers common issues:

  • Blank dashboard / no traces
  • Incomplete traces
  • Missing prompt/response content
  • High trace volume / cost concerns

Each issue includes symptoms and specific solutions.

Production Best Practices

  • Sampling strategies for cost control
  • Error handling patterns
  • Environment-based configuration
  • Custom span attributes
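
As a rough illustration of the first and third items, the sketch below reads a sampling ratio from the environment and makes a deterministic keep/drop decision per trace ID, in the spirit of ratio-based samplers. The variable name `TRACE_SAMPLE_RATIO` and both helper functions are assumptions made for this example; they are not part of ADK or the new guide.

```python
import math
import os

TRACE_ID_MASK = (1 << 64) - 1  # compare only the low 64 bits of the trace ID


def sample_ratio_from_env(default: float = 1.0) -> float:
    """Read the ratio from the hypothetical TRACE_SAMPLE_RATIO variable."""
    raw = os.environ.get("TRACE_SAMPLE_RATIO")
    if raw is None:
        return default
    try:
        ratio = float(raw)
    except ValueError:
        return default  # malformed value: fall back rather than break the agent
    if math.isnan(ratio):
        return default
    return min(max(ratio, 0.0), 1.0)  # clamp to the range [0, 1]


def should_sample(trace_id: int, ratio: float) -> bool:
    """Deterministic decision: the same trace ID always gets the same answer."""
    bound = round(ratio * (TRACE_ID_MASK + 1))
    return (trace_id & TRACE_ID_MASK) < bound


# Example: keep roughly 10% of traces unless the environment says otherwise.
ratio = sample_ratio_from_env(default=0.1)
kept = sum(should_sample(tid, ratio) for tid in range(0, 1 << 64, 1 << 54))
print(f"ratio={ratio}, kept {kept} of 1024 evenly spaced trace IDs")
```

Deciding from the trace ID rather than a random draw means every service that sees the same trace reaches the same decision, so sampled traces stay complete.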

Testing

  • Documentation follows existing ADK docs style and structure
  • Code examples are complete and runnable
  • All links are valid
  • Navigation is properly configured in mkdocs.yml
  • Markdown formatting is correct

Related Issue

Closes #1215

Checklist

  • Documentation is clear and comprehensive
  • Code examples follow ADK best practices
  • Architecture diagrams are included
  • Troubleshooting section addresses common issues
  • Links to related documentation are provided
  • Navigation is updated in mkdocs.yml
  • Follows the repository's documentation style guide

- Add detailed guide for implementing OTEL tracing with Agent Engine
- Document critical set_up() initialization requirement
- Include architecture diagrams and data flow explanations
- Provide both environment variable and manual configuration methods
- Add infrastructure setup examples (Terraform, IAM)
- Include comprehensive troubleshooting guide
- Add production best practices and complete code examples

Fixes google#1215
@netlify
netlify bot commented Apr 1, 2026

Deploy Preview for adk-docs-preview ready!

| Name | Link |
| --- | --- |
| 🔨 Latest commit | cc05e04 |
| 🔍 Latest deploy log | https://app.netlify.com/projects/adk-docs-preview/deploys/69ccb5a6fe86790008fbdca5 |
| 😎 Deploy Preview | https://deploy-preview-1537--adk-docs-preview.netlify.app |
@joefernandez
Collaborator

Hey @perashanid. Thanks for the contribution. Due to the size of this article, there will be a substantial delay in reviewing the content before we decide whether we can accept it for publishing.

In the meantime, we strongly encourage you to publish this content as a blog post or on online developer forums for the benefit of developers using ADK. Thanks for sharing your knowledge with the ADK developer community!

Development

Successfully merging this pull request may close these issues.

Documentation is insufficient to implement OpenTelemetry for Agent Engine properly.

2 participants