Epic: Part 7 — AI Evaluations
Integration with Vertex AI Evaluation Service to run automated LLM judge evaluations on Glow CI outputs. All LLM calls (RAG retrieval + Gemini synthesis) are logged via Cloud Logging and fed into Vertex AI evals. Business stakeholders define evaluation criteria (hallucination, citation coverage, relevance); the platform runs automated evaluations and surfaces results.
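The logging half of this flow can be sketched locally. This is a minimal stand-in, not the Glow CI implementation: the field names (`model`, `prompt`, `response`, `retrieved_chunks`, `latency_ms`) are illustrative assumptions, and in production these records would be emitted as structured `jsonPayload` entries through the Cloud Logging client rather than stdlib logging.

```python
import json
import logging

# Local sketch of structured LLM-call logging for Glow CI.
# Field names are illustrative, not a confirmed schema; in GCP this
# would go through the Cloud Logging client so Vertex AI evals can
# consume the records downstream.
logger = logging.getLogger("glow_ci.llm_calls")
logging.basicConfig(level=logging.INFO)

def log_llm_call(model: str, prompt: str, response: str,
                 retrieved_chunks: list[str], latency_ms: float) -> dict:
    """Build and emit one structured log record for an LLM call."""
    record = {
        "model": model,
        "prompt": prompt,
        "response": response,
        "retrieved_chunks": retrieved_chunks,
        "latency_ms": latency_ms,
    }
    logger.info(json.dumps(record))
    return record

entry = log_llm_call(
    model="gemini",
    prompt="Summarize the incident report.",
    response="The outage lasted 12 minutes [doc-1].",
    retrieved_chunks=["doc-1: incident timeline"],
    latency_ms=830.0,
)
```

Logging both the retrieved chunks and the synthesized response in one record is what lets a downstream judge check the response against its own retrieval context.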
Key capabilities:
- Cloud Logging — all Glow CI LLM calls instrumented and observable via GCP
- Vertex AI Evaluation Service — LLM call logs feed automated evaluations using Vertex AI's built-in evaluation pipeline
- Stakeholder-defined criteria (hallucination, citation coverage, relevance)
- Continuous automated eval runs with results visible to business teams
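To make one of the stakeholder criteria concrete, here is a hedged local sketch of a citation-coverage scorer: the fraction of response sentences that carry a citation marker. The `[doc-N]` marker format is an assumption for illustration only; in the actual pipeline this scoring is done by Vertex AI's model-based evaluation metrics, not hand-rolled regexes.

```python
import re

def citation_coverage(response: str) -> float:
    """Fraction of sentences containing a [doc-N] citation marker.

    Illustrative stand-in for one eval criterion; the marker format
    is assumed, not taken from the Glow CI spec.
    """
    # Split on sentence-ending punctuation followed by whitespace.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", response.strip()) if s]
    if not sentences:
        return 0.0
    cited = sum(1 for s in sentences if re.search(r"\[doc-\d+\]", s))
    return cited / len(sentences)

# A response where one of two sentences is cited scores 0.5.
score = citation_coverage("A fact [doc-1]. Another claim.")
```

A per-sentence score like this is easy to threshold in CI (e.g. fail a run if coverage drops below a stakeholder-agreed floor), which is the shape of check the continuous eval runs above would surface.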
Stories
| # | Story | Role | Sprint |
| --- | --- | --- | --- |
| 7.2 | Instrument Glow CI with Cloud Logging + connect to Vertex AI Evaluation Service | Engineer | Sprint 5 |
📄 PRD: Part 7 — Glow CI PRD