On user experience, I would add: I think these should be the KPIs to measure agent performance:
Some questions we should solve:
Opening this discussion to start drafting what an evaluation framework for agentic analytics should be.
Problem
When deploying an agentic analytics solution, teams want to be sure the agent does not hallucinate when answering questions. Hallucination can take multiple forms: the agent can hallucinate table or column names, for example. And the problem is not limited to hallucination; there is also an information retrieval problem.
As an admin deploying nao, I want to make sure, before rolling out to my stakeholders, that the agentic loop limits hallucinations and retrieves the right context across a defined set of questions, situations, or scenarios.
We therefore want to build a tool that lets users describe problems as inputs and expected outputs, run those problems against the agentic loop with multiple models, and compare performance.
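To make this concrete, here is a minimal sketch of how one such problem could be described and scored on the two failure modes above (hallucinated table names, missed retrieval). All names here (`EvalCase`, `score`, the example tables) are hypothetical, and in a real run the set of referenced tables would come from the agentic loop rather than being hard-coded:

```python
from dataclasses import dataclass

@dataclass
class EvalCase:
    question: str
    expected_tables: set  # tables the agent should use to answer
    schema: set           # all tables that actually exist

@dataclass
class EvalResult:
    hallucinated: set     # referenced tables that do not exist
    missed: set           # expected tables the agent never used

def score(case: EvalCase, referenced_tables: set) -> EvalResult:
    """Score one agent run for hallucination and retrieval."""
    return EvalResult(
        hallucinated=referenced_tables - case.schema,
        missed=case.expected_tables - referenced_tables,
    )

# Example: the agent answered using `orders` and a made-up `ordrs` table.
case = EvalCase(
    question="What was revenue last month?",
    expected_tables={"orders", "payments"},
    schema={"orders", "payments", "customers"},
)
result = score(case, referenced_tables={"orders", "ordrs"})
print(result.hallucinated)  # {'ordrs'}
print(result.missed)        # {'payments'}
```

Aggregating these per-case results over a problem set, per model, would give the comparison table users need.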
Eventually, users can measure the impact of adding a given piece of context on output quality (e.g. should I include a preview of the tables or not?).
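A context ablation like the table-preview question could be run as the same problem set under two configurations, comparing an aggregate metric. This is only a sketch under assumed names: `fake_agent` stands in for the real agentic loop (here it just pretends the preview prevents a typo), and the config labels are made up:

```python
# All tables that actually exist in the warehouse (example data).
SCHEMA = {"orders", "payments", "customers"}

def fake_agent(question: str, include_preview: bool) -> set:
    # Stand-in for the agentic loop: returns the tables the agent
    # referenced. Pretends the table preview prevents a typo.
    return {"orders", "payments"} if include_preview else {"orders", "ordrs"}

questions = ["What was revenue last month?"]

# Run the same questions once per context configuration and count
# hallucinated table references in each.
for label, include_preview in [("no_preview", False), ("with_preview", True)]:
    hallucinated = sum(
        len(fake_agent(q, include_preview) - SCHEMA) for q in questions
    )
    print(f"{label}: {hallucinated} hallucinated table reference(s)")
```

The same loop extends naturally to multiple models: one run per (model, config) pair, with results pivoted into a comparison table.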