DIAL Realtime analytics
Overview

Realtime analytics server for AI DIAL. The service consumes the logs stream from AI DIAL Core, analyzes the conversations, and writes the analytics to InfluxDB.

Refer to the Documentation to learn how to configure AI DIAL Core and other necessary components.

Usage

Check the AI DIAL Core documentation to configure how logs are sent to your instance of the realtime analytics server.

InfluxDB schema

The realtime analytics server analyzes the logs stream provided by Vector in real time and writes metrics to InfluxDB.

Chat completion and embedding requests

The logs for /chat/completions and /embeddings endpoints are saved to the analytics measurement with the following tags and fields:

| Tag | Description |
|---|---|
| model | The model name for the request. |
| deployment | The deployment name of the model or application for the request. |
| parent_deployment | The deployment name of the model or application that called the current deployment. |
| execution_path | A list of deployment calls representing the call stack of the request. E.g. `['app1', 'app2', 'model1']` means app1 called app2 and app2 called model1. The last element of the list equals the deployment tag. The penultimate element (when present) equals the parent_deployment tag. |
| trace_id | OpenTelemetry trace ID. |
| core_span_id | OpenTelemetry span ID generated by DIAL Core. |
| core_parent_span_id | OpenTelemetry span ID of the DIAL Core span that called core_span_id. |
| project_id | The project ID for the request. |
| language | The language detected for the content of the request. |
| upstream | The upstream endpoint used by the DIAL model. |
| topic | The topic detected for the content of the request. |
| title | The title of the person making the request. |
| response_id | Unique ID of the response. For chat completion responses it equals the id response field; for embedding requests it is generated as a new UUID. |

| Field | Type | Description |
|---|---|---|
| user_hash | string | The unique hash identifying the user. |
| deployment_price | float | The cost of this specific request, excluding the cost of any requests it directly or indirectly initiated. |
| price | float | The total cost of the request, including this request and all related requests it directly or indirectly triggered. It always holds that price >= deployment_price. |
| number_request_messages | int | The total number of messages in the request. For chat completion requests, it is the number of messages in the chat history; for embedding requests, the number of inputs. |
| chat_id | string | The unique identifier for the conversation that this request is part of. |
| prompt_tokens | int | The number of tokens in the request. |
| cached_prompt_tokens | int | The number of tokens read from the model cache. cached_prompt_tokens <= prompt_tokens. |
| completion_tokens | int | The number of tokens in the response. |
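For illustration, a single chat completion record in the analytics measurement could look roughly like this in InfluxDB line protocol (a subset of the tags and fields above, with invented values):

```
analytics,model=gpt-4,deployment=gpt-4,project_id=proj-1,topic=Science,response_id=chatcmpl-123 user_hash="a1b2c3",price=0.0021,deployment_price=0.0021,number_request_messages=3i,prompt_tokens=42i,completion_tokens=25i
```

Tags precede the space and are indexed; fields follow it, with string values quoted and integer values suffixed with `i`.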

Rate requests

The logs for the /rate endpoint are saved to the rate_analytics measurement:

| Tag | Description |
|---|---|
| deployment | The deployment name of the model or application for the request. |
| project_id | The project ID for the request. |
| title | The title of the person making the request. |
| response_id | Unique ID of the response. |
| user_hash | The unique hash identifying the user. |
| chat_id | The unique identifier for the conversation that this request is part of. |

| Field | Type | Description |
|---|---|---|
| like_count | int | 1 for a thumbs-up request, otherwise 0. |
| dislike_count | int | 1 for a thumbs-down request, otherwise 0. |
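Exactly one of the two counters is 1 per rate request, so summing each field over time yields the total likes and dislikes. A hypothetical helper (not the server's actual code) illustrates the mapping:

```python
def rate_fields(thumbs_up: bool) -> dict:
    """Map a /rate request to the rate_analytics counter fields.

    like_count is 1 for a thumbs-up request, dislike_count is 1
    for a thumbs-down request; the other counter is 0.
    """
    return {
        "like_count": int(thumbs_up),
        "dislike_count": int(not thumbs_up),
    }

print(rate_fields(True))  # {'like_count': 1, 'dislike_count': 0}
```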

MCP requests

The logs for the /mcp endpoint are saved to the mcp_analytics measurement:

| Tag | Description |
|---|---|
| project_id | The project ID for the request. |
| title | The title of the person making the request. |
| deployment | The deployment name of the DIAL toolset corresponding to the MCP call. |
| parent_deployment | The deployment name of the model or application that called the DIAL toolset. |
| mcp_method | MCP method name, such as tools/list, tools/call, etc. |

| Field | Type | Description |
|---|---|---|
| execution_path | string | A list of deployment calls representing the call stack of the request. E.g. `['app1', 'app2', 'toolset1']` means app1 called app2 and app2 called toolset1. The last element of the list equals the deployment tag. The penultimate element (when present) equals the parent_deployment tag. |
| chat_id | string | The unique identifier for the conversation that this request is part of. |
| user_hash | string | The unique hash identifying the user. |
| upstream | string | The upstream endpoint of the DIAL toolset. |
| trace_id | string | OpenTelemetry trace ID. |
| core_span_id | string | OpenTelemetry span ID generated by DIAL Core. |
| core_parent_span_id | string | OpenTelemetry span ID of the DIAL Core span that called core_span_id. |
| mcp_tool_call_name | string | The name of the requested tool, when mcp_method equals tools/call. |

Note

Only requests with HTTP status code 200 are processed by the analytics server.

Configuration

Copy .env.example to .env and customize it for your environment.

Connection to the InfluxDB

InfluxDB 2

Specify the connection options for the InfluxDB instance using the following environment variables:

| Variable | Description |
|---|---|
| INFLUX_URL | URL of the InfluxDB instance to write the analytics data to |
| INFLUX_ORG | Name of the InfluxDB organization to write the analytics data to |
| INFLUX_BUCKET | Name of the bucket to write the analytics data to |
| INFLUX_API_TOKEN | InfluxDB API token |
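For example, a .env fragment for a local InfluxDB 2 instance might look like this (the URL uses InfluxDB's default local port; the organization, bucket, and token are placeholders for your own values):

```
INFLUX_URL=http://localhost:8086
INFLUX_ORG=my-org
INFLUX_BUCKET=dial-analytics
INFLUX_API_TOKEN=my-token
```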

You can follow the InfluxDB 2 documentation to set up InfluxDB locally and acquire the required configuration parameters.

InfluxDB 3

Specify the connection options for the InfluxDB instance using the following environment variables:

| Variable | Description |
|---|---|
| INFLUX_URL | URL of the InfluxDB instance to write the analytics data to |
| INFLUX_DATABASE | Name of the InfluxDB 3 database to write the analytics data to |
| INFLUX_API_TOKEN | InfluxDB API token with write access to the target database |

You can follow the InfluxDB 3 documentation to set up InfluxDB locally and acquire the required configuration parameters.

Important

The INFLUX_DATABASE variable was introduced in version 0.22.0. For earlier versions, set the INFLUX_BUCKET variable to the target database name and the INFLUX_ORG variable to any non-empty value (e.g. "ignored") to enable InfluxDB 3 support.
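For example, a .env fragment for a local InfluxDB 3 instance might look like this (the port and other values are placeholders; check your own instance's configuration):

```
INFLUX_URL=http://localhost:8181
INFLUX_DATABASE=dial-analytics
INFLUX_API_TOKEN=my-token
```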

Aggregated Dashboards (Optional)

This project includes optional aggregated Grafana dashboards that visualize 6-hour and monthly trends.

To enable these dashboards, you must manually create the required InfluxDB buckets and tasks. These steps are not automated via Helm and must be applied manually.

See influxdb/README.md for full instructions.

Important

Aggregated Dashboards are only supported for InfluxDB 2.

Other configuration

Additionally, the following environment variables can be used to configure the service behavior:

| Variable | Default | Description |
|---|---|---|
| MODEL_RATES | {} | Specifies per-token price rates for models in JSON format |
| TOPIC_MODEL | | Specifies the name or path of the topic model. If specified by name, the model is downloaded from Hugging Face. When unset or set to an empty string, the topic classification feature is disabled. |
| TOPIC_EMBEDDINGS_MODEL | | Specifies the name or path of the embeddings model used with the topic model. If specified by name, the model is downloaded from Hugging Face. When unset or set to an empty string, the name from the topic model config is used. |
| LOG_LEVEL | INFO | The server logging level. Use DEBUG for development and INFO in production. |

Example of the MODEL_RATES configuration:

{
    "gpt-4": {
        "unit":"token",
        "prompt_price":"0.00003",
        "completion_price":"0.00006"
    },
    "gpt-35-turbo": {
        "unit":"token",
        "prompt_price":"0.0000015",
        "completion_price":"0.000002"
    },
    "gpt-4-32k": {
        "unit":"token",
        "prompt_price":"0.00006",
        "completion_price":"0.00012"
    },
    "text-embedding-ada-002": {
        "unit":"token",
        "prompt_price":"0.0000001"
    },
    "chat-bison@001": {
        "unit":"char_without_whitespace",
        "prompt_price":"0.0000005",
        "completion_price":"0.0000005"
    }
}
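Under the "token" unit, a request's cost is the prompt token count times the prompt rate plus the completion token count times the completion rate. The sketch below (not the server's actual implementation) shows this calculation against a MODEL_RATES entry, using Decimal since the rates are given as strings:

```python
import json
from decimal import Decimal

# A single entry in the MODEL_RATES format shown above.
MODEL_RATES = json.loads("""
{
    "gpt-4": {
        "unit": "token",
        "prompt_price": "0.00003",
        "completion_price": "0.00006"
    }
}
""")

def token_price(model: str, prompt_tokens: int, completion_tokens: int) -> Decimal:
    # price = prompt_tokens * prompt rate + completion_tokens * completion rate
    rates = MODEL_RATES[model]
    return (prompt_tokens * Decimal(rates["prompt_price"])
            + completion_tokens * Decimal(rates.get("completion_price", "0")))

print(token_price("gpt-4", 1000, 500))  # 0.06000
```

Models priced with the "char_without_whitespace" unit would count non-whitespace characters instead of tokens.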

Development

Development Environment

This project requires Python ≥3.11 and Poetry ≥2.1.1 for dependency management.

Setup

  1. Install Poetry. See the official installation guide.

  2. (Optional) Specify custom Python or Poetry executables in .env.dev. This is useful if multiple versions are installed. By default, python and poetry are used.

    POETRY_PYTHON=path-to-python-exe
    POETRY=path-to-poetry-exe
  3. Create and activate the virtual environment:

    make init_env
    source .venv/bin/activate
  4. Install project dependencies (including linting, formatting, and test tools):

    make install

Build

To build the wheel packages run:

make build

Run

To run the development server locally run:

make serve

The server will be available at http://localhost:5001

Docker

To build the docker image run:

make docker_build

To run the server locally from the docker image run:

make docker_serve

The server will be available at http://localhost:5001

Lint

Run the linting before committing:

make lint

To auto-fix formatting issues run:

make format

Test

Run unit tests locally:

make test

Clean

To remove the virtual environment and build artifacts:

make clean
