Realtime analytics server for AI DIAL. The service consumes the log stream from AI DIAL Core, analyzes the conversations, and writes the analytics to InfluxDB.
Refer to the Documentation to learn how to configure AI DIAL Core and other necessary components.
Check the AI DIAL Core documentation to configure how logs are sent to the instance of the realtime analytics server.
The realtime analytics server analyzes the log stream provided by Vector in real time and writes metrics to InfluxDB.
The logs for /chat/completions and /embeddings endpoints are saved to the analytics measurement with the following tags and fields:
| Tag | Description |
|---|---|
| model | The model name for the request. |
| deployment | The deployment name of the model or application for the request. |
| parent_deployment | The deployment name of the model or application that called the current deployment. |
| execution_path | A list of deployment calls representing the call stack of the request. E.g. ['app1', 'app2', 'model1'] means app1 called app2 and app2 called model1. The last element of the list equals to the deployment tag. The penultimate element of the list (when present) equals to the parent_deployment tag. |
| trace_id | OpenTelemetry trace ID. |
| core_span_id | OpenTelemetry span ID generated by DIAL Core. |
| core_parent_span_id | OpenTelemetry span ID generated by DIAL Core that called the span core_span_id. |
| project_id | The project ID for the request. |
| language | The language detected for the content of the request. |
| upstream | The upstream endpoint used by the DIAL model. |
| topic | The topic detected for the content of the request. |
| title | The title of the person making the request. |
| response_id | Unique ID of the response. For a chat completion response, it equals the id response field; for an embedding request, it is generated from scratch as a UUID. |
| Field | Type | Description |
|---|---|---|
| user_hash | string | The unique hash identifying the user. |
| deployment_price | float | The cost of this specific request, excluding the cost of any requests it directly or indirectly initiated. |
| price | float | The total cost of the request, including the cost of this request and all related requests it directly or indirectly triggered. It always holds that price >= deployment_price. |
| number_request_messages | int | The total number of messages in the request. For chat completion requests, it is the number of messages in the chat history; for embedding requests, it is the number of inputs. |
| chat_id | string | The unique identifier for the conversation that this request is part of. |
| prompt_tokens | int | The number of tokens in the request. |
| cached_prompt_tokens | int | The number of tokens read from the model cache. It always holds that cached_prompt_tokens <= prompt_tokens. |
| completion_tokens | int | The number of tokens in the response. |
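The relationships among these tags and fields can be illustrated with a short sketch. The dict layout below is hypothetical (records are actually stored as InfluxDB points); only the tag and field names follow the tables above:

```python
# Sketch: check the documented invariants of an "analytics" record.
# The dict layout is hypothetical; tag/field names follow the tables above.

def check_analytics_record(record: dict) -> None:
    path = record["execution_path"]
    # The last element of execution_path equals the deployment tag.
    assert path[-1] == record["deployment"]
    # The penultimate element (when present) equals parent_deployment.
    if len(path) > 1:
        assert path[-2] == record["parent_deployment"]
    # The total price includes the deployment's own price plus downstream calls.
    assert record["price"] >= record["deployment_price"]
    # Cached prompt tokens can never exceed the prompt tokens.
    assert record["cached_prompt_tokens"] <= record["prompt_tokens"]

record = {
    "execution_path": ["app1", "app2", "model1"],
    "deployment": "model1",
    "parent_deployment": "app2",
    "price": 0.0009,
    "deployment_price": 0.0003,
    "prompt_tokens": 25,
    "cached_prompt_tokens": 10,
}
check_analytics_record(record)  # all invariants hold
```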
The logs for the /rate endpoint are saved to the rate_analytics measurement:
| Tag | Description |
|---|---|
| deployment | The deployment name of the model or application for the request. |
| project_id | The project ID for the request. |
| title | The title of the person making the request. |
| response_id | Unique ID of the response. |
| user_hash | The unique hash identifying the user. |
| chat_id | The unique identifier for the conversation that this request is part of. |
| Field | Type | Description |
|---|---|---|
| dislike_count | int | 1 for a thumbs down request, otherwise 0. |
| like_count | int | 1 for a thumbs up request, otherwise 0. |
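Each /rate event increments exactly one of these counters. A minimal sketch (the boolean vote input is an assumed shape, not the actual DIAL request schema):

```python
# Sketch: derive the rate_analytics counters from a thumbs up/down vote.
# The boolean "like" input is an assumed payload shape, not the DIAL schema.

def rate_fields(like: bool) -> dict:
    return {
        "like_count": 1 if like else 0,     # thumbs up
        "dislike_count": 0 if like else 1,  # thumbs down
    }

assert rate_fields(True) == {"like_count": 1, "dislike_count": 0}
assert rate_fields(False) == {"like_count": 0, "dislike_count": 1}
```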
The logs for the /mcp endpoint are saved to the mcp_analytics measurement:
| Tag | Description |
|---|---|
| project_id | The project ID for the request. |
| title | The title of the person making the request. |
| deployment | The deployment name of a DIAL toolset corresponding to the MCP call. |
| parent_deployment | The deployment name of the model or application that called the DIAL toolset. |
| mcp_method | The MCP method name, such as tools/list, tools/call, etc. |
| Field | Type | Description |
|---|---|---|
| execution_path | string | A list of deployment calls representing the call stack of the request. E.g. ['app1', 'app2', 'toolset1'] means app1 called app2 and app2 called toolset1. The last element of the list equals to the deployment tag. The penultimate element of the list (when present) equals to the parent_deployment tag. |
| chat_id | string | The unique identifier for the conversation that this request is part of. |
| user_hash | string | The unique hash identifying the user. |
| upstream | string | The upstream endpoint of the DIAL toolset. |
| trace_id | string | OpenTelemetry trace ID. |
| core_span_id | string | OpenTelemetry span ID generated by DIAL Core. |
| core_parent_span_id | string | OpenTelemetry span ID generated by DIAL Core that called the span core_span_id. |
| mcp_tool_call_name | string | The name of the requested tool, present when mcp_method equals tools/call. |
Note
Only requests with HTTP status code 200 are processed by the analytics server.
Copy .env.example to .env and customize it for your environment.
You need to specify the connection options to the InfluxDB instance using the environment variables:
| Variable | Description |
|---|---|
| INFLUX_URL | URL to the InfluxDB to write the analytics data |
| INFLUX_ORG | Name of the InfluxDB organization to write the analytics data |
| INFLUX_BUCKET | Name of the bucket to write the analytics data |
| INFLUX_API_TOKEN | InfluxDB API Token |
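A small sketch of consuming these variables, failing fast when one is missing. The validation helper below is illustrative, not part of the service:

```python
# Sketch: read the required InfluxDB 2 connection settings from the
# environment, failing fast when a variable is missing or empty.
import os

REQUIRED = ("INFLUX_URL", "INFLUX_ORG", "INFLUX_BUCKET", "INFLUX_API_TOKEN")

def influx_settings() -> dict:
    missing = [name for name in REQUIRED if not os.environ.get(name)]
    if missing:
        raise RuntimeError(f"Missing InfluxDB settings: {', '.join(missing)}")
    return {name: os.environ[name] for name in REQUIRED}

# Example: populate the environment as .env would and read it back.
os.environ.update({
    "INFLUX_URL": "http://localhost:8086",
    "INFLUX_ORG": "my-org",
    "INFLUX_BUCKET": "analytics",
    "INFLUX_API_TOKEN": "my-token",
})
print(influx_settings()["INFLUX_URL"])  # http://localhost:8086
```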
You can follow the InfluxDB 2 documentation to set up InfluxDB locally and acquire the required configuration parameters.
You need to specify the connection options to the InfluxDB instance using the environment variables:
| Variable | Description |
|---|---|
| INFLUX_URL | URL to the InfluxDB to write the analytics data |
| INFLUX_DATABASE | Name of the InfluxDB 3 database to write the analytics data |
| INFLUX_API_TOKEN | InfluxDB API Token with the write access to the target database |
You can follow the InfluxDB 3 documentation to set up InfluxDB locally and acquire the required configuration parameters.
Important
The INFLUX_DATABASE variable was introduced in version 0.22.0. For earlier versions, set the INFLUX_BUCKET variable to the target database name and the INFLUX_ORG variable to any non-empty value (e.g. "ignored") to enable InfluxDB 3 support.
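For example, a pre-0.22.0 .env targeting InfluxDB 3 would look like this (URL and token values are placeholders):

```
# Pre-0.22.0 configuration for InfluxDB 3 (values are placeholders)
INFLUX_URL=http://localhost:8181
# INFLUX_BUCKET holds the InfluxDB 3 database name
INFLUX_BUCKET=analytics
# INFLUX_ORG must be any non-empty value; it is otherwise ignored
INFLUX_ORG=ignored
INFLUX_API_TOKEN=my-token
```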
This project includes optional aggregated Grafana dashboards that visualize 6-hour and monthly trends.
To enable these dashboards, you must manually create the required InfluxDB buckets and tasks. These steps are not automated via Helm and must be applied manually.
See influxdb/README.md for full instructions.
Important
Aggregated Dashboards are only supported for InfluxDB 2.
Also, the following environment variables can be used to configure the service behavior:
| Variable | Default | Description |
|---|---|---|
| MODEL_RATES | {} | Specifies per-token price rates for models in JSON format |
| TOPIC_MODEL | | Specifies the name or path of the topic model. If the model is specified by name, it is downloaded from Hugging Face. When unset or set to an empty string, the topic classification feature is disabled. |
| TOPIC_EMBEDDINGS_MODEL | | Specifies the name or path of the embeddings model used with the topic model. If the model is specified by name, it is downloaded from Hugging Face. When unset or set to an empty string, the name from the topic model config is used. |
| LOG_LEVEL | INFO | The server logging level. Use DEBUG for development and INFO in production. |
Example of the MODEL_RATES configuration:
```json
{
  "gpt-4": {
    "unit": "token",
    "prompt_price": "0.00003",
    "completion_price": "0.00006"
  },
  "gpt-35-turbo": {
    "unit": "token",
    "prompt_price": "0.0000015",
    "completion_price": "0.000002"
  },
  "gpt-4-32k": {
    "unit": "token",
    "prompt_price": "0.00006",
    "completion_price": "0.00012"
  },
  "text-embedding-ada-002": {
    "unit": "token",
    "prompt_price": "0.0000001"
  },
  "chat-bison@001": {
    "unit": "char_without_whitespace",
    "prompt_price": "0.0000005",
    "completion_price": "0.0000005"
  }
}
```

This project requires Python ≥3.11 and Poetry ≥2.1.1 for dependency management.
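The per-token rates in the MODEL_RATES example above can be applied like this (an illustrative sketch, not the service's actual pricing code):

```python
# Sketch: compute a request price from a MODEL_RATES entry.
# This helper is illustrative; it is not the service's actual implementation.
from decimal import Decimal

MODEL_RATES = {
    "gpt-4": {
        "unit": "token",
        "prompt_price": "0.00003",
        "completion_price": "0.00006",
    },
}

def request_price(model: str, prompt_tokens: int, completion_tokens: int) -> Decimal:
    rates = MODEL_RATES[model]
    assert rates["unit"] == "token"  # char-based units need a character count instead
    return (Decimal(rates["prompt_price"]) * prompt_tokens
            + Decimal(rates["completion_price"]) * completion_tokens)

print(request_price("gpt-4", 1000, 500))  # 0.06000
```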
- Install Poetry. See the official installation guide.
- (Optional) Specify custom Python or Poetry executables in `.env.dev`. This is useful if multiple versions are installed. By default, `python` and `poetry` are used.

  ```
  POETRY_PYTHON=path-to-python-exe
  POETRY=path-to-poetry-exe
  ```

- Create and activate the virtual environment:

  ```
  make init_env
  source .venv/bin/activate
  ```

- Install project dependencies (including linting, formatting, and test tools):

  ```
  make install
  ```
To build the wheel packages, run:

```
make build
```

To run the development server locally, run:

```
make serve
```

The server will be available at http://localhost:5001.

To build the Docker image, run:

```
make docker_build
```

To run the server locally from the Docker image, run:

```
make docker_serve
```

The server will be available at http://localhost:5001.

Run the linting before committing:

```
make lint
```

To auto-fix formatting issues, run:

```
make format
```

Run unit tests locally:

```
make test
```

To remove the virtual environment and build artifacts, run:

```
make clean
```