diff --git a/docs/ai-integration/ai-tasks-list-view.mdx b/docs/ai-integration/ai-tasks-list-view.mdx
index fbe09464b0..f6853ea921 100644
--- a/docs/ai-integration/ai-tasks-list-view.mdx
+++ b/docs/ai-integration/ai-tasks-list-view.mdx
@@ -26,6 +26,10 @@ import LanguageContent from "@site/src/components/LanguageContent";
* In the **AI Tasks - List view**, you can manage RavenDB's AI tasks -
create new tasks, edit existing ones, or delete them as needed.
+* To inspect errors raised by AI tasks and how those errors affect each task's health,
+ use the [AI Task Errors view](../monitoring/task-errors/studio-views.mdx#ai-task-errors-view).
+ See the [Task errors overview](../monitoring/task-errors/overview.mdx) for an introduction.
+
* In this article:
* [AI Tasks - list view](../ai-integration/ai-tasks-list-view.mdx#ai-tasks---list-view)
diff --git a/docs/ai-integration/gen-ai-integration/overview.mdx b/docs/ai-integration/gen-ai-integration/overview.mdx
index 4e9a825e44..88032e378c 100644
--- a/docs/ai-integration/gen-ai-integration/overview.mdx
+++ b/docs/ai-integration/gen-ai-integration/overview.mdx
@@ -33,6 +33,7 @@ import LanguageContent from "@site/src/components/LanguageContent";
* [How to create and run a GenAI task](../../ai-integration/gen-ai-integration/overview.mdx#how-to-create-and-run-a-genai-task)
* [Runtime](../../ai-integration/gen-ai-integration/overview.mdx#runtime)
* [Tracking of processed document parts](../../ai-integration/gen-ai-integration/overview.mdx#tracking-of-processed-document-parts)
+ * [Monitoring the tasks](../../ai-integration/gen-ai-integration/overview.mdx#monitoring-the-tasks)
* [Licensing](../../ai-integration/gen-ai-integration/overview.mdx#licensing)
* [Supported services](../../ai-integration/gen-ai-integration/overview.mdx#supported-services)
* [Common use cases](../../ai-integration/gen-ai-integration/overview.mdx#common-use-cases)
@@ -223,6 +224,24 @@ added or modified.
+## Monitoring the tasks
+
+* The status and state of each GenAI task are visible in the
+ [AI Tasks - list view](../../ai-integration/ai-tasks-list-view.mdx).
+
+* Task performance and activity over time can be analyzed in the _AI Tasks Stats_ view.
+ Learn more about the stats view in the
+ [Ongoing Tasks Stats](../../studio/database/stats/ongoing-tasks-stats/overview.mdx) article.
+
+* Errors raised by GenAI tasks, and how those errors affect each task's health, are tracked
+ in the [Task Errors view](../../monitoring/task-errors/studio-views.mdx#task-errors-view).
+ The [AI Task Errors view](../../monitoring/task-errors/studio-views.mdx#ai-task-errors-view),
+ opened from the `AI Hub`, shows the same errors pre-filtered to AI tasks only.
+ For an introduction to task error monitoring, see the
+ [Task errors overview](../../monitoring/task-errors/overview.mdx).
+
+
+
## Licensing
For RavenDB to support the GenAI Integration feature, you need a `RavenDB AI` license type.
diff --git a/docs/ai-integration/generating-embeddings/content/_overview-csharp.mdx b/docs/ai-integration/generating-embeddings/content/_overview-csharp.mdx
index 220eee939e..a2bbd44c61 100644
--- a/docs/ai-integration/generating-embeddings/content/_overview-csharp.mdx
+++ b/docs/ai-integration/generating-embeddings/content/_overview-csharp.mdx
@@ -130,6 +130,13 @@ import Panel from '@site/src/components/Panel';
* [5.1.11.25](../../../server/administration/snmp/snmp-overview.mdx#511125) – Total number of enabled embeddings generation tasks.
* [5.1.11.26](../../../server/administration/snmp/snmp-overview.mdx#511126) – Total number of active embeddings generation tasks.
+* Errors raised by embeddings generation tasks, and how those errors affect each task's
+ health, are tracked in the [Task Errors view](../../../monitoring/task-errors/studio-views.mdx#task-errors-view).
+ The [AI Task Errors view](../../../monitoring/task-errors/studio-views.mdx#ai-task-errors-view),
+ opened from the `AI Hub`, shows the same errors pre-filtered to AI tasks only.
+ For an introduction to task error monitoring, see the
+ [Task errors overview](../../../monitoring/task-errors/overview.mdx).
+
diff --git a/docs/ai-integration/generating-embeddings/content/_overview-nodejs.mdx b/docs/ai-integration/generating-embeddings/content/_overview-nodejs.mdx
index 3bc012b0fa..5e5ef046ba 100644
--- a/docs/ai-integration/generating-embeddings/content/_overview-nodejs.mdx
+++ b/docs/ai-integration/generating-embeddings/content/_overview-nodejs.mdx
@@ -130,6 +130,13 @@ import Panel from '@site/src/components/Panel';
* [5.1.11.25](../../../server/administration/snmp/snmp-overview.mdx#511125) – Total number of enabled embeddings generation tasks.
* [5.1.11.26](../../../server/administration/snmp/snmp-overview.mdx#511126) – Total number of active embeddings generation tasks.
+* Errors raised by embeddings generation tasks, and how those errors affect each task's
+ health, are tracked in the [Task Errors view](../../../monitoring/task-errors/studio-views.mdx#task-errors-view).
+ The [AI Task Errors view](../../../monitoring/task-errors/studio-views.mdx#ai-task-errors-view),
+ opened from the `AI Hub`, shows the same errors pre-filtered to AI tasks only.
+ For an introduction to task error monitoring, see the
+ [Task errors overview](../../../monitoring/task-errors/overview.mdx).
+
diff --git a/docs/ai-integration/generating-embeddings/content/_overview-python.mdx b/docs/ai-integration/generating-embeddings/content/_overview-python.mdx
index 774fbe340e..ba18b1d5d1 100644
--- a/docs/ai-integration/generating-embeddings/content/_overview-python.mdx
+++ b/docs/ai-integration/generating-embeddings/content/_overview-python.mdx
@@ -130,6 +130,13 @@ import Panel from '@site/src/components/Panel';
* [5.1.11.25](../../../server/administration/snmp/snmp-overview.mdx#511125) – Total number of enabled embeddings generation tasks.
* [5.1.11.26](../../../server/administration/snmp/snmp-overview.mdx#511126) – Total number of active embeddings generation tasks.
+* Errors raised by embeddings generation tasks, and how those errors affect each task's
+ health, are tracked in the [Task Errors view](../../../monitoring/task-errors/studio-views.mdx#task-errors-view).
+ The [AI Task Errors view](../../../monitoring/task-errors/studio-views.mdx#ai-task-errors-view),
+ opened from the `AI Hub`, shows the same errors pre-filtered to AI tasks only.
+ For an introduction to task error monitoring, see the
+ [Task errors overview](../../../monitoring/task-errors/overview.mdx).
+
diff --git a/docs/monitoring/_category_.json b/docs/monitoring/_category_.json
new file mode 100644
index 0000000000..aaab234002
--- /dev/null
+++ b/docs/monitoring/_category_.json
@@ -0,0 +1,4 @@
+{
+ "position": 1,
+ "label": "Monitoring"
+}
diff --git a/docs/monitoring/task-errors/_category_.json b/docs/monitoring/task-errors/_category_.json
new file mode 100644
index 0000000000..91be008210
--- /dev/null
+++ b/docs/monitoring/task-errors/_category_.json
@@ -0,0 +1,4 @@
+{
+ "position": 1,
+ "label": "Task Errors"
+}
diff --git a/docs/monitoring/task-errors/assets/snagit/task-errors_ai-task-errors-view.snagx b/docs/monitoring/task-errors/assets/snagit/task-errors_ai-task-errors-view.snagx
new file mode 100644
index 0000000000..6742e37cc0
Binary files /dev/null and b/docs/monitoring/task-errors/assets/snagit/task-errors_ai-task-errors-view.snagx differ
diff --git a/docs/monitoring/task-errors/assets/snagit/task-errors_health-indicators.snagx b/docs/monitoring/task-errors/assets/snagit/task-errors_health-indicators.snagx
new file mode 100644
index 0000000000..0be0626a2b
Binary files /dev/null and b/docs/monitoring/task-errors/assets/snagit/task-errors_health-indicators.snagx differ
diff --git a/docs/monitoring/task-errors/assets/snagit/task-errors_ongoing-tasks-bar-expanded.snagx b/docs/monitoring/task-errors/assets/snagit/task-errors_ongoing-tasks-bar-expanded.snagx
new file mode 100644
index 0000000000..035eeb0a04
Binary files /dev/null and b/docs/monitoring/task-errors/assets/snagit/task-errors_ongoing-tasks-bar-expanded.snagx differ
diff --git a/docs/monitoring/task-errors/assets/snagit/task-errors_ongoing-tasks-view.snagx b/docs/monitoring/task-errors/assets/snagit/task-errors_ongoing-tasks-view.snagx
new file mode 100644
index 0000000000..24ad557a34
Binary files /dev/null and b/docs/monitoring/task-errors/assets/snagit/task-errors_ongoing-tasks-view.snagx differ
diff --git a/docs/monitoring/task-errors/assets/snagit/task-errors_task-errors-view.snagx b/docs/monitoring/task-errors/assets/snagit/task-errors_task-errors-view.snagx
new file mode 100644
index 0000000000..b315b529fe
Binary files /dev/null and b/docs/monitoring/task-errors/assets/snagit/task-errors_task-errors-view.snagx differ
diff --git a/docs/monitoring/task-errors/assets/snagit/task-errors_task-segment.snagx b/docs/monitoring/task-errors/assets/snagit/task-errors_task-segment.snagx
new file mode 100644
index 0000000000..5b0d2aa805
Binary files /dev/null and b/docs/monitoring/task-errors/assets/snagit/task-errors_task-segment.snagx differ
diff --git a/docs/monitoring/task-errors/assets/task-errors_ai-task-errors-view.png b/docs/monitoring/task-errors/assets/task-errors_ai-task-errors-view.png
new file mode 100644
index 0000000000..8dbb1333ff
Binary files /dev/null and b/docs/monitoring/task-errors/assets/task-errors_ai-task-errors-view.png differ
diff --git a/docs/monitoring/task-errors/assets/task-errors_filter-bar.png b/docs/monitoring/task-errors/assets/task-errors_filter-bar.png
new file mode 100644
index 0000000000..0a2fdb6a40
Binary files /dev/null and b/docs/monitoring/task-errors/assets/task-errors_filter-bar.png differ
diff --git a/docs/monitoring/task-errors/assets/task-errors_health-indicators.png b/docs/monitoring/task-errors/assets/task-errors_health-indicators.png
new file mode 100644
index 0000000000..73eaafe314
Binary files /dev/null and b/docs/monitoring/task-errors/assets/task-errors_health-indicators.png differ
diff --git a/docs/monitoring/task-errors/assets/task-errors_ongoing-tasks-bar-expanded.png b/docs/monitoring/task-errors/assets/task-errors_ongoing-tasks-bar-expanded.png
new file mode 100644
index 0000000000..72f3725070
Binary files /dev/null and b/docs/monitoring/task-errors/assets/task-errors_ongoing-tasks-bar-expanded.png differ
diff --git a/docs/monitoring/task-errors/assets/task-errors_ongoing-tasks-view.png b/docs/monitoring/task-errors/assets/task-errors_ongoing-tasks-view.png
new file mode 100644
index 0000000000..773543f6e0
Binary files /dev/null and b/docs/monitoring/task-errors/assets/task-errors_ongoing-tasks-view.png differ
diff --git a/docs/monitoring/task-errors/assets/task-errors_task-errors-view.png b/docs/monitoring/task-errors/assets/task-errors_task-errors-view.png
new file mode 100644
index 0000000000..e97a80d508
Binary files /dev/null and b/docs/monitoring/task-errors/assets/task-errors_task-errors-view.png differ
diff --git a/docs/monitoring/task-errors/assets/task-errors_task-segment.png b/docs/monitoring/task-errors/assets/task-errors_task-segment.png
new file mode 100644
index 0000000000..d95843a552
Binary files /dev/null and b/docs/monitoring/task-errors/assets/task-errors_task-segment.png differ
diff --git a/docs/monitoring/task-errors/configuration.mdx b/docs/monitoring/task-errors/configuration.mdx
new file mode 100644
index 0000000000..14355d2d82
--- /dev/null
+++ b/docs/monitoring/task-errors/configuration.mdx
@@ -0,0 +1,140 @@
+---
+title: "Task errors: Configuration"
+sidebar_label: "Configuration options"
+description: "Configuration keys for task error monitoring."
+sidebar_position: 3
+---
+
+import Admonition from '@theme/Admonition';
+import Tabs from '@theme/Tabs';
+import TabItem from '@theme/TabItem';
+import CodeBlock from '@theme/CodeBlock';
+import LanguageSwitcher from "@site/src/components/LanguageSwitcher";
+import LanguageContent from "@site/src/components/LanguageContent";
+import Panel from "@site/src/components/Panel";
+import ContentFrame from "@site/src/components/ContentFrame";
+
+# Task errors: Configuration
+
+
+
+* This page covers the configuration keys that control task error monitoring.
+
+* To learn how to apply these keys (where to set them, scope, syntax), see the
+ [Configuration Overview](../../server/configuration/configuration-options.mdx).
+
+* To learn about task errors and how task health is determined, see the
+ [Task errors overview](../../monitoring/task-errors/overview.mdx).
+
+* In this article:
+ * [Task health thresholds](../../monitoring/task-errors/configuration.mdx#task-health-thresholds)
+ * [ETL.ProcessHealthStatusImpairedThreshold](../../monitoring/task-errors/configuration.mdx#etlprocesshealthstatusimpairedthreshold)
+ * [ETL.ProcessHealthStatusFailedThreshold](../../monitoring/task-errors/configuration.mdx#etlprocesshealthstatusfailedthreshold)
+ * [Tuning the thresholds](../../monitoring/task-errors/configuration.mdx#tuning-the-thresholds)
+ * [Validation rules](../../monitoring/task-errors/configuration.mdx#validation-rules)
+
+
+
+
+
+Two configuration keys define the boundaries between the three task health states
+(`Healthy`, `Impaired`, and `Failed`). Each task is classified by its error ratio
+(described on the
+[Task errors overview](../../monitoring/task-errors/overview.mdx#how-health-is-computed)):
+`Healthy` below the Impaired threshold, `Impaired` between the two thresholds, and
+`Failed` above the Failed threshold. A task moves between states as the ratio crosses
+each threshold.
+
+Both keys can be set server-wide or per database, and both apply to AI tasks
+(Embeddings Generation, GenAI) as well as ETL tasks despite their `ETL.` prefix.
+
+
+
+### ETL.ProcessHealthStatusImpairedThreshold
+
+* Error-rate threshold above which a task's health is classified as `Impaired`.
+* A task whose recent error rate exceeds this value transitions from `Healthy` to `Impaired`.
+
+- **Type**: `float`
+- **Default**: `0.1`
+- **Range**: `[0, 1]`
+- **Scope**: Server-wide or per database
+
+
+
+---
+
+
+
+### ETL.ProcessHealthStatusFailedThreshold
+
+* Error-rate threshold above which a task's health is classified as `Failed`.
+* A task whose recent error rate exceeds this value transitions from `Impaired` to `Failed`.
+
+- **Type**: `float`
+- **Default**: `0.9`
+- **Range**: `[0, 1]`
+- **Scope**: Server-wide or per database
+
+
+
+---
+
+
+
+### Tuning the thresholds
+
+The defaults are tuned for typical workloads where most tasks should run cleanly and any
+sustained error rate is meaningful. Two situations commonly call for adjusting them:
+workloads that legitimately accept a high item-failure rate, and operational environments
+that need earlier escalation.
+
+A per-database setting always overrides the server-wide setting, so different workloads on
+the same server can use different sensitivity.
+
+#### Tuning the Impaired threshold
+
+The default of `0.1` is conservative. Even a small ratio of recent failures flips a task to
+`Impaired`, which makes sense when failures are expected to be rare and the goal is to flag
+a task as soon as it starts misbehaving.
+
+* Raise the threshold (for example to `0.2` or `0.3`) when the workload routinely produces
+ item errors that you do not want to escalate. A typical case is an ETL or AI task
+ processing user-generated data that often fails validation; the task is doing its job,
+ the failures are not actionable, and flipping to `Impaired` on every batch is noisy.
+
+* Lower the threshold (for example to `0.05`) when you want earlier alerting on tasks that
+ are starting to slip. The cost is more frequent `Impaired` classifications and the alerts
+ that ride on them.
+
+#### Tuning the Failed threshold
+
+The default of `0.9` is permissive. A task only flips to `Failed` when its recent error
+rate is overwhelming - effectively, when most of its recent batches have failed.
+
+* Raise the threshold (for example to `0.95`) when you want `Failed` to mean "essentially
+ broken" and tolerate substantial impairment without escalating. Useful when `Failed`
+ triggers automated responses that should be reserved for genuinely catastrophic states.
+
+* Lower the threshold (for example to `0.7`) when you want stronger and earlier escalation
+ on degraded tasks. The cost is more frequent `Failed` classifications and the automated
+ responses that ride on them.
+
+
+
+---
+
+
+
+### Validation rules
+
+RavenDB validates both keys at server startup. The server refuses to start if any of the
+following is violated:
+
+* Each threshold value must be between `0` and `1`, inclusive.
+* `ETL.ProcessHealthStatusFailedThreshold` must be strictly greater than
+ `ETL.ProcessHealthStatusImpairedThreshold`. Equal values are rejected.
+
+
+
+
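Review note on `configuration.mdx` above: the threshold semantics and the two validation rules can be sketched in a few lines of Python. This is only a reading aid, not RavenDB code. The function names `validate_thresholds` and `classify` are hypothetical, and the sketch assumes that "exceeds" means strictly greater than, so a ratio exactly equal to a threshold stays in the lower state.

```python
# Hypothetical helpers illustrating the documented rules; not a RavenDB API.
IMPAIRED_DEFAULT = 0.1  # documented default of ETL.ProcessHealthStatusImpairedThreshold
FAILED_DEFAULT = 0.9    # documented default of ETL.ProcessHealthStatusFailedThreshold


def validate_thresholds(impaired: float, failed: float) -> None:
    """Raise ValueError if either documented validation rule is violated."""
    for name, value in (("impaired", impaired), ("failed", failed)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} threshold must be within [0, 1], got {value}")
    if failed <= impaired:
        # Equal values are rejected as well, per the documented rule.
        raise ValueError("failed threshold must be strictly greater than impaired threshold")


def classify(ratio: float,
             impaired: float = IMPAIRED_DEFAULT,
             failed: float = FAILED_DEFAULT) -> str:
    """Map an error ratio in [0, 1] to a health state.

    Assumption: "exceeds" is read as strictly greater than.
    """
    validate_thresholds(impaired, failed)
    if ratio > failed:
        return "Failed"
    if ratio > impaired:
        return "Impaired"
    return "Healthy"
```

With the defaults, `classify(0.05)` gives `Healthy`, `classify(0.5)` gives `Impaired`, and `classify(0.95)` gives `Failed`. Raising the Impaired threshold to `0.3`, as the tuning section suggests for noisy workloads, would keep a task at a ratio of `0.2` in `Healthy`.
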
diff --git a/docs/monitoring/task-errors/overview.mdx b/docs/monitoring/task-errors/overview.mdx
new file mode 100644
index 0000000000..791d7deb8a
--- /dev/null
+++ b/docs/monitoring/task-errors/overview.mdx
@@ -0,0 +1,257 @@
+---
+title: "Task errors: Overview"
+sidebar_label: "Overview"
+description: "Track errors raised by ETL and AI tasks, evaluate their impact on each task's health, and clear or retry as needed."
+sidebar_position: 1
+---
+
+import Admonition from '@theme/Admonition';
+import Tabs from '@theme/Tabs';
+import TabItem from '@theme/TabItem';
+import CodeBlock from '@theme/CodeBlock';
+import LanguageSwitcher from "@site/src/components/LanguageSwitcher";
+import LanguageContent from "@site/src/components/LanguageContent";
+import Panel from "@site/src/components/Panel";
+import ContentFrame from "@site/src/components/ContentFrame";
+
+# Task errors: Overview
+
+
+
+* Task errors are raised and stored whenever an ETL task or an AI task fails to process an
+ item or a batch. Each error records the task name, the time of failure, the processing
+ step the error occurred at, and the error message.
+
+* Throughout this section, "AI tasks" means [Embeddings Generation](../../ai-integration/generating-embeddings/overview.mdx)
+ and [GenAI](../../ai-integration/gen-ai-integration/overview.mdx) tasks.
+
+* Errors are persisted on disk per task. Each task keeps its own error history and that history
+ survives moves between nodes and server restarts.
+
+* Each task also has a health classification - `Healthy`, `Impaired`, or `Failed` - that reflects
+ its recent error rate, independently of the raw number of errors stored.
+
+* Task errors and the health states they drive are exposed in
+ [Studio](../../monitoring/task-errors/studio-views.mdx),
+ [HTTP endpoints](../../server/troubleshooting/debug-routes.mdx#debug-endpoints),
+ [SNMP OIDs](../../server/administration/snmp/snmp-overview.mdx#list-of-oids),
+ [Prometheus metrics](../../server/administration/monitoring/prometheus.mdx#metrics-provided-by-the-prometheus-endpoint),
+ and [monitoring endpoints](../../server/administration/monitoring/telegraf.mdx#monitoring-endpoints).
+
+* In this article:
+ * [What task errors are](../../monitoring/task-errors/overview.mdx#what-task-errors-are)
+ * [Error types](../../monitoring/task-errors/overview.mdx#error-types)
+ * [Error steps](../../monitoring/task-errors/overview.mdx#error-steps)
+ * [Where task errors are stored](../../monitoring/task-errors/overview.mdx#where-task-errors-are-stored)
+ * [Task health](../../monitoring/task-errors/overview.mdx#task-health)
+ * [Health states](../../monitoring/task-errors/overview.mdx#health-states)
+ * [How health is computed](../../monitoring/task-errors/overview.mdx#how-health-is-computed)
+ * [Where to view and manage task errors](../../monitoring/task-errors/overview.mdx#where-to-view-and-manage-task-errors)
+
+
+
+
+
+Task errors are recorded for every ETL provider (RavenDB ETL, SQL, OLAP, ElasticSearch, Kafka,
+RabbitMQ, Azure Queue Storage, Amazon SQS, Snowflake) and for AI tasks (Embeddings Generation,
+GenAI). Whenever one of these tasks fails to process an item or a batch, an error is added to
+the task's error history.
+
+Every error carries the same set of core fields: the task name, the time the error was created,
+the processing step the error occurred at, and the error message. Different error types carry
+additional fields specific to what went wrong.
+
+
+
+### Error types
+
+RavenDB classifies every task error as one of two types, based on the scope of what went wrong.
+
+* **Item error**
+ An error that occurred while processing a single document. The document was skipped and the
+ task moved on to the remaining documents in the batch. The error record includes the
+ document ID.
+
+* **Process error**
+ An error that occurred while processing a batch as a whole and may affect multiple documents,
+ such as a failure to send the batch to its destination. The error record includes the number
+ of documents the failing batch attempted to handle.
+ After a process error, the task enters fallback mode and retries the batch periodically.
+
+
+
+---
+
+
+
+### Error steps
+
+Every error records the processing step it occurred at. The available steps depend on the task
+type.
+
+* **Configuration**
+ The task's configuration was rejected. Typical causes include an invalid script or a missing
+ destination setting.
+
+* **Extraction**
+ The task could not read its source data. This is rare and usually indicates a transient
+ storage issue.
+
+* **Transformation**
+ The transformation script raised an exception while running, such as an unhandled
+ JavaScript error or a reference to a missing property.
+
+* **Load**
+ The task could not send its transformed data to the destination. Typical causes include the
+ destination being unreachable or rejecting the data.
+
+* **Persistence**
+ The task could not save its results back to the database, or could not update its own
+ process state. Usually caused by storage errors.
+
+* **Model Inference** (AI tasks only)
+ The task could not communicate with the AI model. Typical causes include the model service
+ being unreachable or returning an error.
+
+* **Unknown**
+ The processing step could not be determined.
+
+
+
+
+
+
+
+* Each ETL or AI task keeps its errors in two dedicated tables on disk: one for item errors
+ and one for process errors.
+
+* Each table is capped at 500 entries per task. When a new error needs to be recorded
+ after the cap is reached, the oldest entry in that table is evicted to make room.
+ The cap is not configurable.
+
+* Retention is per task and per table, so a single noisy task cannot push errors out of
+ an unrelated task.
+
+* Task errors are also included in the server's debug package, with separate files for ETL
+ and AI task errors, so support engineers can capture a full error history without going
+ through Studio or the HTTP endpoints.
+
+
+
+
+
+Each ETL and AI task carries a health state that summarizes how well it has been processing
+recent batches. The health state is exposed wherever task errors are exposed
+(see [Where to view and manage task errors](../../monitoring/task-errors/overview.mdx#where-to-view-and-manage-task-errors))
+and is used by automated monitoring to decide when a task needs attention.
+
+
+
+### Health states
+
+A task is in one of three health states at any time.
+
+* **Healthy**
+ No errors recently, or only an occasional one. The task is processing batches normally.
+
+* **Impaired**
+ Errors are accumulating at a rate that warrants attention. The task is still making
+ progress, but it should be looked at.
+
+* **Failed**
+ Errors dominate recent batches. The task is effectively not progressing and needs
+ intervention.
+
+A task recovers automatically as new batches complete. The health state transitions from
+`Failed` back to `Impaired`, and from `Impaired` back to `Healthy`, as the running error rate
+falls below each threshold.
+
+Updating the task's configuration also resets the health state to `Healthy`.
+
+
+Deleting a task's stored errors clears the rows from the error tables but does not, on its own,
+reset the task's health state.
+Health is driven by the running error rate, not by the rows in the error tables. A task in
+the `Failed` state will recover only when its error rate falls back below the
+[configured thresholds](../../monitoring/task-errors/configuration.mdx).
+
+
+
+
+---
+
+
+
+### How health is computed
+
+RavenDB watches the ratio between a task's failed items and the total number of items the
+task has attempted to process. The ratio is computed as a time-independent EWMA
+(Exponentially Weighted Moving Average) - the weight of each batch decays as more batches
+complete, not as time passes - and is updated continuously as new batches complete.
+
+In plain terms, more recent batches weigh more in the calculation than older ones. A fresh
+string of failures pushes the ratio up faster than the raw error count would suggest, and a
+clean stretch of batches pulls it back down, again with the most recent batches having the
+strongest effect.
+
+The ratio is bounded between `0` and `1`, where `0` means no recent failures and `1` means
+recent batches have all failed. Two thresholds determine the transitions between states:
+
+* The task is classified as `Impaired` when the ratio exceeds
+ `ETL.ProcessHealthStatusImpairedThreshold` (default `0.1`).
+* The task is classified as `Failed` when the ratio exceeds
+ `ETL.ProcessHealthStatusFailedThreshold` (default `0.9`).
+
+Both thresholds are configurable, server-wide or per database, and apply to AI tasks as well
+as ETL tasks despite the keys' `ETL.` prefix.
+
+[Task errors configuration](../../monitoring/task-errors/configuration.mdx) covers the two
+keys, their valid ranges, and guidance for choosing values.
+
+
+
+
+
+
+
+Task errors and the resulting health states are exposed in several places. Most users will
+start with Studio; automated monitoring tools usually pull from SNMP OIDs, Prometheus
+metrics, or monitoring endpoints.
+
+[Inspect and manage task errors via the HTTP endpoints](../../server/troubleshooting/debug-routes.mdx)
+[Inspect and manage task errors via Studio](../../monitoring/task-errors/studio-views.mdx)
+
+Where to find them in detail:
+
+* **HTTP endpoints**
+ * `GET /databases/*/tasks/errors` returns errors across all ETL and AI tasks.
+ * `GET /databases/*/etl/errors` and `GET /databases/*/ai/errors` return errors per category.
+ * `DELETE` variants of each path remove errors in bulk, optionally filtered by task name or
+ category. For example, `DELETE /databases/*/etl/errors?name=` clears the
+ errors of one specific ETL task.
+ * `POST /databases/*/etl/retry-batch` forces an immediate retry of an ETL task currently in
+ fallback mode.
+
+ See [Debug Endpoints](../../server/troubleshooting/debug-routes.mdx#debug-endpoints) for the full reference.
+
+* **Studio views**
+ The `Task Errors` view is reachable from `Tasks` **>** `Task Errors` and from
+ `AI Hub` **>** `AI Task Errors` (the same view, pre-filtered to AI tasks).
+ Each ETL and AI task bar on the `Ongoing Tasks` view also shows the task's health state and error count.
+ See [Task errors Studio views](../../monitoring/task-errors/studio-views.mdx).
+
+* **SNMP OIDs**
+ Dedicated OIDs for server-level, database-level, and per-task error counts and health
+ states.
+ See [List of OIDs](../../server/administration/snmp/snmp-overview.mdx#list-of-oids).
+
+* **Prometheus metrics**
+ Metrics for server, database, and per-task scopes, mirroring the SNMP set.
+ See [Prometheus integration](../../server/administration/monitoring/prometheus.mdx).
+
+* **Monitoring endpoints**
+ `/admin/monitoring/v1/etls` and `/admin/monitoring/v1/ai-tasks` return per-task health and
+ error counts as JSON.
+ See [Monitoring endpoints](../../server/administration/monitoring/telegraf.mdx#monitoring-endpoints).
+
+
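Review note on `overview.mdx` above: the "How health is computed" section describes a per-batch (time-independent) EWMA of the failed-item ratio but does not give the formula. The Python sketch below shows one common way such an average can be maintained. The class name `ErrorRatio`, the smoothing factor `alpha`, its default of `0.3`, and the starting value of `0` are all assumptions made for illustration; RavenDB's actual weights are not stated in this diff.

```python
class ErrorRatio:
    """Per-batch EWMA of the failed-item fraction (illustrative sketch only)."""

    def __init__(self, alpha: float = 0.3) -> None:
        if not 0.0 < alpha <= 1.0:
            raise ValueError("alpha must be in (0, 1]")
        self.alpha = alpha  # assumed smoothing factor; not taken from the docs
        self.value = 0.0    # assumed starting point: no recorded failures

    def record_batch(self, failed_items: int, total_items: int) -> float:
        """Fold one completed batch into the running ratio and return it."""
        if total_items <= 0 or not 0 <= failed_items <= total_items:
            raise ValueError("expected 0 <= failed_items <= total_items and total_items > 0")
        batch_fraction = failed_items / total_items
        # The update happens once per batch, not per unit of time, so older
        # batches lose weight only as newer batches complete.
        self.value = self.alpha * batch_fraction + (1.0 - self.alpha) * self.value
        return self.value
```

For example, with `alpha=0.5`, two fully failed batches move the ratio to `0.5` and then `0.75`, and one clean batch pulls it back to `0.375`. Because every update is a weighted mean of two values in `[0, 1]`, the ratio stays within the documented `0` to `1` range.
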
diff --git a/docs/monitoring/task-errors/studio-views.mdx b/docs/monitoring/task-errors/studio-views.mdx
new file mode 100644
index 0000000000..f235b2c219
--- /dev/null
+++ b/docs/monitoring/task-errors/studio-views.mdx
@@ -0,0 +1,282 @@
+---
+title: "Task errors: Studio views"
+sidebar_label: "Studio views"
+description: "Inspect task errors and health states for ETL and AI tasks from the Task Errors view in Studio."
+sidebar_position: 2
+---
+
+import Admonition from '@theme/Admonition';
+import Tabs from '@theme/Tabs';
+import TabItem from '@theme/TabItem';
+import CodeBlock from '@theme/CodeBlock';
+import LanguageSwitcher from "@site/src/components/LanguageSwitcher";
+import LanguageContent from "@site/src/components/LanguageContent";
+import Panel from "@site/src/components/Panel";
+import ContentFrame from "@site/src/components/ContentFrame";
+
+# Task errors: Studio views
+
+
+
+* **`Tasks` > `Task Errors`**
+ Open the `Task Errors` view from the `Tasks` menu to inspect errors raised by ETL and AI tasks.
+ You can browse all errors in a unified list or group them by task, apply various filters,
+ select an error to view it in detail, and see how task health is impacted by recent errors.
+
+* **`AI Hub` > `AI Task Errors`**
+ The `AI Task Errors` view, opened from the `AI Hub`, is a pre-filtered subset of the `Task Errors` view.
+ Use this view to inspect errors raised by `Embeddings Generation` and `GenAI` tasks.
+
+* Both views display the same errors for listed tasks; deleting a task's errors from one view is reflected
+ in the other.
+
+* **`Tasks` > `Ongoing Tasks`**
+ Each ETL and AI task bar on the `Ongoing Tasks` view shows the task's health state
+ and error count; expanding the bar reveals additional detail.
+
+* To learn about task errors and how they impact task health, see the
+ [Overview](../../monitoring/task-errors/overview.mdx) page.
+
+* In this article:
+ * [Task Errors view](../../monitoring/task-errors/studio-views.mdx#task-errors-view)
+ * [Opening the view](../../monitoring/task-errors/studio-views.mdx#opening-the-view)
+ * [Task filters](../../monitoring/task-errors/studio-views.mdx#task-filters)
+ * [Task health indicators](../../monitoring/task-errors/studio-views.mdx#task-health-indicators)
+ * [Task errors](../../monitoring/task-errors/studio-views.mdx#task-errors)
+ * [AI Task Errors view](../../monitoring/task-errors/studio-views.mdx#ai-task-errors-view)
+ * [Task health on the Ongoing Tasks view](../../monitoring/task-errors/studio-views.mdx#task-health-on-the-ongoing-tasks-view)
+ * [Collapsed view](../../monitoring/task-errors/studio-views.mdx#collapsed-view)
+ * [Expanded view](../../monitoring/task-errors/studio-views.mdx#expanded-view)
+
+
+
+
+
+In its default layout, `Task Errors` groups errors into per-task segments, each showing the
+task's errors in a sortable table.
+
+
+
+### Opening the view
+
+Open the `Task Errors` view from the `Tasks` menu. By default, it opens with no filters applied,
+showing a segment for every ETL or AI task that currently has any errors.
+
+
+
+* **A.** Click to open the Tasks menu.
+
+* **B.** Click to open the Task Errors view.
+
+* **C.** [Task filters](../../monitoring/task-errors/studio-views.mdx#task-filters) (see below).
+
+* **D.** [Task health indicators](../../monitoring/task-errors/studio-views.mdx#task-health-indicators) (see below).
+
+* **E.** Toggle to **group errors by task** or display them in a unified list.
+
+* **F.** [Task errors](../../monitoring/task-errors/studio-views.mdx#task-errors) (see below).
+
+
+
+---
+
+
+
+### Task filters
+
+Use the filters bar to narrow the listing to specific tasks and errors.
+
+
+
+* **`Filter by task/script name`**
+ Type a task or script name to narrow the listing to matching tasks.
+
+* **`Filter by node`**
+ Pick one or more cluster nodes to show only the errors raised on the selected nodes.
+
+* **`Filter by task type`**
+ Pick one or more task types (e.g., Kafka ETL) to show only the errors raised by the selected types.
+
+* **`Filter by task health`**
+ Pick one or more health states to show only tasks currently in the selected states.
+
+
+
+---
+
+
+
+### Task health indicators
+
+The indicator colors represent task health states: green for `Healthy`, yellow for
+`Impaired`, and red for `Failed`.
+* Hover over an indicator to open a popup summary of the tasks whose health currently
+ matches that indicator's state.
+* The summary lists only the node currently running the task and any nodes that recorded
+ errors for it, with the error count per node.
+
+
+
+
+
+---
+
+
+
+### Task errors
+
+The image below shows one of the task segments displayed in the task errors view when errors
+are grouped by task.
+
+
+
+1. **Task name**
+ The name of the ETL or AI task whose errors are displayed here.
+
+2. **Delete errors**
+ Click to **remove all errors raised by this task**, including both item and process errors.
+
+ Deleting a task's errors does not, on its own, reset the task's health state.
+ Health is driven by the running error rate, not by the rows in the error tables.
+ A task in `Impaired` or `Failed` state will recover only as new batches complete
+ successfully and its error rate falls back below the configured thresholds.
+ See the [Overview](../../monitoring/task-errors/overview.mdx#health-states)
+ for more.
+
+
+3. **Task metadata row**
+ * A toggle to collapse or expand all errors related to this task.
+ * Task type.
+ * Error count for this task.
+ * The number of scripts that this task runs.
+ * Task's current health state (`Healthy`, `Impaired`, or `Failed`).
+ * Tags of the cluster nodes currently running the task.
+
+4. **Script sub-segment details**
+ Errors for each script the task runs appear in their own sub-segment, with a header showing
+ the script's name and error count and a toggle to collapse or expand the errors related to this script.
+
+5. **Errors table**
+ The script's errors, one row per error.
+
+ * **Column headers**
+ You can filter or sort the table by the content of each column, using the
+ funnel (filter) or arrow (sort) icons at the column header.
+
+ * **`Show` column**
+ Click the eye icon for a specific error to open an error-details dialog with the full error
+ message.
+
+ * **`Error type` column**
+ Marks the row as `Item Error` (a single document failure the task skipped past) or
+ `Process Error` (a batch-scope failure that may affect multiple documents).
+
+ * **`Error step` column**
+ The processing step the error occurred at: `Configuration`, `Extraction`,
+ `Transformation`, `Load`, `Persistence`, `Model Inference`, or `Unknown`.
+ See the [Error steps](../../monitoring/task-errors/overview.mdx#error-steps) reference on the
+ overview.
+
+ * **`Document` column**
+ For item errors, the ID of the document being processed when the error occurred,
+ rendered as a hyperlink to the document.
+ For process errors, the column shows `-` because the error is not bound to a single document.
+
+ * **`Date` column**
+ The error's creation timestamp, shown in date form and in relation to the current time
+ (e.g., "4 hours ago").
+
+ * **`Affected Documents` column**
+ For process errors, the number of documents the failing batch attempted to process.
+ Empty for item errors.
+
+ * **`Error` column**
+ The error message, truncated to one line.
+
+ * **`Node` column**
+ The tag of the cluster node that recorded the error.
+
+
+
+
+
+
+
+The `AI Task Errors` view shows the same errors as the `Task Errors` view,
+with the same layout, controls, and data, but applies a predefined filter to show only
+`Embeddings Generation` and `GenAI` task errors.
+
+All options documented under
+[Task Errors view](../../monitoring/task-errors/studio-views.mdx#task-errors-view)
+above apply here without change.
+
+
+
+1. Click to open the **AI Hub**.
+
+2. Click to open the AI Task Errors view.
+
+
+
+
+
+On the `Ongoing Tasks` view, each ETL or AI task bar displays the task's current health
+state and the number of errors recorded for the task. Expanding the bar reveals these
+details per node, along with the node's `Connection status`.
+
+
+
+### Collapsed view
+
+
+
+1. **Click to open the `Tasks` menu.**
+
+2. **Click to open the `Ongoing Tasks` view.**
+
+3. **Task bar**
+
+4. **Expand details**
+ Click to expand the bar - see the per-node breakdown below.
+
+5. **Task health and error count**
+ * `Health Status` - the task's current state (`Healthy`, `Impaired`, or `Failed`).
+ * `Errors` - the number of errors currently recorded for the task.
+
+
+
+---
+
+
+
+### Expanded view
+
+
+
+Each relevant node has its own column showing how the task is doing on that node. Only
+the node currently running the task, and any other nodes that recorded errors for it,
+are shown.
+
+* `Connection status` - the state of the node's connection to the task's destination.
+ The value is `Active` while the connection is up, and `Reconnect` after a failure
+ while the task waits to retry.
+ To retry the failing batch immediately, hover over `Reconnect` and click the **Retry now**
+ button that appears.
+
+* `Errors` - the number of errors the task has raised on this node.
+
+* `Health status` - the task's classification on this node (`Healthy`, `Impaired`,
+ or `Failed`).
+
+* `State` - the task's processing state on this node (such as `UP TO DATE` or
+ `0% RUNNING`).
+
+
+
+
+
+See the [Ongoing Tasks - Overview](../../studio/database/tasks/ongoing-tasks/general-info.mdx#the-ongoing-tasks-list)
+page for a full walkthrough of the view, including filters, selection, and per-task
+actions.
+
+
diff --git a/docs/server/administration/monitoring/prometheus.mdx b/docs/server/administration/monitoring/prometheus.mdx
index 0092dfcee1..79b56674dc 100644
--- a/docs/server/administration/monitoring/prometheus.mdx
+++ b/docs/server/administration/monitoring/prometheus.mdx
@@ -64,7 +64,7 @@ or to `false` to include it.
`skipCollectionsMetrics`
E.g., to skip indexing metrics use -
-http://localhost:8080/admin/monitoring/v1/prometheus?skipIndexesMetrics=true
+http://localhost:8080/admin/monitoring/v1/prometheus?skipIndexesMetrics=true
And to skip both indexing and server metrics use -
http://localhost:8080/admin/monitoring/v1/prometheus?skipIndexesMetrics=true&skipServerMetrics=true
@@ -74,6 +74,10 @@ Here is the list of metrics made available by the `/admin/monitoring/v1/promethe
| Metrics | Description |
| - | - |
+| ai_task_documents_processed_per_second | Documents processed per second by the AI task (one minute rate) |
+| ai_task_errors_count | Number of errors recorded for the AI task |
+| ai_task_health_status | AI task health status: `0` => Healthy, `1` => Impaired, `2` => Failed |
+| ai_task_last_successful_batch_time_in_seconds | Time since the AI task's last successful batch, in seconds |
| archived_data_processing_behavior | Archived data processing behavior: `0` => ExcludeArchived, `1` => IncludeArchived, `2` => ArchivedOnly |
| backup_current_number_of_running_backups | Number of currently running backups |
| backup_max_number_of_concurrent_backups | Maximum number of concurrent backups |
@@ -93,9 +97,19 @@ Here is the list of metrics made available by the `/admin/monitoring/v1/promethe
| cpu_processor_count | Number of processors on the machine |
| cpu_thread_pool_available_completion_port_threads | Number of available completion port threads in the thread pool |
| cpu_thread_pool_available_worker_threads | Number of available worker threads in the thread pool |
+| database_ai_tasks_count | Number of AI tasks in the database |
+| database_ai_tasks_errors_count | Total number of AI task errors in the database |
+| database_ai_tasks_failed_count | Number of AI tasks with `Failed` health status in the database |
+| database_ai_tasks_healthy_count | Number of AI tasks with `Healthy` health status in the database |
+| database_ai_tasks_impaired_count | Number of AI tasks with `Impaired` health status in the database |
| database_alerts_count | Number of alerts |
| database_attachments_count | Number of attachments |
| database_documents_count | Number of documents |
+| database_etls_count | Number of ETL tasks in the database |
+| database_etls_errors_count | Total number of ETL errors in the database |
+| database_etls_failed_count | Number of ETL tasks with `Failed` health status in the database |
+| database_etls_healthy_count | Number of ETL tasks with `Healthy` health status in the database |
+| database_etls_impaired_count | Number of ETL tasks with `Impaired` health status in the database |
| database_indexes_auto_count | Number of auto indexes |
| database_indexes_count | Number of indexes |
| database_indexes_errored_count | Number of error indexes |
@@ -131,6 +145,10 @@ Here is the list of metrics made available by the `/admin/monitoring/v1/promethe
| database_uptime_seconds | Database up-time |
| databases_loaded_count | Number of loaded databases |
| databases_total_count | Number of all databases |
+| etl_documents_processed_per_second | Documents processed per second by the ETL task (one minute rate) |
+| etl_errors_count | Number of errors recorded for the ETL task |
+| etl_health_status | ETL task health status: `0` => Healthy, `1` => Impaired, `2` => Failed |
+| etl_last_successful_batch_time_in_seconds | Time since the ETL task's last successful batch, in seconds |
| index_entries_count | Number of entries in the index |
| index_errors | Number of index errors |
| index_is_invalid | Indicates if index is invalid |
@@ -161,9 +179,19 @@ Here is the list of metrics made available by the `/admin/monitoring/v1/promethe
| network_requests_per_second | Number of requests per second (one minute rate) |
| network_tcp_active_connections | Number of active TCP connections |
| network_total_requests | Total number of requests since server startup |
+| server_ai_tasks_count | Total number of AI tasks across all databases |
+| server_ai_tasks_errors_count | Total number of AI task errors across all databases |
+| server_ai_tasks_failed_count | Number of AI tasks with `Failed` health status across all databases |
+| server_ai_tasks_healthy_count | Number of AI tasks with `Healthy` health status across all databases |
+| server_ai_tasks_impaired_count | Number of AI tasks with `Impaired` health status across all databases |
| server_disk_remaining_storage_space_percentage | Remaining server storage disk space in % |
| server_disk_system_store_total_data_file_size_bytes | Server storage total size |
| server_disk_system_store_used_data_file_size_bytes | Server storage used size |
+| server_etls_count | Total number of ETL tasks across all databases |
+| server_etls_errors_count | Total number of ETL errors across all databases |
+| server_etls_failed_count | Number of ETL tasks with `Failed` health status across all databases |
+| server_etls_healthy_count | Number of ETL tasks with `Healthy` health status across all databases |
+| server_etls_impaired_count | Number of ETL tasks with `Impaired` health status across all databases |
| server_info | Server Info |
| server_process_id | Server process ID |
| server_storage_io_read_operations | Disk IO Read operations |
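
The numeric health codes listed above (`0` => Healthy, `1` => Impaired, `2` => Failed) can be decoded when scraping the endpoint. The following is a minimal sketch, assuming the endpoint returns the standard Prometheus text exposition format. The `task_name` label and the sample lines are hypothetical, and the actual metric names may carry a prefix, so the sketch matches on the `_health_status` suffix only.

```python
import re

# Codes documented in the metrics table above.
HEALTH_NAMES = {0: "Healthy", 1: "Impaired", 2: "Failed"}

# One sample line: metric name, optional {labels}, then the value.
SAMPLE_LINE = re.compile(
    r'^(?P<name>[A-Za-z_:][A-Za-z0-9_:]*)(?:\{(?P<labels>.*)\})?\s+(?P<value>\S+)'
)


def unhealthy_samples(exposition_text):
    """Return (metric, labels, state) for every health sample that is not Healthy."""
    results = []
    for line in exposition_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        match = SAMPLE_LINE.match(line)
        if match is None or not match.group("name").endswith("_health_status"):
            continue
        code = int(float(match.group("value")))
        if code != 0:
            state = HEALTH_NAMES.get(code, "Unknown")
            results.append((match.group("name"), match.group("labels") or "", state))
    return results


# Hypothetical scrape output; label names are assumptions.
sample = """
# TYPE ai_task_health_status gauge
ai_task_health_status{task_name="Summaries"} 1
etl_health_status{task_name="Orders"} 0
etl_health_status{task_name="Audit"} 2
"""
flagged = unhealthy_samples(sample)
```
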
diff --git a/docs/server/administration/monitoring/telegraf.mdx b/docs/server/administration/monitoring/telegraf.mdx
index c224654793..fb18eac35f 100644
--- a/docs/server/administration/monitoring/telegraf.mdx
+++ b/docs/server/administration/monitoring/telegraf.mdx
@@ -39,12 +39,14 @@ data tracking dashboard. But this feature is flexible - Telegraf can output data
## Monitoring Endpoints
-The monitoring endpoints output data in JSON format. There are four endpoints:
+The monitoring endpoints output data in JSON format. There are six endpoints:
* `/admin/monitoring/v1/server`
* `/admin/monitoring/v1/databases`
* `/admin/monitoring/v1/indexes`
* `/admin/monitoring/v1/collections`
+* `/admin/monitoring/v1/etls`
+* `/admin/monitoring/v1/ai-tasks`
## JSON Fields Returned by the Endpoints
@@ -52,6 +54,11 @@ The following is a list of JSON fields returned by the endpoints:
| Endpoint Suffix | Field Name | Description |
| - | - | - |
+| `ai-tasks` | `process_name` | The AI task name |
+| `ai-tasks` | `errors_count` | Number of errors recorded for the AI task |
+| `ai-tasks` | `health_status` | AI task health status (`Healthy`, `Impaired`, or `Failed`) |
+| `ai-tasks` | `last_successful_batch_time_in_sec` | Time since the AI task's last successful batch, in seconds |
+| `ai-tasks` | `documents_processed_per_second` | Documents processed per second by the AI task (one minute rate) |
| `collections` | `collection_name` | Collection name |
| `collections` | `database_name` | Name of this collection's database |
| `collections` | `documents_count` | Number of documents in collection |
@@ -96,6 +103,11 @@ The following is a list of JSON fields returned by the endpoints:
| `databases` | `storage_queue_length` | Storage queue length (optional, Linux only) |
| `databases` | `time_since_last_backup_in_sec` | LastBackup |
| `databases` | `uptime_in_sec` | Database up-time |
+| `etls` | `process_name` | The ETL task name |
+| `etls` | `errors_count` | Number of errors recorded for the ETL task |
+| `etls` | `health_status` | ETL task health status (`Healthy`, `Impaired`, or `Failed`) |
+| `etls` | `last_successful_batch_time_in_sec` | Time since the ETL task's last successful batch, in seconds |
+| `etls` | `documents_processed_per_second` | Documents processed per second by the ETL task (one minute rate) |
| `indexes` | `entries_count` | Number of entries in the index |
| `indexes` | `errors` | Number of index errors |
| `indexes` | `index_name` | Index name |
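
The `etls` and `ai-tasks` fields listed above lend themselves to a simple alerting check. The following is a rough sketch that uses the documented field names; it assumes, for illustration only, that the response body is a flat JSON array of task objects (the real payload may nest tasks under nodes or databases). The `stale_after_sec` cutoff and the sample payload are hypothetical.

```python
import json


def tasks_needing_attention(payload, stale_after_sec=300.0):
    """Return names of tasks that are not Healthy or have had no recent successful batch."""
    flagged = []
    for task in json.loads(payload):
        unhealthy = task["health_status"] != "Healthy"
        stale = task["last_successful_batch_time_in_sec"] > stale_after_sec
        if unhealthy or stale:
            flagged.append(task["process_name"])
    return flagged


# Hypothetical payload using the documented field names.
sample_payload = json.dumps([
    {"process_name": "Orders", "errors_count": 0, "health_status": "Healthy",
     "last_successful_batch_time_in_sec": 12, "documents_processed_per_second": 40.5},
    {"process_name": "Audit", "errors_count": 7, "health_status": "Impaired",
     "last_successful_batch_time_in_sec": 30, "documents_processed_per_second": 3.0},
    {"process_name": "Archive", "errors_count": 0, "health_status": "Healthy",
     "last_successful_batch_time_in_sec": 900, "documents_processed_per_second": 0.0},
])
attention = tasks_needing_attention(sample_payload)
```
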
diff --git a/docs/server/administration/snmp/snmp-overview.mdx b/docs/server/administration/snmp/snmp-overview.mdx
index 60ddff546b..e43e15010c 100644
--- a/docs/server/administration/snmp/snmp-overview.mdx
+++ b/docs/server/administration/snmp/snmp-overview.mdx
@@ -38,6 +38,8 @@ SNMP support is available for [Enterprise](../../../licensing/overview.mdx#enter
* [Index OIDs](../../../server/administration/snmp/snmp-overview.mdx#index-oids)
* [General OIDs](../../../server/administration/snmp/snmp-overview.mdx#general-oids)
* [Ongoing tasks OIDs](../../../server/administration/snmp/snmp-overview.mdx#ongoing-tasks-oids)
+ * [Per-task ETL OIDs](../../../server/administration/snmp/snmp-overview.mdx#per-task-etl-oids)
+ * [Per-task AI OIDs](../../../server/administration/snmp/snmp-overview.mdx#per-task-ai-oids)
## Overview
@@ -268,6 +270,7 @@ curl -X GET http://live-test.ravendb.net/monitoring/snmp?oid=1.3.6.1.4.1.45751.1
`3` - **a background collection** (this is always a generation 2 collection)
* `D` - **Database number**
* `I` - **Index number**
+ * `T` - **Task number** (used for per-task ETL and per-task AI OIDs)
@@ -361,6 +364,18 @@ curl -X GET http://live-test.ravendb.net/monitoring/snmp?oid=1.3.6.1.4.1.45751.1
| 1.17.2 | Number of current map files in '/proc/self/maps' |
| 1.17.3 | Value of the '/proc/sys/kernel/threads-max' parameter |
| 1.17.4 | Number of current threads |
+| 1.20.1 | Total number of ETL errors |
+| 1.20.2 | Number of ETL tasks with `Healthy` health status |
+| 1.20.3 | Number of ETL tasks with `Impaired` health status |
+| 1.20.4 | Number of ETL tasks with `Failed` health status |
+| 1.20.5 | Total number of ETL tasks |
+| 1.20.6 | Number of active ETL tasks (processed at least one batch in the last minute) |
+| 1.21.1 | Total number of AI task errors |
+| 1.21.2 | Number of AI tasks with `Healthy` health status |
+| 1.21.3 | Number of AI tasks with `Impaired` health status |
+| 1.21.4 | Number of AI tasks with `Failed` health status |
+| 1.21.5 | Total number of AI tasks |
+| 1.21.6 | Number of active AI tasks (processed at least one batch in the last minute) |
@@ -390,6 +405,8 @@ curl -X GET http://live-test.ravendb.net/monitoring/snmp?oid=1.3.6.1.4.1.45751.1
| 5.2.`D`.1.14 | Number of rehabs |
| 5.2.`D`.1.15 | Number of performance hints |
| 5.2.`D`.1.16 | Number of indexing errors |
+| 5.2.`D`.1.17 | Total number of ETL errors in the database |
+| 5.2.`D`.1.18 | Total number of AI task errors in the database |
| 5.2.`D`.2.1 | Documents storage allocated size in MB |
| 5.2.`D`.2.2 | Documents storage used size in MB |
| 5.2.`D`.2.3 | Index storage allocated size in MB |
@@ -417,6 +434,18 @@ curl -X GET http://live-test.ravendb.net/monitoring/snmp?oid=1.3.6.1.4.1.45751.1
| 5.2.`D`.5.7 | Number of faulty indexes |
| 5.2.`D`.6.1 | Number of writes (documents, attachments, counters, timeseries) |
| 5.2.`D`.6.2 | Number of bytes written (documents, attachments, counters, timeseries) |
+| 5.2.`D`.7.1 | Number of ETL tasks with `Healthy` health status in the database |
+| 5.2.`D`.7.2 | Number of ETL tasks with `Impaired` health status in the database |
+| 5.2.`D`.7.3 | Number of ETL tasks with `Failed` health status in the database |
+| 5.2.`D`.7.4 | Total number of ETL tasks in the database |
+| 5.2.`D`.7.5 | Number of active ETL tasks in the database |
+| 5.2.`D`.7.6 | ETL documents processed per second in the database (one minute rate) |
+| 5.2.`D`.8.1 | Number of AI tasks with `Healthy` health status in the database |
+| 5.2.`D`.8.2 | Number of AI tasks with `Impaired` health status in the database |
+| 5.2.`D`.8.3 | Number of AI tasks with `Failed` health status in the database |
+| 5.2.`D`.8.4 | Total number of AI tasks in the database |
+| 5.2.`D`.8.5 | Number of active AI tasks in the database |
+| 5.2.`D`.8.6 | AI task documents processed per second in the database (one minute rate) |
@@ -490,6 +519,27 @@ curl -X GET http://live-test.ravendb.net/monitoring/snmp?oid=1.3.6.1.4.1.45751.1
| 5.1.11.24 | Number of active Snowflake ETL tasks for all databases |
| 5.1.11.25 | Number of enabled Embeddings Generation tasks for all databases |
| 5.1.11.26 | Number of active Embeddings Generation tasks for all databases |
-
+| 5.1.12.1 | Total ETL documents processed per second across all databases (one minute rate) |
+| 5.1.12.2 | Total AI task documents processed per second across all databases (one minute rate) |
+
+
+
+| OID | Metric (Per-task ETL) |
+|------------------------------------------------------|----------------------------------------------------------------------|
+| 5.2.`D`.1.`T`.1 | Number of errors for the ETL task |
+| 5.2.`D`.1.`T`.2 | Health status of the ETL task (`Healthy`, `Impaired`, or `Failed`) |
+| 5.2.`D`.1.`T`.3 | Time of the last successful batch processed by the ETL task |
+| 5.2.`D`.1.`T`.4 | Documents processed per second by the ETL task (one minute rate) |
+| 5.2.`D`.1.`T`.5 | Responsible node tag for the ETL task |
+
+
+
+| OID | Metric (Per-task AI) |
+|------------------------------------------------------|----------------------------------------------------------------------|
+| 5.2.`D`.2.`T`.1 | Number of errors for the AI task |
+| 5.2.`D`.2.`T`.2 | Health status of the AI task (`Healthy`, `Impaired`, or `Failed`) |
+| 5.2.`D`.2.`T`.3 | Time of the last successful batch processed by the AI task |
+| 5.2.`D`.2.`T`.4 | Documents processed per second by the AI task (one minute rate) |
+| 5.2.`D`.2.`T`.5 | Responsible node tag for the AI task |
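
The `D` and `T` placeholders in the OID tables above are replaced with a database number and a task number before querying. A small sketch of that substitution follows; the `expand_oid` helper and its `root` parameter are hypothetical, and the root OID prefix is left to the caller rather than assumed here.

```python
from typing import Optional


def expand_oid(template: str,
               database: Optional[int] = None,
               task: Optional[int] = None,
               root: str = "") -> str:
    """Replace the D (database) and T (task) placeholders in an OID template."""
    values = {"D": database, "T": task}
    parts = []
    # Templates in the tables wrap placeholders in backticks; drop them first.
    for part in template.replace("`", "").split("."):
        if part in values:
            if values[part] is None:
                raise ValueError(f"template needs a value for {part}")
            part = str(values[part])
        parts.append(part)
    suffix = ".".join(parts)
    return f"{root}.{suffix}" if root else suffix


# Health status of task number 3 in database number 1 (per-task ETL table).
etl_health_oid = expand_oid("5.2.`D`.1.`T`.2", database=1, task=3)
```
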
diff --git a/docs/server/configuration/etl-configuration.mdx b/docs/server/configuration/etl-configuration.mdx
index 9434422ea9..9b67801804 100644
--- a/docs/server/configuration/etl-configuration.mdx
+++ b/docs/server/configuration/etl-configuration.mdx
@@ -23,6 +23,8 @@ import Panel from '@site/src/components/Panel';
* [ETL.MaxNumberOfExtractedDocuments](../../server/configuration/etl-configuration.mdx#etlmaxnumberofextracteddocuments)
* [ETL.MaxNumberOfExtractedItems](../../server/configuration/etl-configuration.mdx#etlmaxnumberofextracteditems)
* [ETL.OLAP.MaxNumberOfExtractedDocuments](../../server/configuration/etl-configuration.mdx#etlolapmaxnumberofextracteddocuments)
+ * [ETL.ProcessHealthStatusFailedThreshold](../../server/configuration/etl-configuration.mdx#etlprocesshealthstatusfailedthreshold)
+ * [ETL.ProcessHealthStatusImpairedThreshold](../../server/configuration/etl-configuration.mdx#etlprocesshealthstatusimpairedthreshold)
* [ETL.Queue.AzureQueueStorage.TimeToLiveInSec](../../server/configuration/etl-configuration.mdx#etlqueueazurequeuestoragetimetoliveinsec)
* [ETL.Queue.AzureQueueStorage.VisibilityTimeoutInSec](../../server/configuration/etl-configuration.mdx#etlqueueazurequeuestoragevisibilitytimeoutinsec)
* [ETL.Queue.Kafka.InitTransactionsTimeoutInSec](../../server/configuration/etl-configuration.mdx#etlqueuekafkainittransactionstimeoutinsec)
@@ -112,6 +114,36 @@ Max number of extracted documents in OLAP ETL batch.
+## ETL.ProcessHealthStatusFailedThreshold
+
+* Error-rate threshold for the `Failed` task health state. A task whose recent error rate
+ exceeds this value is classified as `Failed`.
+* See the [Task errors](../../monitoring/task-errors/configuration.mdx#etlprocesshealthstatusfailedthreshold)
+ page to learn how the rate is calculated and how to set a value.
+
+- **Type**: `float`
+- **Default**: `0.9`
+- **Range**: `[0, 1]`
+- **Scope**: Server-wide or per database
+
+
+
+
+## ETL.ProcessHealthStatusImpairedThreshold
+
+* Error-rate threshold for the `Impaired` task health state. A task whose recent error rate
+ exceeds this value, but not the `Failed` threshold, is classified as `Impaired`.
+* See the [Task errors](../../monitoring/task-errors/configuration.mdx#etlprocesshealthstatusimpairedthreshold)
+ page to learn how the rate is calculated and how to set a value.
+
+- **Type**: `float`
+- **Default**: `0.1`
+- **Range**: `[0, 1]`
+- **Scope**: Server-wide or per database
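
The way the two thresholds partition the error rate can be sketched as follows. This is an illustration of the rule as worded above ("exceeds this value"), using the documented defaults; it is not the server's implementation, and how the recent error rate itself is calculated is covered on the Task errors page, not here. The function name is hypothetical.

```python
def classify_health(error_rate: float,
                    impaired_threshold: float = 0.1,
                    failed_threshold: float = 0.9) -> str:
    """Map an error rate in [0, 1] to a health state name."""
    if not 0.0 <= error_rate <= 1.0:
        raise ValueError("error_rate must be within [0, 1]")
    # Check the stricter threshold first so Failed takes precedence.
    if error_rate > failed_threshold:
        return "Failed"
    if error_rate > impaired_threshold:
        return "Impaired"
    return "Healthy"
```

With the defaults, a rate of `0.5` would be classified as `Impaired`, and a rate exactly equal to a threshold does not exceed it.
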
+
+
+
+
## ETL.Queue.AzureQueueStorage.TimeToLiveInSec
Lifespan of a message in the queue.
diff --git a/docs/server/troubleshooting/debug-routes.mdx b/docs/server/troubleshooting/debug-routes.mdx
index f20e90c9c5..713eeb1b83 100644
--- a/docs/server/troubleshooting/debug-routes.mdx
+++ b/docs/server/troubleshooting/debug-routes.mdx
@@ -47,6 +47,7 @@ For the endpoints that begin with `/databases/*/`, replace `*` with the name of
| /build/version | GET | | Returns product build number, major version, commit hash and full version number | |
| /databases/*/admin/debug/cluster/txinfo | GET | `from` (Optional): number of results to skip; `take` (Optional): number of results to take | List the incomplete [cluster transaction commands](../clustering/cluster-transactions.mdx#cluster--cluster-wide-transactions) | |
| /databases/*/admin/debug/txinfo | GET | | List | |
+| /databases/*/ai/errors | GET | `name` (Optional, multi-valued): filter results to errors of the named AI task or tasks | List recent errors recorded for AI tasks (Embeddings Generation, GenAI). See [Task errors overview](../../monitoring/task-errors/overview.mdx). | |
| /databases/*/debug/documents/huge | GET | | List IDs of documents that exceed the `PerformanceHints.Documents.HugeDocumentSizeInMb` setting | |
| /databases/*/debug/identities | GET | | | |
| /databases/*/debug/info-package | GET | | Save debug package information for later analysis | |
@@ -57,6 +58,7 @@ For the endpoints that begin with `/databases/*/`, replace `*` with the name of
| /databases/*/debug/script-runners | GET | | | |
| /databases/*/debug/storage/all-environments/report | GET | | | |
| /databases/*/debug/storage/report | GET | | | |
+| /databases/*/etl/errors | GET | `name` (Optional, multi-valued): filter results to errors of the named ETL task or tasks | List recent errors recorded for ETL tasks (RavenDB, SQL, OLAP, ElasticSearch, Kafka, RabbitMQ, Azure Queue Storage, Amazon SQS, Snowflake). See [Task errors overview](../../monitoring/task-errors/overview.mdx). | |
| /databases/*/indexes | GET | | | |
| /databases/*/indexes/errors | GET | | | |
| /databases/*/indexes/stats | GET | | | |
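
A request URL for the two error routes added above can be assembled as in the sketch below. It assumes that "multi-valued" means the `name` parameter is repeated once per task, which is a common convention but is not confirmed here. The helper name, server URL, database name, and task names are hypothetical.

```python
from urllib.parse import quote, urlencode


def task_errors_url(server_url, database, kind, names=()):
    """Build the errors route URL for ETL ('etl') or AI ('ai') tasks."""
    if kind not in ("etl", "ai"):
        raise ValueError("kind must be 'etl' or 'ai'")
    base = server_url.rstrip("/")
    url = f"{base}/databases/{quote(database, safe='')}/{kind}/errors"
    if names:
        # Assumption: one `name` query parameter per task name.
        url += "?" + urlencode([("name", n) for n in names])
    return url


etl_url = task_errors_url("http://localhost:8080", "Northwind", "etl",
                          ["Orders to Kafka", "Audit"])
```
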
diff --git a/docs/studio/database/tasks/ongoing-tasks/assets/snagit/task-list-0.snagx b/docs/studio/database/tasks/ongoing-tasks/assets/snagit/task-list-0.snagx
new file mode 100644
index 0000000000..4a16d403ff
Binary files /dev/null and b/docs/studio/database/tasks/ongoing-tasks/assets/snagit/task-list-0.snagx differ
diff --git a/docs/studio/database/tasks/ongoing-tasks/assets/snagit/task-list-1.snagx b/docs/studio/database/tasks/ongoing-tasks/assets/snagit/task-list-1.snagx
new file mode 100644
index 0000000000..18f72ead14
Binary files /dev/null and b/docs/studio/database/tasks/ongoing-tasks/assets/snagit/task-list-1.snagx differ
diff --git a/docs/studio/database/tasks/ongoing-tasks/assets/snagit/task-list-2.snagx b/docs/studio/database/tasks/ongoing-tasks/assets/snagit/task-list-2.snagx
new file mode 100644
index 0000000000..ac054446dc
Binary files /dev/null and b/docs/studio/database/tasks/ongoing-tasks/assets/snagit/task-list-2.snagx differ
diff --git a/docs/studio/database/tasks/ongoing-tasks/assets/snagit/task-list_task-bar_actions.snagx b/docs/studio/database/tasks/ongoing-tasks/assets/snagit/task-list_task-bar_actions.snagx
new file mode 100644
index 0000000000..a9c9b43bea
Binary files /dev/null and b/docs/studio/database/tasks/ongoing-tasks/assets/snagit/task-list_task-bar_actions.snagx differ
diff --git a/docs/studio/database/tasks/ongoing-tasks/assets/snagit/task-list_task-bar_info.snagx b/docs/studio/database/tasks/ongoing-tasks/assets/snagit/task-list_task-bar_info.snagx
new file mode 100644
index 0000000000..772e592199
Binary files /dev/null and b/docs/studio/database/tasks/ongoing-tasks/assets/snagit/task-list_task-bar_info.snagx differ
diff --git a/docs/studio/database/tasks/ongoing-tasks/assets/task-list-0.png b/docs/studio/database/tasks/ongoing-tasks/assets/task-list-0.png
new file mode 100644
index 0000000000..e0ad562492
Binary files /dev/null and b/docs/studio/database/tasks/ongoing-tasks/assets/task-list-0.png differ
diff --git a/docs/studio/database/tasks/ongoing-tasks/assets/task-list-1.png b/docs/studio/database/tasks/ongoing-tasks/assets/task-list-1.png
index 00f120d0be..365dfddefd 100644
Binary files a/docs/studio/database/tasks/ongoing-tasks/assets/task-list-1.png and b/docs/studio/database/tasks/ongoing-tasks/assets/task-list-1.png differ
diff --git a/docs/studio/database/tasks/ongoing-tasks/assets/task-list-2.png b/docs/studio/database/tasks/ongoing-tasks/assets/task-list-2.png
index 08d858233c..f771bfdfc9 100644
Binary files a/docs/studio/database/tasks/ongoing-tasks/assets/task-list-2.png and b/docs/studio/database/tasks/ongoing-tasks/assets/task-list-2.png differ
diff --git a/docs/studio/database/tasks/ongoing-tasks/assets/task-list-3.png b/docs/studio/database/tasks/ongoing-tasks/assets/task-list-3.png
deleted file mode 100644
index 8aaedb49e3..0000000000
Binary files a/docs/studio/database/tasks/ongoing-tasks/assets/task-list-3.png and /dev/null differ
diff --git a/docs/studio/database/tasks/ongoing-tasks/assets/task-list_action-bar.png b/docs/studio/database/tasks/ongoing-tasks/assets/task-list_action-bar.png
new file mode 100644
index 0000000000..51d7dfd3c6
Binary files /dev/null and b/docs/studio/database/tasks/ongoing-tasks/assets/task-list_action-bar.png differ
diff --git a/docs/studio/database/tasks/ongoing-tasks/assets/task-list_task-bar_actions.png b/docs/studio/database/tasks/ongoing-tasks/assets/task-list_task-bar_actions.png
new file mode 100644
index 0000000000..d3a7e9d9f8
Binary files /dev/null and b/docs/studio/database/tasks/ongoing-tasks/assets/task-list_task-bar_actions.png differ
diff --git a/docs/studio/database/tasks/ongoing-tasks/assets/task-list_task-bar_info.png b/docs/studio/database/tasks/ongoing-tasks/assets/task-list_task-bar_info.png
new file mode 100644
index 0000000000..c756b4f4d0
Binary files /dev/null and b/docs/studio/database/tasks/ongoing-tasks/assets/task-list_task-bar_info.png differ
diff --git a/docs/studio/database/tasks/ongoing-tasks/general-info.mdx b/docs/studio/database/tasks/ongoing-tasks/general-info.mdx
index dbf289abf7..a22dbd6d4b 100644
--- a/docs/studio/database/tasks/ongoing-tasks/general-info.mdx
+++ b/docs/studio/database/tasks/ongoing-tasks/general-info.mdx
@@ -11,6 +11,8 @@ import TabItem from '@theme/TabItem';
import CodeBlock from '@theme/CodeBlock';
import LanguageSwitcher from "@site/src/components/LanguageSwitcher";
import LanguageContent from "@site/src/components/LanguageContent";
+import Panel from "@site/src/components/Panel";
+import ContentFrame from "@site/src/components/ContentFrame";
# Ongoing Tasks - Overview
@@ -19,40 +21,74 @@ import LanguageContent from "@site/src/components/LanguageContent";
* Each task is assigned a responsible node from the [Database Group nodes](../../../../studio/database/settings/manage-database-group.mdx) to handle the work.
* If not specified by the user, the cluster decides which node will be responsible for the task. See [Members Duties](../../../../studio/database/settings/manage-database-group.mdx#database-group-topology---members-duties).
- * If a node is down, the cluster will reassign the work to another node for the duration.
+ * If a node is down, the cluster will reassign the work to another node.
-* Once enabled, an **ongoing task** runs in the background,
- and its responsible node executes the defined task work whenever relevant data changes occur.
+* Once enabled, an **ongoing task** runs in the background and executes its defined
+ work whenever relevant data changes occur.
-* In this page:
+* Ongoing tasks can also be managed via the Client API.
+ See [Ongoing tasks operations](../../../../client-api/operations/maintenance/ongoing-tasks/ongoing-task-operations.mdx).
+
+* In this article:
* [The ongoing tasks](../../../../studio/database/tasks/ongoing-tasks/general-info.mdx#the-ongoing-tasks)
- * [The ongoing tasks list - View](../../../../studio/database/tasks/ongoing-tasks/general-info.mdx#the-ongoing-tasks-list---view)
- * [The ongoing tasks list - Actions](../../../../studio/database/tasks/ongoing-tasks/general-info.mdx#the-ongoing-tasks-list---actions)
+ * [Creating a new task](../../../../studio/database/tasks/ongoing-tasks/general-info.mdx#creating-a-new-task)
+ * [Available task types](../../../../studio/database/tasks/ongoing-tasks/general-info.mdx#available-task-types)
+ * [The ongoing tasks list](../../../../studio/database/tasks/ongoing-tasks/general-info.mdx#the-ongoing-tasks-list)
-## The ongoing tasks
+
+
+
+
+### Creating a new task
+
+To create a new database task, open the **Ongoing Tasks** view, click the **Add a Database Task** button,
+and select a task type.
+
+
+
+
+
+---
+
+
+
+### Available task types
+
+The following task types are available:
-The available ongoing tasks are:
+
-
+**AI:**
+
+* **[GenAI](../../../../ai-integration/gen-ai-integration/overview.mdx)**
+ Analyze and enrich your documents using an LLM.
+* **[Embeddings Generation](../../../../ai-integration/generating-embeddings/overview.mdx)**
+ Automatically generate embeddings from your document content.
**Replication:**
* **[External Replication](../../../../studio/database/tasks/ongoing-tasks/external-replication-task.mdx)**
Create a live replica of your database in another RavenDB database in another cluster.
This replication is initiated by the source database.
-* **[Hub/Sink Replication](../../../../studio/database/tasks/ongoing-tasks/hub-sink-replication/overview.mdx)**
- Create a live replica of your database, or a part of it, in another RavenDB database.
- The replication is initiated by the *Sink* task.
- The replication can be *bidirectional* or limited to a *single direction*.
- The replication can be *filtered* to allow the delivery of selected documents.
+* **[Replication Hub](../../../../studio/database/tasks/ongoing-tasks/hub-sink-replication/replication-hub-task.mdx)**
+ Replicate documents to and/or from one or more `Replication Sink` tasks in other RavenDB
+ databases across different clusters.
+* **[Replication Sink](../../../../studio/database/tasks/ongoing-tasks/hub-sink-replication/replication-sink-task.mdx)**
+ Connect to a central `Replication Hub` in another RavenDB cluster to receive documents,
+ and optionally replicate back.
+ The replication can be *bidirectional* or limited to a *single direction*,
+ and can be *filtered* to allow the delivery of selected documents.
-**Backups & Subscriptions:**
+**Backups:**
* **[Backup](../../../../backup/create/periodic-tasks/database-backup.mdx)**
Schedule a backup or a snapshot of the database at a specified point in time.
+
+**Subscriptions:**
+
* **[Subscription](../../../../client-api/data-subscriptions/what-are-data-subscriptions.mdx)**
- Send batches of documents that match a pre-defined query for processing on a client.
+ Send batches of documents that match a pre-defined query for processing on a client.
**ETL (RavenDB => Target):**
@@ -62,6 +98,9 @@ The available ongoing tasks are:
* **[SQL ETL](../../../../server/ongoing-tasks/etl/sql.mdx)**
Write the database data to a relational database.
Data can be filtered and modified with transformation scripts.
+* **[Snowflake ETL](../../../../studio/database/tasks/ongoing-tasks/snowflake-etl-task.mdx)**
+ Write all or chosen database documents to a Snowflake database.
+ Data can be filtered and modified with transformation scripts.
* **[OLAP ETL](../../../../studio/database/tasks/ongoing-tasks/olap-etl-task.mdx)**
Convert database data to the _Parquet_ file format for OLAP purposes.
Data can be filtered and modified with transformation scripts.
@@ -77,8 +116,11 @@ The available ongoing tasks are:
* **[Azure Queue Storage ETL](../../../../studio/database/tasks/ongoing-tasks/azure-queue-storage-etl.mdx)**
Write all or chosen database documents to Azure Queue Storage.
Data can be filtered and modified with transformation scripts.
+* **[Amazon SQS ETL](../../../../studio/database/tasks/ongoing-tasks/amazon-sqs-etl.mdx)**
+ Write all or chosen database documents to Amazon SQS queues.
+ Data can be filtered and modified with transformation scripts.
-**Sink (Source => RavendB)**
+**Sink (Source => RavenDB):**
* **[Kafka Sink](../../../../studio/database/tasks/ongoing-tasks/kafka-queue-sink.mdx)**
Consume and process incoming messages from Kafka topics.
@@ -87,32 +129,39 @@ The available ongoing tasks are:
Consume and process incoming messages from RabbitMQ queues.
Add scripts to Load, Put, or Delete documents in RavenDB based on the incoming messages.
+
+
-## The ongoing tasks list - View
-
-
-
-1. Navigate to **Tasks > Ongoing Tasks**
-
-2. The list of the current tasks defined for the database.
-
-3. The task name.
-
-4. The node that is currently responsible for executing the task.
-
+
+The tasks you create are listed in the Ongoing Tasks view, where you can see their status at a glance,
+expand task bars for further details, perform basic actions like disabling or deleting tasks, and open
+any task for editing.
-## The ongoing tasks list - Actions
+
-
+1. **Filter by name**
+ Enter a string to list only the tasks whose names include it.
-1. **Add Task** - Create a new task for the database.
-2. **Enable / Disable** the task.
-3. **Details** - Click to see a short task details summary in this view.
-4. **Edit** - Click to edit the task.
-5. **Delete** the task.
+2. **Filter by type**
+ Click **All** to see tasks of all types.
+ Click a specific task type, e.g. `ETL`, to add tasks of this type to the view.
-The ongoing tasks can also be managed via the Client API. See [Ongoing tasks operations](../../../../client-api/operations/maintenance/ongoing-tasks/ongoing-task-operations.mdx).
+3. **Selection boxes**
+ Select all tasks using the "select all" checkbox at the top.
+ Select individual tasks using task-specific checkboxes.
+ Selecting tasks opens an action bar. Use **Set state** to enable or disable the selected tasks,
+ or **Delete** to remove them.
+ 
+4. **Task bar**
+ * Each defined task is represented by a task bar.
+ * A task bar always shows the task's name and type, whether it is enabled, and which cluster node is responsible for running it.
+ 
+ * You can enable, disable, edit, or delete the task.
+ You can also expand each task bar for additional details and options related to the task.
+ 
+ * Other details and available actions vary by task type.
+
diff --git a/sidebars.ts b/sidebars.ts
index 7bdaba7bf7..b378340461 100644
--- a/sidebars.ts
+++ b/sidebars.ts
@@ -97,6 +97,11 @@ const sidebars: SidebarsConfig = {
label: "AI Integration",
items: [{ type: "autogenerated", dirName: "ai-integration" }],
},
+ {
+ type: "category",
+ label: "Monitoring",
+ items: [{ type: "autogenerated", dirName: "monitoring" }],
+ },
{
type: "category",
label: "Glossary",