This guide provides ready-to-use PromQL queries for S3 Storage Plugin metrics, including panel configuration details (legend, min step, units).
The S3 plugin exposes two primary metrics to track file operations and errors:
- rr_s3_operations_total - Counter tracking all S3 operations by type, bucket, and status
- rr_s3_errors_total - Counter tracking errors by bucket and error type
Query:
sum(rate(rr_s3_operations_total[5m]))
Configuration:
- Legend: Total OPS
- Min Step: 15s
- Unit: ops (operations/sec)
- Panel Type: Graph
- Description: Overall S3 operation rate across all buckets
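For intuition about what rate(...[5m]) returns for a counter such as rr_s3_operations_total, the following Python sketch averages the increase between samples over the elapsed time and treats any drop in value as a counter reset. It is a simplified illustration under stated assumptions, not Prometheus's actual algorithm (which also extrapolates to the window boundaries); the function name simple_rate and the sample values are hypothetical.

```python
from typing import List, Tuple

Sample = Tuple[float, float]  # (unix timestamp in seconds, counter value)


def simple_rate(samples: List[Sample]) -> float:
    """Approximate per-second increase of a counter over the given samples.

    A drop in value is treated as a counter reset (for example a process
    restart), so the post-reset value counts as new increase. Unlike
    Prometheus, this sketch does not extrapolate to the window edges.
    """
    if len(samples) < 2:
        return 0.0
    increase = 0.0
    for (_, prev), (_, curr) in zip(samples, samples[1:]):
        increase += curr - prev if curr >= prev else curr
    elapsed = samples[-1][0] - samples[0][0]
    return increase / elapsed if elapsed > 0 else 0.0


# Hypothetical scrapes of rr_s3_operations_total, 15 seconds apart.
# The counter resets between t=30 and t=45.
window = [(0, 100), (15, 130), (30, 160), (45, 30), (60, 60)]
print(simple_rate(window))  # 2.0 operations/sec
```

The reset handling is why dashboards should graph rate() of a counter rather than the raw counter value.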
Query:
sum by (bucket) (rate(rr_s3_operations_total[5m]))
Configuration:
- Legend: {{bucket}}
- Min Step: 15s
- Unit: ops (operations/sec)
- Panel Type: Graph or Bar gauge
- Description: Operation rate grouped by bucket
Query:
sum by (operation) (rate(rr_s3_operations_total[5m]))
Configuration:
- Legend: {{operation}}
- Min Step: 15s
- Unit: ops (operations/sec)
- Panel Type: Graph (stacked area) or Pie chart
- Description: Operation distribution by type (write, read, delete, copy, move, list, exists, get_metadata, set_visibility, get_url)
Query:
sum by (status) (rate(rr_s3_operations_total[5m]))
Configuration:
- Legend: {{status}}
- Min Step: 15s
- Unit: ops (operations/sec)
- Panel Type: Graph (stacked area)
- Description: Operation rate grouped by status (success, error)
Query:
sum(rate(rr_s3_operations_total{status="success"}[5m])) / sum(rate(rr_s3_operations_total[5m])) * 100
Configuration:
- Legend: Success Rate
- Min Step: 15s
- Unit: percent (0-100)
- Panel Type: Gauge or Graph
- Thresholds: Red < 95%, Yellow 95-99%, Green > 99%
- Description: Percentage of successful S3 operations
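The arithmetic behind the success-rate query and its thresholds can be sketched in Python. This is only an illustration of the calculation; the function names are hypothetical, and the handling of exact boundary values (95 and 99 fall into Yellow) is an assumption, since the thresholds above do not say which side the boundaries belong to.

```python
from typing import Optional


def success_rate_percent(success_rate: float, total_rate: float) -> Optional[float]:
    """Success percentage from two per-second rates.

    Returns None when there is no traffic; the PromQL expression would
    yield NaN in that case (0 / 0).
    """
    if total_rate == 0:
        return None
    return success_rate / total_rate * 100


def success_rate_color(percent: float) -> str:
    """Map a success percentage to the threshold colours listed above."""
    if percent < 95:
        return "red"
    if percent <= 99:
        return "yellow"
    return "green"


# Hypothetical rates: 198 successful ops/sec out of 200 ops/sec.
pct = success_rate_percent(198.0, 200.0)
print(pct, success_rate_color(pct))  # 99.0 yellow
```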
Query:
sum(rr_s3_operations_total)
Configuration:
- Legend: Total Operations
- Min Step: 1m
- Unit: short
- Panel Type: Stat
- Description: Cumulative count of all S3 operations since start
Query:
topk(10, sum by (bucket) (rate(rr_s3_operations_total[5m])))
Configuration:
- Legend: {{bucket}}
- Min Step: 15s
- Unit: ops (operations/sec)
- Panel Type: Bar gauge (horizontal) or Table
- Description: Top 10 buckets by operation rate
Query:
topk(10, sum by (bucket) (rate(rr_s3_operations_total{operation="write"}[5m])))
Configuration:
- Legend: {{bucket}}
- Min Step: 15s
- Unit: ops (operations/sec)
- Panel Type: Bar gauge or Table
- Description: Top 10 buckets by write operation rate
Query:
topk(10, sum by (bucket) (rate(rr_s3_operations_total{operation="read"}[5m])))
Configuration:
- Legend: {{bucket}}
- Min Step: 15s
- Unit: ops (operations/sec)
- Panel Type: Bar gauge or Table
- Description: Top 10 buckets by read operation rate
Query 1 (Total OPS):
sum by (bucket) (rate(rr_s3_operations_total[5m]))
Query 2 (Write OPS):
sum by (bucket) (rate(rr_s3_operations_total{operation="write"}[5m]))
Query 3 (Read OPS):
sum by (bucket) (rate(rr_s3_operations_total{operation="read"}[5m]))
Query 4 (Error %):
sum by (bucket) (rate(rr_s3_operations_total{status="error"}[5m])) / sum by (bucket) (rate(rr_s3_operations_total[5m])) * 100
Configuration:
- Legend: N/A (Table columns)
- Min Step: 15s
- Unit:
  - Query 1-3: ops (operations/sec)
  - Query 4: percent (0-100)
- Panel Type: Table
- Column Names: Bucket, Total OPS, Write OPS, Read OPS, Error Rate %
- Description: Comprehensive bucket performance overview
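A table like this needs the four query results joined on the bucket label. The Python sketch below shows that join with plain dictionaries; it is an illustration only, the function name and sample buckets are hypothetical, and filling missing rates with 0.0 is a choice made for the example. Note that Query 4 returns nothing for a bucket that has no series with status="error", so the error column is left empty (None) for such buckets.

```python
from typing import Dict, List, Optional

Rates = Dict[str, float]  # bucket name -> value returned by one query


def build_bucket_table(total: Rates, writes: Rates, reads: Rates,
                       error_pct: Rates) -> List[dict]:
    """Join four per-bucket query results into table rows keyed by bucket."""
    rows = []
    for bucket in sorted(total):
        error: Optional[float] = error_pct.get(bucket)  # None = no error series
        rows.append({
            "Bucket": bucket,
            "Total OPS": total[bucket],
            "Write OPS": writes.get(bucket, 0.0),
            "Read OPS": reads.get(bucket, 0.0),
            "Error Rate %": error,
        })
    return rows


# Hypothetical query results for two buckets.
table = build_bucket_table(
    total={"uploads": 3.0, "avatars": 1.0},
    writes={"uploads": 2.0},
    reads={"uploads": 1.0, "avatars": 1.0},
    error_pct={"uploads": 5.0},
)
for row in table:
    print(row)
```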
Query:
sum by (bucket) (rate(rr_s3_operations_total{operation="read"}[5m])) / sum by (bucket) (rate(rr_s3_operations_total{operation="write"}[5m]))
Configuration:
- Legend: {{bucket}}
- Min Step: 15s
- Unit: short (ratio)
- Panel Type: Graph or Table
- Description: Read-to-write ratio per bucket (higher = more reads than writes)
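The ratio is undefined for a bucket whose write rate is zero. PromQL uses floating-point division, so such a bucket shows +Inf (reads but no writes) or NaN (no reads and no writes), and a bucket with no write series at all drops out of the result. The Python sketch below mirrors the first two cases; the function name is hypothetical.

```python
import math


def read_write_ratio(read_rate: float, write_rate: float) -> float:
    """Read-to-write ratio with PromQL-style division by zero."""
    if write_rate == 0:
        # x / 0 is +Inf for positive x, and 0 / 0 is NaN.
        return math.nan if read_rate == 0 else math.inf
    return read_rate / write_rate


print(read_write_ratio(10.0, 5.0))  # 2.0
print(read_write_ratio(1.0, 0.0))   # inf
print(read_write_ratio(0.0, 0.0))   # nan
```

If these values clutter a panel, one option is to filter the denominator in the query (for example, keep only buckets whose write rate is above zero) before dividing.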
Query:
sum(rate(rr_s3_operations_total{operation="write"}[5m]))
Configuration:
- Legend: Writes/sec
- Min Step: 15s
- Unit: ops (operations/sec)
- Panel Type: Graph
- Description: Total file upload rate
Query:
sum(rate(rr_s3_operations_total{operation="read"}[5m]))
Configuration:
- Legend: Reads/sec
- Min Step: 15s
- Unit: ops (operations/sec)
- Panel Type: Graph
- Description: Total file download rate
Query:
sum(rate(rr_s3_operations_total{operation="delete"}[5m]))
Configuration:
- Legend: Deletes/sec
- Min Step: 15s
- Unit: ops (operations/sec)
- Panel Type: Graph
- Description: Total file deletion rate
Query:
sum(rate(rr_s3_operations_total{operation="list"}[5m]))
Configuration:
- Legend: Lists/sec
- Min Step: 15s
- Unit: ops (operations/sec)
- Panel Type: Graph
- Description: Total object listing rate
Query:
sum(rate(rr_s3_operations_total{operation="copy"}[5m]))
Configuration:
- Legend: Copies/sec
- Min Step: 15s
- Unit: ops (operations/sec)
- Panel Type: Graph
- Description: Total file copy rate
Query:
sum(rate(rr_s3_operations_total{operation="move"}[5m]))
Configuration:
- Legend: Moves/sec
- Min Step: 15s
- Unit: ops (operations/sec)
- Panel Type: Graph
- Description: Total file move rate
Query:
sum(rate(rr_s3_operations_total{operation="exists"}[5m]))
Configuration:
- Legend: Exists checks/sec
- Min Step: 15s
- Unit: ops (operations/sec)
- Panel Type: Graph
- Description: Total file existence check rate
Query:
sum(rate(rr_s3_operations_total{operation="get_metadata"}[5m]))
Configuration:
- Legend: Metadata ops/sec
- Min Step: 15s
- Unit: ops (operations/sec)
- Panel Type: Graph
- Description: Total metadata retrieval rate
Query:
sum(rate(rr_s3_operations_total{operation="set_visibility"}[5m]))
Configuration:
- Legend: Visibility changes/sec
- Min Step: 15s
- Unit: ops (operations/sec)
- Panel Type: Graph
- Description: Total ACL change rate
Query:
sum(rate(rr_s3_operations_total{operation="get_url"}[5m]))
Configuration:
- Legend: URL gens/sec
- Min Step: 15s
- Unit: ops (operations/sec)
- Panel Type: Graph
- Description: Total URL generation rate (public/presigned)
Query:
sum by (operation) (rate(rr_s3_operations_total[5m])) / sum(rate(rr_s3_operations_total[5m])) * 100
Configuration:
- Legend: {{operation}}
- Min Step: 15s
- Unit: percent (0-100)
- Panel Type: Pie chart
- Description: Percentage breakdown of operations by type
Query 1 (Writes):
sum(rate(rr_s3_operations_total{operation="write"}[5m]))
Query 2 (Reads):
sum(rate(rr_s3_operations_total{operation="read"}[5m]))
Configuration:
- Legend:
  - Query 1: Writes
  - Query 2: Reads
- Min Step: 15s
- Unit: ops (operations/sec)
- Panel Type: Graph (stacked area)
- Description: Visual comparison of write vs read operations
Query:
sum(rate(rr_s3_errors_total[5m]))
Configuration:
- Legend: Errors/sec
- Min Step: 15s
- Unit: errors/sec
- Panel Type: Graph
- Description: Total S3 errors per second (all types)
Query:
sum(rate(rr_s3_operations_total{status="error"}[5m])) / sum(rate(rr_s3_operations_total[5m])) * 100
Configuration:
- Legend: Error Rate
- Min Step: 15s
- Unit: percent (0-100)
- Panel Type: Gauge or Graph
- Thresholds: Green < 1%, Yellow 1-5%, Red > 5%
- Description: Percentage of operations that result in errors
Query:
sum by (error_type) (rate(rr_s3_errors_total[5m]))
Configuration:
- Legend: {{error_type}}
- Min Step: 15s
- Unit: errors/sec
- Panel Type: Graph (stacked) or Pie chart
- Description: Errors grouped by classification (BUCKET_NOT_FOUND, FILE_NOT_FOUND, S3_OPERATION_FAILED, etc.)
Query:
sum by (bucket) (rate(rr_s3_errors_total[5m]))
Configuration:
- Legend: {{bucket}}
- Min Step: 15s
- Unit: errors/sec
- Panel Type: Graph or Table
- Description: Errors grouped by bucket
Query:
topk(10, sum by (bucket) (rate(rr_s3_errors_total[5m])))
Configuration:
- Legend: {{bucket}}
- Min Step: 15s
- Unit: errors/sec
- Panel Type: Bar gauge or Table
- Description: Buckets with highest error rate
Query:
topk(10, sum by (bucket) (rate(rr_s3_operations_total{status="error"}[5m])) / sum by (bucket) (rate(rr_s3_operations_total[5m])) * 100)
Configuration:
- Legend: {{bucket}}
- Min Step: 15s
- Unit: percent (0-100)
- Panel Type: Bar gauge or Table
- Description: Buckets with highest error percentage
Query:
sum(rate(rr_s3_errors_total{error_type="BUCKET_NOT_FOUND"}[5m]))
Configuration:
- Legend: Bucket Not Found
- Min Step: 15s
- Unit: errors/sec
- Panel Type: Graph
- Description: Rate of bucket not found errors
Query:
sum(rate(rr_s3_errors_total{error_type="FILE_NOT_FOUND"}[5m]))
Configuration:
- Legend: File Not Found
- Min Step: 15s
- Unit: errors/sec
- Panel Type: Graph
- Description: Rate of file not found errors
Query:
sum(rate(rr_s3_errors_total{error_type="S3_OPERATION_FAILED"}[5m]))
Configuration:
- Legend: S3 Operation Failed
- Min Step: 15s
- Unit: errors/sec
- Panel Type: Graph
- Description: Rate of S3 SDK operation failures
Query:
sum(rate(rr_s3_errors_total{error_type="PERMISSION_DENIED"}[5m]))
Configuration:
- Legend: Permission Denied
- Min Step: 15s
- Unit: errors/sec
- Panel Type: Graph
- Description: Rate of permission/access denied errors
Query:
sum(rate(rr_s3_errors_total{error_type="INVALID_PATHNAME"}[5m]))
Configuration:
- Legend: Invalid Pathname
- Min Step: 15s
- Unit: errors/sec
- Panel Type: Graph
- Description: Rate of invalid pathname errors
Query:
sum(rate(rr_s3_errors_total{error_type="OPERATION_TIMEOUT"}[5m]))
Configuration:
- Legend: Timeouts
- Min Step: 15s
- Unit: errors/sec
- Panel Type: Graph
- Thresholds: Any value > 0 requires investigation
- Description: Rate of operation timeout errors
Query:
sum by (error_type) (rate(rr_s3_errors_total[5m])) / sum(rate(rr_s3_errors_total[5m])) * 100
Configuration:
- Legend: {{error_type}}
- Min Step: 15s
- Unit: percent (0-100)
- Panel Type: Pie chart
- Description: Percentage breakdown of errors by type
Query:
sum by (bucket, error_type) (rate(rr_s3_errors_total[5m]))
Configuration:
- Legend: N/A (heatmap)
- Min Step: 30s
- Unit: errors/sec
- Panel Type: Heatmap
- Description: Visual correlation between buckets and error types
Query 1 (Operations - Left Axis):
sum(rate(rr_s3_operations_total[5m]))
Query 2 (Errors - Right Axis):
sum(rate(rr_s3_errors_total[5m]))
Configuration:
- Legend:
  - Query 1: Operations/sec
  - Query 2: Errors/sec
- Min Step: 15s
- Unit:
  - Left Axis: ops (operations/sec)
  - Right Axis: errors/sec
- Panel Type: Graph (dual Y-axis)
- Description: Correlation between operation rate and error rate
Query 1 (Success):
sum(rate(rr_s3_operations_total{status="success"}[5m]))
Query 2 (Error):
sum(rate(rr_s3_operations_total{status="error"}[5m]))
Configuration:
- Legend:
  - Query 1: Success
  - Query 2: Error
- Min Step: 15s
- Unit: ops (operations/sec)
- Panel Type: Graph (stacked area)
- Description: Visual comparison of successful vs failed operations
Query:
sum(rate(rr_s3_operations_total{operation="write",status="success"}[5m])) / sum(rate(rr_s3_operations_total{operation="write"}[5m])) * 100
Configuration:
- Legend: Write Success Rate
- Min Step: 15s
- Unit: percent (0-100)
- Panel Type: Graph or Gauge
- Thresholds: Red < 95%, Yellow 95-99%, Green > 99%
- Description: Success rate for write operations only
Query:
sum(rate(rr_s3_operations_total{operation="read",status="success"}[5m])) / sum(rate(rr_s3_operations_total{operation="read"}[5m])) * 100
Configuration:
- Legend: Read Success Rate
- Min Step: 15s
- Unit: percent (0-100)
- Panel Type: Graph or Gauge
- Thresholds: Red < 95%, Yellow 95-99%, Green > 99%
- Description: Success rate for read operations only
Query 1 (Write):
sum(rate(rr_s3_operations_total{operation="write",status="success"}[5m])) / sum(rate(rr_s3_operations_total{operation="write"}[5m])) * 100
Query 2 (Read):
sum(rate(rr_s3_operations_total{operation="read",status="success"}[5m])) / sum(rate(rr_s3_operations_total{operation="read"}[5m])) * 100
Query 3 (Delete):
sum(rate(rr_s3_operations_total{operation="delete",status="success"}[5m])) / sum(rate(rr_s3_operations_total{operation="delete"}[5m])) * 100
Query 4 (List):
sum(rate(rr_s3_operations_total{operation="list",status="success"}[5m])) / sum(rate(rr_s3_operations_total{operation="list"}[5m])) * 100
Configuration:
- Legend: N/A (Table rows)
- Min Step: 15s
- Unit: percent (0-100)
- Panel Type: Table
- Row Names: Write, Read, Delete, List
- Description: Success rate breakdown by operation type
Query:
sum(rate(rr_s3_operations_total[1h])) / sum(rate(rr_s3_operations_total[1h] offset 24h))
Configuration:
- Legend: HoH Change
- Min Step: 5m
- Unit: short (ratio)
- Panel Type: Graph or Stat
- Description: Operation rate over the last hour relative to the same hour 24 hours earlier (1.0 = same, 2.0 = double)
Query:
sum(rate(rr_s3_errors_total[1m])) > bool 2 * avg_over_time(sum(rate(rr_s3_errors_total[1m]))[10m:1m])
Configuration:
- Legend: Error Burst
- Min Step: 15s
- Unit: bool (0 or 1)
- Panel Type: Graph (binary)
- Thresholds: Red when value = 1
- Description: Detects sudden spikes in errors (>2x baseline)
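The burst check compares the current 1-minute error rate with twice its average over the trailing 10 minutes. A minimal Python sketch of that comparison, assuming the baseline samples have already been collected (the function name and sample values are hypothetical):

```python
from statistics import mean
from typing import Sequence


def is_error_burst(current_rate: float, baseline_rates: Sequence[float],
                   factor: float = 2.0) -> bool:
    """True when the current error rate exceeds `factor` times the baseline mean."""
    if not baseline_rates:
        return False  # no history yet, so nothing to compare against
    return current_rate > factor * mean(baseline_rates)


# Hypothetical baseline: ten 1-minute error rates of 0.1 errors/sec.
baseline = [0.1] * 10
print(is_error_burst(0.5, baseline))   # True: 0.5 is above 2 * 0.1
print(is_error_burst(0.15, baseline))  # False: 0.15 is below 2 * 0.1
```

Because the comparison is strict, a system with no errors at all (current rate and baseline both zero) never reports a burst.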
Query:
sum by (bucket) (rr_s3_operations_total)
Configuration:
- Legend: {{bucket}}
- Min Step: 1m
- Unit: short
- Panel Type: Pie chart or Bar gauge
- Description: Total cumulative operations per bucket
Query:
sum by (bucket) (rate(rr_s3_operations_total[5m]))
Configuration:
- Legend: N/A (heatmap)
- Min Step: 30s
- Unit: ops (operations/sec)
- Panel Type: Heatmap
- Description: Visual activity pattern across buckets over time
Query:
(sum by (bucket) (rate(rr_s3_operations_total{operation="write"}[5m])) > bool sum by (bucket) (rate(rr_s3_operations_total{operation="read"}[5m])))
Configuration:
- Legend: {{bucket}}
- Min Step: 15s
- Unit: bool (0 or 1)
- Panel Type: Graph or Table
- Description: Identifies write-heavy buckets (1 = more writes than reads)
Query:
bottomk(1, sum by (bucket) (rate(rr_s3_operations_total{status="error"}[5m])) / sum by (bucket) (rate(rr_s3_operations_total[5m])) * 100)
Configuration:
- Legend: {{bucket}}
- Min Step: 15s
- Unit: percent (0-100)
- Panel Type: Stat
- Description: Bucket with lowest error rate
Query:
topk(1, sum by (bucket) (rate(rr_s3_operations_total{status="error"}[5m])) / sum by (bucket) (rate(rr_s3_operations_total[5m])) * 100)
Configuration:
- Legend: {{bucket}}
- Min Step: 15s
- Unit: percent (0-100)
- Panel Type: Stat
- Thresholds: Red > 5%, Yellow 1-5%, Green < 1%
- Description: Bucket with highest error rate
- Total Operations/sec - Stat panel
- Success Rate % - Gauge with thresholds
- Total Errors/sec - Stat panel with threshold colors
- Active Buckets - Stat (count of buckets with ops > 0)
- Operations by Type - Stacked area graph
- Operations by Bucket - Graph (time series)
- Success vs Error Rate - Stacked area graph
- Operation Success Rate by Type - Table
- Errors by Type - Pie chart or Stacked area
- Most Error-Prone Buckets - Bar gauge
- Bucket Performance Table - Table with multiple queries (Total OPS, Write OPS, Read OPS, Error %)
- Error Heatmap (Bucket vs Type) - Heatmap
- Read/Write Ratio by Bucket - Bar gauge
Rate Units:
- ops (operations/sec) - for operation rates
- errors/sec - for error rates
Percentage:
- percent (0-100) - displays 95 as 95%
- percentunit (0.0-1.0) - displays 0.95 as 95%
Count:
- short - auto-formats large numbers (1K, 1M)
- none - raw number
Boolean:
- bool - 0 or 1
- bool_yes_no - displays as Yes/No
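The two percentage units differ only in the scale they expect, which is easy to mix up. The queries in this guide multiply by 100, so they fit percent (0-100); with percentunit the `* 100` would have to be dropped. The Python sketch below imitates the display behaviour to show what a mismatch looks like; the function format_percent is hypothetical and is not Grafana's formatter.

```python
def format_percent(value: float, unit: str) -> str:
    """Imitate how each percentage unit renders a raw query value."""
    if unit == "percent":
        shown = value          # value is already on a 0-100 scale
    elif unit == "percentunit":
        shown = value * 100    # value is on a 0.0-1.0 scale
    else:
        raise ValueError(f"unknown unit: {unit}")
    return f"{shown:g}%"


print(format_percent(95, "percent"))        # 95%
print(format_percent(0.95, "percentunit"))  # 95%
print(format_percent(95, "percentunit"))    # 9500% (scale mismatch)
```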
Green: < 1%
Yellow: 1-5%
Red: > 5%
Red: < 95%
Yellow: 95-99%
Green: > 99%
Red: value = 1 (burst detected)
Green: value = 0 (normal)
High Error Rate:
- alert: S3HighErrorRate
  expr: sum(rate(rr_s3_operations_total{status="error"}[5m])) / sum(rate(rr_s3_operations_total[5m])) * 100 > 5
  for: 5m
  labels:
    severity: critical
  annotations:
    summary: "S3 plugin error rate above 5%"
    description: "Error rate is {{ $value }}% (threshold: 5%)"
Bucket Not Found:
- alert: S3BucketNotFoundErrors
  expr: sum(rate(rr_s3_errors_total{error_type="BUCKET_NOT_FOUND"}[5m])) > 0
  for: 2m
  labels:
    severity: critical
  annotations:
    summary: "S3 bucket not found errors detected"
    description: "Configuration issue: bucket references don't exist"
Permission Denied:
- alert: S3PermissionDenied
  expr: sum(rate(rr_s3_errors_total{error_type="PERMISSION_DENIED"}[5m])) > 0
  for: 2m
  labels:
    severity: critical
  annotations:
    summary: "S3 permission denied errors"
    description: "Credentials or IAM policy issue detected"
Elevated Error Rate:
- alert: S3ElevatedErrorRate
  expr: sum(rate(rr_s3_operations_total{status="error"}[5m])) / sum(rate(rr_s3_operations_total[5m])) * 100 > 1
  for: 10m
  labels:
    severity: warning
  annotations:
    summary: "S3 plugin error rate elevated"
    description: "Error rate is {{ $value }}% (threshold: 1%)"
Timeout Errors:
- alert: S3OperationTimeouts
  expr: sum(rate(rr_s3_errors_total{error_type="OPERATION_TIMEOUT"}[5m])) > 0
  for: 5m
  labels:
    severity: warning
  annotations:
    summary: "S3 operation timeouts detected"
    description: "Network or S3 service performance issue"
See grafana_dashboard_example.json for a complete pre-configured dashboard including all key metrics and recommended visualizations.
- Verify metrics plugin is enabled in .rr.yaml:
  metrics:
    address: 127.0.0.1:2112
- Check metrics endpoint:
  curl http://localhost:2112/metrics | grep rr_s3
- Ensure S3 plugin is performing operations (metrics only appear after first operation)
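Where curl and grep are not available, the same endpoint check can be sketched in Python with the standard library. This is an illustration under assumptions: the URL is the address from the configuration above, the function names are hypothetical, and the sample text (including its HELP wording and the bucket name) is made up to show the expected shape of the output.

```python
import urllib.request
from typing import List


def fetch_metrics(url: str = "http://localhost:2112/metrics",
                  timeout: float = 5.0) -> str:
    """Download the raw Prometheus exposition text from the metrics endpoint."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8")


def s3_metric_lines(exposition_text: str, needle: str = "rr_s3") -> List[str]:
    """Keep lines containing the needle, the same substring filter grep applies."""
    return [line for line in exposition_text.splitlines() if needle in line]


# Hypothetical endpoint output, used here instead of a live request.
sample = """# HELP rr_s3_operations_total Total S3 operations
# TYPE rr_s3_operations_total counter
rr_s3_operations_total{bucket="uploads",operation="write",status="success"} 42
go_goroutines 12
"""
for line in s3_metric_lines(sample):
    print(line)
```

Against a running instance, `s3_metric_lines(fetch_metrics())` performs the live check; an empty result means no S3 metrics have been registered yet.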
- Check error types with:
  sum by (error_type) (rate(rr_s3_errors_total[5m]))
- Identify problematic buckets:
  topk(5, sum by (bucket) (rate(rr_s3_errors_total[5m])))
- Review RoadRunner logs for detailed error messages
- Check AWS credentials configuration
- Verify IAM policy has required permissions
- Review PERMISSION_DENIED error rate by bucket
- Monitor operation rate trends (Hour over Hour)
- Check for error bursts
- Review specific operation types (write/read/list) for bottlenecks
HTTP Traffic vs S3 Operations:
# HTTP RPS
sum(rate(rr_http_requests_total[5m]))
# S3 OPS
sum(rate(rr_s3_operations_total[5m]))
Worker Pool vs S3 Activity:
# Worker utilization
rr_http_worker_utilization_percent
# S3 write operations
sum(rate(rr_s3_operations_total{operation="write"}[5m]))
This correlation helps identify if S3 operations are causing worker pool pressure.
- Use rate() over irate() for smoother graphs
- Set appropriate [5m] intervals based on traffic volume
- Use topk() to limit cardinality in busy systems
- Set up critical alerts for BUCKET_NOT_FOUND (config issues)
- Monitor PERMISSION_DENIED (security issues)
- Alert on error rate >5% sustained for 5 minutes
- Group metrics by concern (operations, errors, buckets)
- Use consistent color schemes across panels
- Include both rate and percentage views
- Default Prometheus retention: 15 days
- Increase for long-term S3 usage analysis
- Consider recording rules for long-term aggregations
For high-traffic systems, pre-compute common queries:
groups:
  - name: s3_recording_rules
    interval: 30s
    rules:
      - record: s3:operations:rate5m
        expr: sum(rate(rr_s3_operations_total[5m]))
      - record: s3:operations:rate5m:by_bucket
        expr: sum by (bucket) (rate(rr_s3_operations_total[5m]))
      - record: s3:errors:rate5m
        expr: sum(rate(rr_s3_errors_total[5m]))
      - record: s3:success_rate:percent
        expr: sum(rate(rr_s3_operations_total{status="success"}[5m])) / sum(rate(rr_s3_operations_total[5m])) * 100
Then query using s3:operations:rate5m instead of the full expression.
For additional support or questions about S3 plugin metrics, refer to the main RoadRunner documentation or open an issue on GitHub.