Skip to content

Add report full api#298

Open
xiaoxichen wants to merge 2 commits intoeBay:masterfrom
xiaoxichen:add-report-full-api
Open

Add report full api#298
xiaoxichen wants to merge 2 commits intoeBay:masterfrom
xiaoxichen:add-report-full-api

Conversation

@xiaoxichen
Copy link
Copy Markdown
Contributor

No description provided.

This change adds MetricsFarm::report_full() to force histograms to
publish complete bucket distributions on-demand, regardless of their
registration mode (publish_as_sum_count, publish_as_gauge, or
publish_as_histogram).

Implementation:
- Add publish_full() method to MetricsGroupImpl for internal publishing
- Add report_full() to MetricsFarm that creates temporary reporter
- Add accessor methods (labels()) to StaticInfo classes for encapsulation

Use case: Allows operators to scrape complete histogram distributions
for detailed analysis, while normal scraping uses sum/count mode to
reduce Prometheus cardinality.

Signed-off-by: Xiaoxi Chen <xiaoxchen@ebay.com>
@xiaoxichen xiaoxichen force-pushed the add-report-full-api branch from 724e4ce to 77d9d60 Compare March 11, 2026 04:49
@xiaoxichen xiaoxichen requested a review from Besroy March 11, 2026 04:49
Histogram sum and count should use Gauge instead of Counter because:
- Gauges can increase or decrease, representing current state
- Counters are monotonically increasing and should never decrease
- Histogram sum/count represent current aggregated values, not accumulations
- Semantically correct for Prometheus metrics

This change updates PrometheusReportSumCount to use prometheus::Gauge
instead of prometheus::Counter, and changes set_value() to use Set()
instead of Increment().

Signed-off-by: Xiaoxi Chen <xiaoxchen@ebay.com>
@xiaoxichen xiaoxichen force-pushed the add-report-full-api branch from 77d9d60 to 4c37771 Compare March 11, 2026 08:17
@xiaoxichen xiaoxichen requested a review from szmyd March 11, 2026 08:21
@codecov-commenter
Copy link
Copy Markdown

⚠️ Please install the 'codecov app svg image' to ensure uploads and comments are reliably processed by Codecov.

Codecov Report

❌ Patch coverage is 32.72727% with 37 lines in your changes missing coverage. Please review.
✅ Project coverage is 50.38%. Comparing base (370c772) to head (4c37771).
⚠️ Report is 88 commits behind head on master.
❗ Your organization needs to install the Codecov GitHub app to enable full functionality.

Additional details and impacted files
@@             Coverage Diff             @@
##           master     #298       +/-   ##
===========================================
- Coverage   64.29%   50.38%   -13.92%     
===========================================
  Files          72       63        -9     
  Lines        4406     4178      -228     
  Branches      555     1825     +1270     
===========================================
- Hits         2833     2105      -728     
+ Misses       1327      879      -448     
- Partials      246     1194      +948     
Components Coverage Δ
AuthManager 55.55% <ø> (-22.23%) ⬇️
Cache 31.52% <54.12%> (+1.58%) ⬆️
FDS 60.57% <21.73%> (-10.55%) ⬇️
FileWatcher 37.60% <20.00%> (-18.65%) ⬇️
Flip ∅ <ø> (∅)
gRPC 56.59% <46.87%> (-20.46%) ⬇️
Logging 27.65% <22.72%> (-2.53%) ⬇️
Metrics 57.27% <48.92%> (-22.93%) ⬇️
Options 25.00% <ø> (-75.00%) ⬇️
Setting 29.62% <ø> (-27.17%) ⬇️
StatusObject 37.16% <ø> (-36.67%) ⬇️
Utility 61.24% <100.00%> (-21.48%) ⬇️
Version 36.00% <ø> (-59.84%) ⬇️
🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@Besroy
Copy link
Copy Markdown
Contributor

Besroy commented Mar 11, 2026

LGTM. Could you help add more comment for report_full usage?

@szmyd
Copy link
Copy Markdown
Collaborator

szmyd commented Apr 2, 2026

PR Review (Claude Generated): Add report_full() API

Based on my review of the changes in the add-report-full-api branch, here's a comprehensive code review:

Overview

This PR adds a report_full() API that exposes complete histogram bucket data alongside the existing report() API which publishes histograms as sum/count only for Prometheus cardinality reduction.

Strengths

  1. Clean separation of concerns: The new report_full() API doesn't modify existing behavior - it creates a fresh reporter instance, avoiding side effects on the regular report() flow.

  2. Smart implementation: Using a temporary reporter with its own registry is elegant - it allows re-registering all metrics in full histogram mode without affecting the main reporter.

  3. Important bug fix: Changed histogram sum/count from Counter to Gauge in prometheus_reporter.hpp. This is correct - sum/count should be gauges since they can go down (e.g., after a rollover or reset), not monotonically increasing counters.

  4. Good test coverage: Added three comprehensive tests:

    • TestHistogramSumCount - validates sum/count mode
    • TestReportFull - validates full histogram mode
    • TestReportFullMultipleCalls - ensures multiple calls work correctly with updated values
  5. API consistency: Added labels() accessor methods to all metric types, which is good for the full publishing implementation.

Issues & Suggestions 🔍

Minor Issues:

  1. Version bump (conanfile.py:9): Bumped to 13.2.4 - ensure this follows your versioning scheme correctly. This looks like a minor/patch release given it's a new API addition.

  2. Lock scope in publish_full() (metrics_group_impl.cpp:277): The lock is acquired at the start of the function and held through all the registration and gathering. For large metric groups, this could be a lengthy hold. Consider whether the lock scope can be reduced.

  3. Memory consideration: report_full() creates vectors for storing temporary report objects. For systems with many metrics, this could use significant memory. Might want to document this or consider if there's a streaming approach.

  4. Documentation: The new report_full() API lacks inline documentation. Consider adding a comment explaining:

    • When to use report() vs report_full()
    • The tradeoff (cardinality vs detail)
    • Performance implications

Questions:

  1. Thread safety: In report_full(), each metrics group calls publish_full() sequentially. Is there any concern about the state of metrics changing between groups? The lock in each group protects individual group publishing, but the farm-level lock is held, so this should be fine.

  2. Error handling: What happens if publish_full() is called on a very large metrics farm? No guards against potential OOM scenarios.

Code Quality

  • Clean, readable code
  • Follows existing patterns well
  • No obvious bugs or memory leaks
  • Test names are descriptive

Overall Assessment

LGTM with minor suggestions. This is a solid implementation that adds useful functionality without breaking existing behavior. The main report() API retains the cardinality-reducing sum/count mode, while report_full() provides detailed bucket data when needed. The Counter→Gauge fix is an important correction.

Recommendation: Approve with consideration of adding API documentation comments.

@xiaoxichen
Copy link
Copy Markdown
Contributor Author

@szmyd can you consume your AI agent review into your human review

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants