Bug fix: Prevent exemplar drop when filtered attributes exceed 128-rune limit#7883
Conversation
|
Thanks for adding the test. You will need to add the fix as well for us to merge the PR. |
|
okey, I started by adding the test to clearly capture the failure case and expected behavior. I’ll add the fix next so we can get this merged. |
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #7883 +/- ##
=======================================
- Coverage 81.7% 81.7% -0.1%
=======================================
Files 304 304
Lines 23283 23296 +13
=======================================
+ Hits 19034 19043 +9
- Misses 3862 3865 +3
- Partials 387 388 +1
🚀 New features to boost your workflow:
|
92005e2 to
cfec6ba
Compare
|
My only concern with this is that we (Prometheus) are planning to relax the exemplar limit in OM 2.0: prometheus/docs#2835. I'm not sure if we would be able to remove this code without causing problems for people still using OM 1.0. @bwplotka do you think we should contribute this fix to the prometheus client instead so that it can have different exemplar behavior for OM 1.0 vs 2.0? |
ab3eec2 to
b2e1eef
Compare
|
Do you want me to make any further updates, or is the direction to close it? |
|
Reasons to have this logic in OTel-go:
Reasons to have this logic in Prometheus client-golang:
If client_golang is willing to treat trace_id or span_id as "special" keys, it would make sense to have truncation logic there. Otherwise, it is probably OK to have it here, and eventually make it configurable. It would stink a bit that it will probably have to be limited by default, even after OM 2.0. I'm opening an issue... |
|
Just wait a bit if that is OK. |
I’ve added a test case for #6718 to show what happens when filtered attributes are too long.
Currently, the test fails because the Prometheus client_golang has a 128-rune limit for exemplars. When we move filtered attributes into the exemplar and they exceed this limit, the whole thing gets dropped, including the trace_id and span_id. It even causes a parsing error in the output:
To fix this, I can add a check that ensures we always fit the trace_id and span_id first. Then, we only add the user's filtered attributes if there is enough space left in the 128-rune budget. If an attribute doesn't fit, we just skip it so we don't lose the Trace ID.