Summary
While running an agent against query_PATENTS, three places in the question text appear to be under-specified relative to what the validators check. This is a question for the maintainers — is the ambiguity intentional (e.g., to test agents' ability to handle under-specified data tasks), or worth tightening to match the convention already established in db_description_withhint.txt files for other datasets?
Observations
1. CPC hierarchy "level" naming is benchmark-internal
The questions in query_PATENTS reference CPC hierarchy "levels" by number, but the level numbering doesn't match any standard CPC reference (USPTO, EPO, Wikipedia) we could find.
Comparing query1 and query2 ground truths:
| Query | Question text | Ground-truth code shape | Standard CPC term |
|---|---|---|---|
| q1 | "…CPC group codes at level 5 whose best year is 2022." | 4-char codes (e.g. A22B, A23J, A41G) | subclass |
| q2 | "…the best year for each CPC group at level 4." | 3-char codes (e.g. A21, A61, B23) | class |
So "level 5" in q1 means subclass and "level 4" in q2 means class. An agent that interprets these levels via standard CPC documentation lands at a different granularity than the validator expects.
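To make the mismatch concrete, here is a minimal sketch of the truncation the ground truths seem to imply. The level-to-prefix-length mapping is inferred only from the code shapes in the table above and is not confirmed by the maintainers; the function name, the mapping name, and the sample input string are hypothetical.

```python
# Prefix lengths inferred from the q1/q2 ground-truth code shapes.
# This mapping is an assumption, not a documented benchmark convention.
BENCHMARK_LEVEL_TO_PREFIX_LEN = {
    4: 3,  # "level 4" -> 3-char code, i.e. CPC class (e.g. A22)
    5: 4,  # "level 5" -> 4-char code, i.e. CPC subclass (e.g. A22B)
}


def truncate_cpc(code: str, level: int) -> str:
    """Cut a full CPC symbol down to the granularity the validator appears to expect."""
    if level not in BENCHMARK_LEVEL_TO_PREFIX_LEN:
        raise ValueError(f"no inferred prefix length for level {level}")
    compact = code.replace(" ", "")  # tolerate 'A22B 5/00' as well as 'A22B5/00'
    return compact[:BENCHMARK_LEVEL_TO_PREFIX_LEN[level]]


if __name__ == "__main__":
    sample = "A22B 5/00"  # illustrative input only
    print(truncate_cpc(sample, 5))  # subclass-shaped prefix
    print(truncate_cpc(sample, 4))  # class-shaped prefix
```

An agent has to guess this mapping today, since the question text gives only the level number.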
2. Exponential moving average initialization is unspecified
Both q1 and q2 ask for the "highest exponential moving average of patent filings each year." The recurrence
EMA[t] = α · x[t] + (1 − α) · EMA[t−1]
requires a seed value, and there are at least three legitimate conventions:
- Seed with first observation: EMA[0] = x[0]
- Seed with zero: EMA[0] = 0
- Simple-average warmup over the first N observations: EMA[N−1] = mean(x[0..N−1])
Each produces different EMA series and consequently different "best year" selections per CPC code, which then changes which codes pass the q1 (best_year = 2022) filter. The question doesn't say which convention to use.
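The effect of the seed on the "best year" can be shown with a small sketch. Everything numeric here is an assumption made for illustration: the yearly counts are toy data (not taken from the PATENTS database), α = 0.5, the warmup length N = 3, and ties go to the earliest year. The function names are hypothetical as well.

```python
from typing import List, Optional, Sequence


def ema(xs: Sequence[float], alpha: float, seed: str = "first",
        warmup: int = 3) -> List[Optional[float]]:
    """EMA[t] = alpha * x[t] + (1 - alpha) * EMA[t-1] under one of three seeds."""
    if not xs:
        return []
    if seed == "first":      # EMA[0] = x[0]
        out: List[Optional[float]] = [float(xs[0])]
    elif seed == "zero":     # EMA[0] = 0, as listed above
        out = [0.0]
    elif seed == "sma":      # EMA[N-1] = mean(x[0..N-1]); earlier entries undefined
        if len(xs) < warmup:
            return [None] * len(xs)
        out = [None] * (warmup - 1) + [sum(xs[:warmup]) / warmup]
    else:
        raise ValueError(f"unknown seed convention: {seed}")
    # Apply the recurrence to every observation after the seeded position.
    for x in xs[len(out):]:
        out.append(alpha * x + (1 - alpha) * out[-1])
    return out


def best_year(years: Sequence[int], values: Sequence[Optional[float]]) -> int:
    """Year with the highest defined EMA; ties go to the earliest year (assumption)."""
    defined = [(y, v) for y, v in zip(years, values) if v is not None]
    return max(defined, key=lambda pair: pair[1])[0]


if __name__ == "__main__":
    years = [2018, 2019, 2020, 2021, 2022]
    filings = [100, 10, 10, 10, 60]  # toy counts for one CPC code
    for convention in ("first", "zero", "sma"):
        series = ema(filings, alpha=0.5, seed=convention)
        print(convention, series, best_year(years, series))
```

On this toy series, seeding with the first observation puts the peak in 2018, while the zero seed and the simple-average warmup both put it in 2022, so the same CPC code would pass or fail the q1 filter depending on the convention.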
3. The cardinality of "highest" is unspecified
The question asks for the CPC areas with the "highest" exponential moving average but doesn't say how many. The q1 ground truth has exactly 50 entries and the q2 ground truth has 23. An agent reading the question alone has no way to derive these specific cutoffs.
Suggested resolution (optional)
If the ambiguity is unintentional, the most lightweight fix would mirror what other datasets in DAB already do — add a paragraph to query_PATENTS/db_description_withhint.txt pinning down the conventions, e.g.:
"EMA initialization: seed with the first observation (EMA[0] = x[0]). 'Level 5' refers to the 4-character CPC subclass code (e.g., A22B); 'level 4' refers to the 3-character class code (e.g., A22). For 'highest' rankings, return the top 50 (q1) or top 23 (q2) entries."
This stays in the spirit of the existing hint-file convention (e.g., the explicit term-code definitions in query_stockmarket/db_description_withhint.txt) and doesn't change the questions themselves. Happy to send a PR if the maintainers think this direction is welcome.
If the under-specification is intentional — testing agents on ambiguous real-world specs — feel free to close the issue; this is just to make sure the convention isn't an unintended gap.