Skip to content

GH-5626 N-ary aggregate function implementation#5628

Closed
nk2IsHere wants to merge 1 commit intoeclipse-rdf4j:mainfrom
nk2IsHere:GH-5626-custom-aggregate-extension
Closed

GH-5626 N-ary aggregate function implementation#5628
nk2IsHere wants to merge 1 commit intoeclipse-rdf4j:mainfrom
nk2IsHere:GH-5626-custom-aggregate-extension

Conversation

@nk2IsHere
Copy link
Contributor

GitHub issue resolved: #5626

Briefly describe the changes proposed in this PR:

This PR adds support for custom aggregate functions with multiple arguments. It adds new factory and type for it (CustomAggregateNAryFunctionRegistry, AggregateNArgFunction) and extends SPARQL parser to support such functions. As a consequence, minor changes to AggregateFunctionCall were done too with addition of AbstractNAryAggregateOperator. I've tried to introduce no breaking changes and only extend the support as an opt-in feature.


PR Author Checklist (see the contributor guidelines for more details):

  • my pull request is self-contained
  • I've added tests for the changes I made
  • I've applied code formatting (you can use mvn process-resources to format from the command line)
  • I've squashed my commits where necessary
  • every commit message starts with the issue number (GH-xxxx) followed by a meaningful description of the change

@hmottestad hmottestad changed the base branch from main to develop December 18, 2025 18:53
@hmottestad
Copy link
Contributor

I moved this to target develop. I believe the conflicts are my fault.

@nk2IsHere nk2IsHere force-pushed the GH-5626-custom-aggregate-extension branch from 7422e3f to 3fb90d8 Compare December 19, 2025 12:38
@hmottestad hmottestad changed the base branch from develop to main December 23, 2025 21:54
@nk2IsHere nk2IsHere force-pushed the GH-5626-custom-aggregate-extension branch from 3fb90d8 to a998c55 Compare January 7, 2026 10:49
@hmottestad
Copy link
Contributor

@nk2IsHere Can you take a look at the following review by codex and see if you think that any of this is valid and if anything needs to be fixed?


N-ary registry selection is arity-based, so 1-arg n-ary functions can’t run

Why it matters: A function registered only in the n-ary registry but invoked with one argument will parse (parser checks n-ary registry), then fail at evaluation with “Unknown aggregate function” because GroupIterator only consults the n-ary registry when args.size() > 1. That’s a runtime correctness bug for valid-looking queries.
Exact location: core/queryalgebra/evaluation/src/main/java/org/eclipse/rdf4j/query/algebra/evaluation/iterator/GroupIterator.java:460-490 and core/queryparser/sparql/src/main/java/org/eclipse/rdf4j/query/parser/sparql/TupleExprBuilder.java:1994-2005
Suggested fix: Choose registry based on registration (or explicit arity metadata), not on argument count. For example: check n-ary registry first regardless of arg count, or add an supportsArity(int) API and validate in the parser, then use the same decision in evaluation.

Missing arity validation allows zero-argument aggregates and can throw at evaluation time

Why it matters: TupleExprBuilder now accepts any number of args (including zero). If a custom aggregate is called with zero args, GroupIterator will build a unary evaluator with index 0, which throws IndexOutOfBoundsException in precompileNAryArg(...).asUnaryEvaluator(0). That’s a hard failure path that likely wasn’t intended.
Exact location: core/queryparser/sparql/src/main/java/org/eclipse/rdf4j/query/parser/sparql/TupleExprBuilder.java:1994-2005 and core/queryalgebra/evaluation/src/main/java/org/eclipse/rdf4j/query/algebra/evaluation/iterator/GroupIterator.java:485-515
Suggested fix: Enforce minimum arity (>=1) for aggregate calls in the parser, or guard in GroupIterator with a clear QueryEvaluationException if getArguments().isEmpty().

DISTINCT semantics for n-ary aggregates are underspecified and likely incorrect

Why it matters: DISTINCT is implemented via Predicate<Value> and passed into AggregateNAryFunction. For n-ary aggregates, DISTINCT should typically apply to the tuple of argument values per binding set. With the current API, a function cannot implement tuple-level distinctness, leading to potential semantic mismatches.
Exact location: core/queryparser/sparql/src/main/java/org/eclipse/rdf4j/query/parser/sparql/aggregate/AggregateNAryFunction.java:27-40 and core/queryalgebra/evaluation/src/main/java/org/eclipse/rdf4j/query/algebra/evaluation/iterator/GroupIterator.java:421-474
Suggested fix: If tuple-level DISTINCT is intended, change the API to provide a predicate over List<Value> or BindingSet (or provide a distinct evaluator for the whole argument vector). If DISTINCT is intentionally per-value, document that explicitly.

@nk2IsHere nk2IsHere force-pushed the GH-5626-custom-aggregate-extension branch from a998c55 to 748fa28 Compare January 26, 2026 20:58
@nk2IsHere
Copy link
Contributor Author

Points 2 and 3 are now covered in the code. Point 1 is not covered as it is intended for NAry functions to not cover single argument cases

@nk2IsHere nk2IsHere changed the base branch from main to develop January 27, 2026 19:35
@nk2IsHere nk2IsHere changed the base branch from develop to main January 27, 2026 19:35
@Ostrzyciel
Copy link
Contributor

@hmottestad can we get this merged? :)

@hmottestad
Copy link
Contributor

@Ostrzyciel @nk2IsHere

Can you take a look at #5707 and see if the fixes I've introduced are correct?

@hmottestad hmottestad closed this Mar 11, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Support for positioned additional parameters in custom aggregation functions

3 participants