Python client aggregate.over_all() collapses unset empty-set aggregate metrics to 0/0.0 instead of None #2036

@litt1e-c

Description

How to reproduce this bug?

  1. Start a local single-node Weaviate instance reachable at 127.0.0.1:8080 and 127.0.0.1:50051.
  2. Install the Python client:
    pip install weaviate-client==4.21.0
    
  3. Run the following standalone reproducer:
     import sys
     import time
     import uuid
     import weaviate
     from weaviate.classes.config import Configure, DataType, Property
     from weaviate.classes.data import DataObject
     from weaviate.classes.query import Filter
     from weaviate.collections.classes.aggregate import Metrics
     HOST = "127.0.0.1"
     PORT = 8080
     GRPC_PORT = 50051
     collection_name = f"MinGraphqlAggregateMismatch{int(time.time())}{uuid.uuid4().hex[:6]}"
     client = weaviate.connect_to_local(host=HOST, port=PORT, grpc_port=GRPC_PORT)
     try:
         collection = client.collections.create(
             name=collection_name,
             properties=[
                 Property(name="bucket", data_type=DataType.INT, index_filterable=True, index_range_filters=True),
                 Property(name="intVal", data_type=DataType.INT, index_filterable=True, index_range_filters=True),
             ],
             vector_config=Configure.Vectors.self_provided(),
             inverted_index_config=Configure.inverted_index(index_null_state=True),
         )
         collection.data.insert_many(
             [DataObject(properties={"bucket": 0, "intVal": 123}, vector=[1.0, 0.1, 0.2])]
         )
         client_resp = collection.aggregate.over_all(
             filters=Filter.by_property("bucket").equal(99),
             total_count=True,
             return_metrics=[
                 Metrics("intVal").integer(
                     count=True,
                     maximum=True,
                     mean=True,
                     median=True,
                     minimum=True,
                     mode=True,
                     sum_=True,
                 )
             ],
         )
         gql_query = (
             "{ Aggregate { "
             f'{collection_name}(where:{{path:["bucket"], operator:Equal, valueInt:99}}) '
             "{ meta { count } intVal { count maximum mean median minimum mode sum type } } } }"
         )
         gql_resp = client.graphql_raw_query(gql_query)
         gql_group = gql_resp.aggregate[collection_name][0]
         client_metrics = client_resp.properties["intVal"]
         gql_metrics = gql_group["intVal"]
         print("client total_count:", client_resp.total_count)
         print(
             "client intVal:",
             {
                 "count": client_metrics.count,
                 "maximum": client_metrics.maximum,
                 "mean": client_metrics.mean,
                 "median": client_metrics.median,
                 "minimum": client_metrics.minimum,
                 "mode": client_metrics.mode,
                 "sum_": client_metrics.sum_,
             },
         )
         print("graphql meta.count:", gql_group["meta"]["count"])
         print("graphql intVal:", gql_metrics)
     finally:
         try:
             if client.collections.exists(collection_name):
                 client.collections.delete(collection_name)
         finally:
             client.close()
  4. Observe the output for the empty filter (bucket == 99).

What is the expected behavior?

collection.aggregate.over_all() should preserve the "no value" state for empty-set aggregate metrics.
For an empty match set, the aggregate result fields such as maximum, mean, median, minimum, mode, and sum_ should remain unset / None,
matching the semantics exposed by the raw GraphQL Aggregate response.
At minimum, the Python client should not silently collapse unset gRPC aggregate fields into scalar defaults like 0 or 0.0.
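Until the client preserves field presence, one possible caller-side workaround is to treat the value metrics as undefined whenever the metric's count is 0. The sketch below is only an illustration: mask_empty_metrics and VALUE_FIELDS are hypothetical names, not part of weaviate-client, and it operates on a plain dict shaped like the one printed by the reproducer.

```python
from typing import Any, Dict, Optional

# Metric names whose value is undefined when no objects matched.
# (Hypothetical constant; mirrors the fields requested in the reproducer.)
VALUE_FIELDS = ("maximum", "mean", "median", "minimum", "mode", "sum_")


def mask_empty_metrics(metrics: Dict[str, Any]) -> Dict[str, Optional[Any]]:
    """Return a copy of `metrics` with value fields set to None when count is 0."""
    if metrics.get("count"):
        # At least one value was aggregated, so the metrics are meaningful.
        return dict(metrics)
    return {k: (None if k in VALUE_FIELDS else v) for k, v in metrics.items()}


# Dict taken from the client output observed in this report.
observed = {
    "count": 0, "maximum": 0, "mean": 0.0, "median": 0.0,
    "minimum": 0, "mode": 0, "sum_": 0,
}
masked = mask_empty_metrics(observed)
print(masked)
```

This relies on the assumption that a count of 0 always means no values were aggregated; it cannot distinguish a genuine 0 from an unset field in any other situation, so it is not a substitute for a fix in the client.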

What is the actual behavior?

For an empty filter result:

  • raw GraphQL Aggregate returns null for numeric aggregate fields such as maximum, mean, median, minimum, mode, and sum
  • collection.aggregate.over_all() returns 0 / 0.0 for the same fields

Example observed output:
client total_count: 0
client intVal: {'count': 0, 'maximum': 0, 'mean': 0.0, 'median': 0.0, 'minimum': 0, 'mode': 0, 'sum_': 0}
graphql meta.count: 0
graphql intVal: {'count': 0, 'maximum': None, 'mean': None, 'median': None, 'minimum': None, 'mode': None, 'sum': None, 'type': 'int'}

I also inspected the raw gRPC aggregate reply, and the numeric fields are actually unset there (HasField("maximum") == False, HasField("mean") == False, etc.). The client appears to read proto scalar default values directly instead of preserving field presence.
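To illustrate the difference between reading a scalar directly and checking presence first, here is a minimal sketch. FakeIntegerAggregation is a hypothetical stand-in that only mimics how a protobuf message with optional scalar fields behaves (HasField plus a default of 0 for unset fields); it is not the real generated class, and read_optional is an assumed helper name rather than anything in the client.

```python
from typing import Any, Optional


class FakeIntegerAggregation:
    """Hypothetical stand-in for a proto message with optional scalar fields."""

    def __init__(self, **set_fields: Any) -> None:
        self._set = dict(set_fields)

    def HasField(self, name: str) -> bool:
        # True only for fields that were explicitly set on the message.
        return name in self._set

    def __getattr__(self, name: str) -> Any:
        # Invoked only when normal attribute lookup fails.
        if name.startswith("_"):
            raise AttributeError(name)
        # Like protobuf, an unset scalar reads as its default value (0).
        return self._set.get(name, 0)


def read_optional(msg: Any, name: str) -> Optional[Any]:
    """Return the field value if it is present on the message, else None."""
    return getattr(msg, name) if msg.HasField(name) else None


# Empty match set: only count is set, the value metrics are absent.
empty = FakeIntegerAggregation(count=0)

naive = empty.maximum                        # reads the scalar default
preserved = read_optional(empty, "maximum")  # keeps the "no value" state

print("naive:", naive, "preserved:", preserved)
```

The presence check lets a genuine 0 (field set to 0) stay distinguishable from "no value" (field unset), which is the distinction the raw GraphQL response already makes with null.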

Supporting information

  • Python: 3.11
  • weaviate-client: reproduced on 4.20.1 and 4.21.0
  • Weaviate server: 1.36.10
  • Local single-node instance on 127.0.0.1:8080 and 127.0.0.1:50051

Server Version

1.36.10

Weaviate Setup

Single Node

Nodes count

No response
