How to reproduce this bug?
- Start a local single-node Weaviate instance reachable at 127.0.0.1:8080 (HTTP) and 127.0.0.1:50051 (gRPC).
- Install the Python client:
pip install weaviate-client==4.21.0
- Run the following standalone reproducer:
import sys
import time
import uuid

import weaviate
from weaviate.classes.config import Configure, DataType, Property
from weaviate.classes.data import DataObject
from weaviate.classes.query import Filter
from weaviate.collections.classes.aggregate import Metrics

HOST = "127.0.0.1"
PORT = 8080
GRPC_PORT = 50051

# Unique collection name so repeated runs do not collide.
collection_name = f"MinGraphqlAggregateMismatch{int(time.time())}{uuid.uuid4().hex[:6]}"
client = weaviate.connect_to_local(host=HOST, port=PORT, grpc_port=GRPC_PORT)

try:
    collection = client.collections.create(
        name=collection_name,
        properties=[
            Property(name="bucket", data_type=DataType.INT, index_filterable=True, index_range_filters=True),
            Property(name="intVal", data_type=DataType.INT, index_filterable=True, index_range_filters=True),
        ],
        vector_config=Configure.Vectors.self_provided(),
        inverted_index_config=Configure.inverted_index(index_null_state=True),
    )

    # Insert a single object with bucket == 0, so the filter below matches nothing.
    collection.data.insert_many(
        [DataObject(properties={"bucket": 0, "intVal": 123}, vector=[1.0, 0.1, 0.2])]
    )

    # Aggregate through the Python client (gRPC) over an empty match set.
    client_resp = collection.aggregate.over_all(
        filters=Filter.by_property("bucket").equal(99),
        total_count=True,
        return_metrics=[
            Metrics("intVal").integer(
                count=True,
                maximum=True,
                mean=True,
                median=True,
                minimum=True,
                mode=True,
                sum_=True,
            )
        ],
    )

    # Run the equivalent aggregate as a raw GraphQL query.
    gql_query = (
        "{ Aggregate { "
        f'{collection_name}(where:{{path:["bucket"], operator:Equal, valueInt:99}}) '
        "{ meta { count } intVal { count maximum mean median minimum mode sum type } } } }"
    )
    gql_resp = client.graphql_raw_query(gql_query)
    gql_group = gql_resp.aggregate[collection_name][0]

    client_metrics = client_resp.properties["intVal"]
    gql_metrics = gql_group["intVal"]

    print("client total_count:", client_resp.total_count)
    print(
        "client intVal:",
        {
            "count": client_metrics.count,
            "maximum": client_metrics.maximum,
            "mean": client_metrics.mean,
            "median": client_metrics.median,
            "minimum": client_metrics.minimum,
            "mode": client_metrics.mode,
            "sum_": client_metrics.sum_,
        },
    )
    print("graphql meta.count:", gql_group["meta"]["count"])
    print("graphql intVal:", gql_metrics)
finally:
    # Always clean up the temporary collection and close the connection.
    try:
        if client.collections.exists(collection_name):
            client.collections.delete(collection_name)
    finally:
        client.close()
- Observe the output for the empty filter (bucket == 99).
What is the expected behavior?
collection.aggregate.over_all() should preserve the "no value" state for empty-set aggregate metrics.
For an empty match set, the aggregate result fields such as maximum, mean, median, minimum, mode, and sum_ should remain unset / None,
matching the semantics exposed by the raw GraphQL Aggregate response.
At minimum, the Python client should not silently collapse unset gRPC aggregate fields into scalar defaults like 0 or 0.0.
What is the actual behavior?
For an empty filter result:
- raw GraphQL Aggregate returns null for numeric aggregate fields such as maximum, mean, median, minimum, mode, and sum
- collection.aggregate.over_all() returns 0 / 0.0 for the same fields
Example observed output:
client total_count: 0
client intVal: {'count': 0, 'maximum': 0, 'mean': 0.0, 'median': 0.0, 'minimum': 0, 'mode': 0, 'sum_': 0}
graphql meta.count: 0
graphql intVal: {'count': 0, 'maximum': None, 'mean': None, 'median': None, 'minimum': None, 'mode': None, 'sum': None, 'type': 'int'}
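To make the mismatch explicit, the two dictionaries from the output above can be compared key by key. This is only a small illustrative sketch: `find_mismatches` is a hypothetical helper (not part of weaviate-client), and it assumes the only naming difference between the two results is the client's trailing underscore in `sum_`.

```python
# Values copied from the observed output above.
client_metrics = {
    "count": 0, "maximum": 0, "mean": 0.0, "median": 0.0,
    "minimum": 0, "mode": 0, "sum_": 0,
}
graphql_metrics = {
    "count": 0, "maximum": None, "mean": None, "median": None,
    "minimum": None, "mode": None, "sum": None, "type": "int",
}


def find_mismatches(client, graphql):
    """Hypothetical helper: return {field: (client_value, graphql_value)} for differing fields."""
    mismatches = {}
    for key, client_value in client.items():
        # The client exposes `sum_`, while GraphQL names the field `sum`.
        gql_key = key.rstrip("_")
        gql_value = graphql.get(gql_key)
        if client_value != gql_value:
            mismatches[gql_key] = (client_value, gql_value)
    return mismatches


mismatches = find_mismatches(client_metrics, graphql_metrics)
for field, (client_value, gql_value) in sorted(mismatches.items()):
    print(f"{field}: client={client_value!r} graphql={gql_value!r}")
```

Only count agrees between the two results; every other numeric field differs because 0 (or 0.0) is not None.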
I also inspected the raw gRPC aggregate reply and the numeric fields are actually unset there
(HasField("maximum") == False, HasField("mean") == False, etc.).
The client appears to read proto scalar default values directly instead of preserving field presence.
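The difference between reading a scalar directly and checking field presence first can be sketched as follows. This is not the client's actual code: `FakeIntegerAggregation` is a hypothetical pure-Python stand-in that only mimics how a protobuf message with optional scalar fields behaves (`HasField` reports presence, and reading an unset field yields a default), and `read_optional` is a hypothetical helper name. For simplicity the stand-in returns 0 for every unset field, whereas a real message would return 0.0 for floating-point fields.

```python
from typing import Optional


class FakeIntegerAggregation:
    """Hypothetical stand-in for a proto message with optional scalar fields."""

    def __init__(self, **set_fields):
        self._set = dict(set_fields)

    def HasField(self, name: str) -> bool:
        # True only for fields that were explicitly set.
        return name in self._set

    def __getattr__(self, name: str):
        # Only called when normal attribute lookup fails.
        if name.startswith("_"):
            raise AttributeError(name)
        # Like protobuf, an unset scalar reads as its default value.
        return self._set.get(name, 0)


def read_optional(msg, name: str) -> Optional[float]:
    """Return the field value if it was explicitly set, otherwise None."""
    return getattr(msg, name) if msg.HasField(name) else None


# Empty match set: only `count` is set on the message.
empty = FakeIntegerAggregation(count=0)
fields = ("count", "maximum", "mean")

naive = {f: getattr(empty, f) for f in fields}          # collapses unset to 0
preserved = {f: read_optional(empty, f) for f in fields}  # keeps unset as None

print("naive:    ", naive)
print("preserved:", preserved)
```

With a presence check like this, an empty match set would surface None for maximum and mean, in line with the null values in the raw GraphQL response.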
Supporting information
- Python: 3.11
- weaviate-client: reproduced on 4.20.1 and 4.21.0
- Weaviate server: 1.36.10
- Local single-node instance on 127.0.0.1:8080 and 127.0.0.1:50051
Server Version
1.36.10
Weaviate Setup
Single Node
Nodes count
No response