refactor(pinecone): make get_metadata_field_min_max more consistent; use async DocumentStore mixin tests#3231
Conversation
Replaces 17 duplicate async test methods in TestDocumentStoreAsync with generic mixin classes from haystack.testing.document_store_async (available since haystack-ai 2.28.0). Mixins added: - WriteDocumentsAsyncTest - CountDocumentsAsyncTest - FilterDocumentsAsyncTest - CountDocumentsByFilterAsyncTest - CountUniqueMetadataByFilterAsyncTest - DeleteDocumentsAsyncTest - DeleteAllAsyncTest - DeleteByFilterAsyncTest - UpdateByFilterAsyncTest - GetMetadataFieldsInfoAsyncTest - GetMetadataFieldMinMaxAsyncTest - GetMetadataFieldUniqueValuesAsyncTest Preserved (Pinecone-specific): - test_embedding_retrieval - test_close - test_sentence_window_retriever Version bump: haystack-ai>=2.26.1 -> >=2.28.0 (required for the haystack.testing.document_store_async module).
Coverage report (pinecone)Click to see where and how coverage changed
This report was generated by python-coverage-comment-action |
||||||||||||||||||||||||
anakin87
left a comment
There was a problem hiding this comment.
I ran the integration tests locally (they do not run on PRs from forks for security reasons)
I got the following errors
FAILED tests/test_document_store_async.py::TestDocumentStoreAsync::test_get_metadata_field_min_max_empty_collection_async - ValueError: No values found for metadata field 'priority'
FAILED tests/test_document_store_async.py::TestDocumentStoreAsync::test_delete_documents_empty_document_store_async - pinecone.exceptions.exceptions.NotFoundException: (404)
Reason: Not Found
HTTP response headers: <CIMultiDictProxy('Date': 'Mon, 27 Apr 2026 12:56:18 GMT', 'Content-Type': 'application/json', 'Content-Length': '55', 'Connection': 'keep-alive', 'x-pinecone-request-latency-ms': '25', 'x-envoy-upstream-service-time': '25', 'x-pinecone-response-duration-ms': '26', 'Server': 'envoy')>
HTTP response body: {"code":5,"message":"Namespace not found","details":[]}
FAILED tests/test_document_store_async.py::TestDocumentStoreAsync::test_filter_simple_async - AssertionError
FAILED tests/test_document_store_async.py::TestDocumentStoreAsync::test_filter_compound_async - AssertionError
FAILED tests/test_document_store_async.py::TestDocumentStoreAsync::test_write_documents_async - NotImplementedError: Default write_documents_async() behaviour depends on the Document Store implementation, as we don't enforce a default behaviour when no policy is set. Override this test in your custom test class.
FAILED tests/test_document_store_async.py::TestDocumentStoreAsync::test_write_documents_duplicate_fail_async - Failed: DID NOT RAISE <class 'haystack.document_stores.errors.errors.DuplicateDocumentError'>
FAILED tests/test_document_store_async.py::TestDocumentStoreAsync::test_write_documents_duplicate_skip_async - AssertionError
I suggest you also run them locally (a free Pinecone API key is needed) and update this PR to make them pass
…ic behaviour
Seven integration tests from the mixin failed when anakin87 ran them locally:
- test_write_documents_async: NotImplementedError — the mixin intentionally
raises this to force each store to define its default write behaviour.
Overridden to match the sync test (upsert one doc, assert count == 1).
- test_write_documents_duplicate_fail_async / _skip_async: Pinecone only
supports UPSERT, so DuplicatePolicy.FAIL/SKIP have no effect. Skipped for
the same reason as the sync equivalents.
- test_delete_documents_empty_document_store_async: Pinecone raises
NotFoundException when deleting from a namespace that has never received a
write. Skipped to match the sync skip.
- test_get_metadata_field_min_max_empty_collection_async: the mixin expects
{min: None, max: None} but Pinecone raises ValueError("No values found…").
Overridden to assert the ValueError, consistent with the sync override.
- test_filter_simple_async / test_filter_compound_async: filter_documents_async
uses a dummy-vector similarity search internally, so documents are returned
in similarity order rather than insertion order. assert_documents_are_equal
is overridden to sort both lists by document id before comparing.
|
Heads-up for maintainers This PR is from a fork and touches integrations whose integration tests require API keys. Affected integrations:
Please run the integration tests locally ( |
There was a problem hiding this comment.
FAILED tests/test_document_store_async.py::TestDocumentStoreAsync::test_filter_simple_async - AssertionError: assert [Document(id=1a46c9a3f25735c2d8402227a895d59168b60004ea88bcdebaae45e4c64dc5db, content: 'A Foo Document 0', meta: {'chapter': 'intro', 'date': '1969-07-21T20:17:40', 'name': 'name_0', 'number': 2, 'page': '100'}, embedding: vector of size 768), Document(id=d919935b9a5b055ccc496154697127f40888631a1d5dc776b3c620fb843d7f09, content: 'A Foo Document 2', meta: {'chapter': 'intro', 'date': '1969-07-21T20:17:40', 'name': 'name_2', 'number': 2, 'page': '100'}, embedding: vector of size 768), Document(id=f3b69f5129585bbd1c529979ea301d812fcbab89e25b1a55d2b2c91afac0ab83, content: 'A Foo Document 1', meta: {'chapter': 'intro', 'date': '1969-07-21T20:17:40', 'name': 'name_1', 'number': 2, 'page': '100'}, embedding: vector of size 768)] == [Document(id=1a46c9a3f25735c2d8402227a895d59168b60004ea88bcdebaae45e4c64dc5db, content: 'A Foo Document 0', meta: {'name': 'name_0', 'page': '100', 'chapter': 'intro', 'number': 2, 'date': '1969-07-21T20:17:40'}, embedding: vector of size 768), Document(id=d919935b9a5b055ccc496154697127f40888631a1d5dc776b3c620fb843d7f09, content: 'A Foo Document 2', meta: {'name': 'name_2', 'page': '100', 'chapter': 'intro', 'number': 2, 'date': '1969-07-21T20:17:40'}, embedding: vector of size 768), Document(id=f3b69f5129585bbd1c529979ea301d812fcbab89e25b1a55d2b2c91afac0ab83, content: 'A Foo Document 1', meta: {'name': 'name_1', 'page': '100', 'chapter': 'intro', 'number': 2, 'date': '1969-07-21T20:17:40'}, embedding: vector of size 768)]
At index 0 diff: Document(id=1a46c9a3f25735c2d8402227a895d59168b60004ea88bcdebaae45e4c64dc5db, content: 'A Foo Document 0', meta: {'chapter': 'intro', 'date': '1969-07-21T20:17:40', 'name': 'name_0', 'number': 2, 'page': '100'}, embedding: vector of size 768) != Document(id=1a46c9a3f25735c2d8402227a895d59168b60004ea88bcdebaae45e4c64dc5db, content: 'A Foo Document 0', meta: {'name': 'name_0', 'page': '100', 'chapter': 'intro', 'number': 2, 'date': '1969-07-21T20:17:40'}, embedding: vector of size 768)
Full diff:
[
- Document(id=1a46c9a3f25735c2d8402227a895d59168b60004ea88bcdebaae45e4c64dc5db, content: 'A Foo Document 0', meta: {'name': 'name_0', 'page': '100', 'chapter': 'intro', 'number': 2, 'date': '1969-07-21T20:17:40'}, embedding: vector of size 768),
- Document(id=d919935b9a5b055ccc496154697127f40888631a1d5dc776b3c620fb843d7f09, content: 'A Foo Document 2', meta: {'name': 'name_2', 'page': '100', 'chapter': 'intro', 'number': 2, 'date': '1969-07-21T20:17:40'}, embedding: vector of size 768),
- Document(id=f3b69f5129585bbd1c529979ea301d812fcbab89e25b1a55d2b2c91afac0ab83, content: 'A Foo Document 1', meta: {'name': 'name_1', 'page': '100', 'chapter': 'intro', 'number': 2, 'date': '1969-07-21T20:17:40'}, embedding: vector of size 768),
+ Document(id=1a46c9a3f25735c2d8402227a895d59168b60004ea88bcdebaae45e4c64dc5db, content: 'A Foo Document 0', meta: {'chapter': 'intro', 'date': '1969-07-21T20:17:40', 'name': 'name_0', 'number': 2, 'page': '100'}, embedding: vector of size 768),
+ Document(id=d919935b9a5b055ccc496154697127f40888631a1d5dc776b3c620fb843d7f09, content: 'A Foo Document 2', meta: {'chapter': 'intro', 'date': '1969-07-21T20:17:40', 'name': 'name_2', 'number': 2, 'page': '100'}, embedding: vector of size 768),
+ Document(id=f3b69f5129585bbd1c529979ea301d812fcbab89e25b1a55d2b2c91afac0ab83, content: 'A Foo Document 1', meta: {'chapter': 'intro', 'date': '1969-07-21T20:17:40', 'name': 'name_1', 'number': 2, 'page': '100'}, embedding: vector of size 768),
]
FAILED tests/test_document_store_async.py::TestDocumentStoreAsync::test_filter_compound_async - AssertionError: assert [Document(id=8dc93d7bb8817ed727b8fb0fabfbed6a5c130f7c4d4df05c5615eb7ce6b7bcec, content: 'A Foo Document 0', meta: {'chapter': 'intro', 'date': '1969-07-21T20:17:40', 'name': 'name_0', 'number': 2, 'page': '100'}, embedding: vector of size 768)] == [Document(id=8dc93d7bb8817ed727b8fb0fabfbed6a5c130f7c4d4df05c5615eb7ce6b7bcec, content: 'A Foo Document 0', meta: {'name': 'name_0', 'page': '100', 'chapter': 'intro', 'number': 2, 'date': '1969-07-21T20:17:40'}, embedding: vector of size 768)]
At index 0 diff: Document(id=8dc93d7bb8817ed727b8fb0fabfbed6a5c130f7c4d4df05c5615eb7ce6b7bcec, content: 'A Foo Document 0', meta: {'chapter': 'intro', 'date': '1969-07-21T20:17:40', 'name': 'name_0', 'number': 2, 'page': '100'}, embedding: vector of size 768) != Document(id=8dc93d7bb8817ed727b8fb0fabfbed6a5c130f7c4d4df05c5615eb7ce6b7bcec, content: 'A Foo Document 0', meta: {'name': 'name_0', 'page': '100', 'chapter': 'intro', 'number': 2, 'date': '1969-07-21T20:17:40'}, embedding: vector of size 768)
Full diff:
[
- Document(id=8dc93d7bb8817ed727b8fb0fabfbed6a5c130f7c4d4df05c5615eb7ce6b7bcec, content: 'A Foo Document 0', meta: {'name': 'name_0', 'page': '100', 'chapter': 'intro', 'number': 2, 'date': '1969-07-21T20:17:40'}, embedding: vector of size 768),
+ Document(id=8dc93d7bb8817ed727b8fb0fabfbed6a5c130f7c4d4df05c5615eb7ce6b7bcec, content: 'A Foo Document 0', meta: {'chapter': 'intro', 'date': '1969-07-21T20:17:40', 'name': 'name_0', 'number': 2, 'page': '100'}, embedding: vector of size 768),
]
I get these failures. Can you run tests locally and fix them? Ping me when works.
…equal Pinecone stores embedding vectors as float32 internally, so values round-tripped through the store may differ from the original float64 values in filterable_docs. Document.__eq__ uses to_dict() which includes the embedding list, causing test_filter_simple_async and test_filter_compound_async to fail even when all other fields match. Fix: compare documents sorted by ID with embedding fields stripped, verifying only that embedding presence (None vs non-None) matches. Exact float values are not the invariant under test in filter tests.
|
Fixed the lint error (B905: |
|
I did some final improvements. Now running all tests locally before merging. |
Related Issues
Proposed Changes:
Replaces 17 duplicate async test methods in
TestDocumentStoreAsyncwith the generic mixin classes fromhaystack.testing.document_store_async(available since haystack-ai 2.28.0).Mixins added:
WriteDocumentsAsyncTestCountDocumentsAsyncTestFilterDocumentsAsyncTestCountDocumentsByFilterAsyncTestCountUniqueMetadataByFilterAsyncTestDeleteDocumentsAsyncTestDeleteAllAsyncTestDeleteByFilterAsyncTestUpdateByFilterAsyncTestGetMetadataFieldsInfoAsyncTestGetMetadataFieldMinMaxAsyncTestGetMetadataFieldUniqueValuesAsyncTestPreserved (Pinecone-specific):
test_embedding_retrieval— exercises_embedding_retrieval_asyncwith real vectorstest_close— verifies the async index lifecycle (close_async/ re-initialize)test_sentence_window_retriever— end-to-end retrieval withSentenceWindowRetrieverVersion bump:
haystack-ai>=2.26.1→>=2.28.0(required forhaystack.testing.document_store_asyncmodule).Note:
test_count_not_empty_asyncis overridden to work around a missing@staticmethodon the mixin method in haystack v2.28.0 (fixed in haystack main, tracked in deepset-ai/haystack#11174).How did you test it?
pytest -v -m integration)Notes for the reviewer
The class-level
document_storefixture delegates to the conftest'sdocument_store_asyncfixture (which already wrapswrite_documents_async,delete_documents_async, anddelete_all_documents_asyncwithasyncio.sleepfor Pinecone's eventual consistency). This keeps the sleep behaviour intact for all mixin tests without duplicating fixture code.Checklist
refactor: