Skip to content

refactor(pinecone): make get_metadata_field_min_max more consistent; use async DocumentStore mixin tests#3231

Merged
anakin87 merged 5 commits into
deepset-ai:mainfrom
Aftabbs:refactor/pinecone-async-mixin-tests
May 8, 2026
Merged

refactor(pinecone): make get_metadata_field_min_max more consistent; use async DocumentStore mixin tests#3231
anakin87 merged 5 commits into
deepset-ai:mainfrom
Aftabbs:refactor/pinecone-async-mixin-tests

Conversation

@Aftabbs
Copy link
Copy Markdown
Contributor

@Aftabbs Aftabbs commented Apr 26, 2026

Related Issues

Proposed Changes:

Replaces 17 duplicate async test methods in TestDocumentStoreAsync with the generic mixin classes from haystack.testing.document_store_async (available since haystack-ai 2.28.0).

Mixins added:

  • WriteDocumentsAsyncTest
  • CountDocumentsAsyncTest
  • FilterDocumentsAsyncTest
  • CountDocumentsByFilterAsyncTest
  • CountUniqueMetadataByFilterAsyncTest
  • DeleteDocumentsAsyncTest
  • DeleteAllAsyncTest
  • DeleteByFilterAsyncTest
  • UpdateByFilterAsyncTest
  • GetMetadataFieldsInfoAsyncTest
  • GetMetadataFieldMinMaxAsyncTest
  • GetMetadataFieldUniqueValuesAsyncTest

Preserved (Pinecone-specific):

  • test_embedding_retrieval — exercises _embedding_retrieval_async with real vectors
  • test_close — verifies the async index lifecycle (close_async / re-initialize)
  • test_sentence_window_retriever — end-to-end retrieval with SentenceWindowRetriever

Version bump: haystack-ai>=2.26.1>=2.28.0 (required for haystack.testing.document_store_async module).

Note: test_count_not_empty_async is overridden to work around a missing @staticmethod on the mixin method in haystack v2.28.0 (fixed in haystack main, tracked in deepset-ai/haystack#11174).

How did you test it?

  • Integration tests run against a live Pinecone index via CI (pytest -v -m integration)
  • Mixin-inherited tests provide equivalent coverage for the removed methods
  • Pinecone-specific tests (embedding, close lifecycle, sentence window) still run

Notes for the reviewer

The class-level document_store fixture delegates to the conftest's document_store_async fixture (which already wraps write_documents_async, delete_documents_async, and delete_all_documents_async with asyncio.sleep for Pinecone's eventual consistency). This keeps the sleep behaviour intact for all mixin tests without duplicating fixture code.

Checklist

Replaces 17 duplicate async test methods in TestDocumentStoreAsync with
generic mixin classes from haystack.testing.document_store_async
(available since haystack-ai 2.28.0).

Mixins added:
- WriteDocumentsAsyncTest
- CountDocumentsAsyncTest
- FilterDocumentsAsyncTest
- CountDocumentsByFilterAsyncTest
- CountUniqueMetadataByFilterAsyncTest
- DeleteDocumentsAsyncTest
- DeleteAllAsyncTest
- DeleteByFilterAsyncTest
- UpdateByFilterAsyncTest
- GetMetadataFieldsInfoAsyncTest
- GetMetadataFieldMinMaxAsyncTest
- GetMetadataFieldUniqueValuesAsyncTest

Preserved (Pinecone-specific):
- test_embedding_retrieval
- test_close
- test_sentence_window_retriever

Version bump: haystack-ai>=2.26.1 -> >=2.28.0 (required for the
haystack.testing.document_store_async module).
@Aftabbs Aftabbs requested a review from a team as a code owner April 26, 2026 08:58
@Aftabbs Aftabbs requested review from anakin87 and removed request for a team April 26, 2026 08:58
@github-actions github-actions Bot added integration:pinecone type:documentation Improvements or additions to documentation labels Apr 26, 2026
@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented Apr 26, 2026

Coverage report (pinecone)

Click to see where and how coverage changed

FileStatementsMissingCoverageCoverage
(new stmts)
Lines missing
  integrations/pinecone/src/haystack_integrations/document_stores/pinecone
  document_store.py 869, 872-873
Project Total  

This report was generated by python-coverage-comment-action

Copy link
Copy Markdown
Member

@anakin87 anakin87 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I ran the integration tests locally (they do not run on PRs from forks for security reasons)

I got the following errors

FAILED tests/test_document_store_async.py::TestDocumentStoreAsync::test_get_metadata_field_min_max_empty_collection_async - ValueError: No values found for metadata field 'priority'
FAILED tests/test_document_store_async.py::TestDocumentStoreAsync::test_delete_documents_empty_document_store_async - pinecone.exceptions.exceptions.NotFoundException: (404)
Reason: Not Found
HTTP response headers: <CIMultiDictProxy('Date': 'Mon, 27 Apr 2026 12:56:18 GMT', 'Content-Type': 'application/json', 'Content-Length': '55', 'Connection': 'keep-alive', 'x-pinecone-request-latency-ms': '25', 'x-envoy-upstream-service-time': '25', 'x-pinecone-response-duration-ms': '26', 'Server': 'envoy')>
HTTP response body: {"code":5,"message":"Namespace not found","details":[]}
FAILED tests/test_document_store_async.py::TestDocumentStoreAsync::test_filter_simple_async - AssertionError
FAILED tests/test_document_store_async.py::TestDocumentStoreAsync::test_filter_compound_async - AssertionError
FAILED tests/test_document_store_async.py::TestDocumentStoreAsync::test_write_documents_async - NotImplementedError: Default write_documents_async() behaviour depends on the Document Store implementation, as we don't enforce a default behaviour when no policy is set. Override this test in your custom test class.
FAILED tests/test_document_store_async.py::TestDocumentStoreAsync::test_write_documents_duplicate_fail_async - Failed: DID NOT RAISE <class 'haystack.document_stores.errors.errors.DuplicateDocumentError'>
FAILED tests/test_document_store_async.py::TestDocumentStoreAsync::test_write_documents_duplicate_skip_async - AssertionError

I suggest you also run them locally (a free Pinecone API key is needed) and update this PR to make them pass

…ic behaviour

Seven integration tests from the mixin failed when anakin87 ran them locally:

- test_write_documents_async: NotImplementedError — the mixin intentionally
  raises this to force each store to define its default write behaviour.
  Overridden to match the sync test (upsert one doc, assert count == 1).

- test_write_documents_duplicate_fail_async / _skip_async: Pinecone only
  supports UPSERT, so DuplicatePolicy.FAIL/SKIP have no effect. Skipped for
  the same reason as the sync equivalents.

- test_delete_documents_empty_document_store_async: Pinecone raises
  NotFoundException when deleting from a namespace that has never received a
  write. Skipped to match the sync skip.

- test_get_metadata_field_min_max_empty_collection_async: the mixin expects
  {min: None, max: None} but Pinecone raises ValueError("No values found…").
  Overridden to assert the ValueError, consistent with the sync override.

- test_filter_simple_async / test_filter_compound_async: filter_documents_async
  uses a dummy-vector similarity search internally, so documents are returned
  in similarity order rather than insertion order. assert_documents_are_equal
  is overridden to sort both lists by document id before comparing.
@github-actions
Copy link
Copy Markdown
Contributor

Heads-up for maintainers

This PR is from a fork and touches integrations whose integration tests require API keys.
Those tests are skipped in CI because fork PRs don't have access to repo secrets for security reasons.

Affected integrations:

  • pinecone

Please run the integration tests locally (hatch run test:integration inside each folder) before approving.

Copy link
Copy Markdown
Member

@anakin87 anakin87 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FAILED tests/test_document_store_async.py::TestDocumentStoreAsync::test_filter_simple_async - AssertionError: assert [Document(id=1a46c9a3f25735c2d8402227a895d59168b60004ea88bcdebaae45e4c64dc5db, content: 'A Foo Document 0', meta: {'chapter': 'intro', 'date': '1969-07-21T20:17:40', 'name': 'name_0', 'number': 2, 'page': '100'}, embedding: vector of size 768), Document(id=d919935b9a5b055ccc496154697127f40888631a1d5dc776b3c620fb843d7f09, content: 'A Foo Document 2', meta: {'chapter': 'intro', 'date': '1969-07-21T20:17:40', 'name': 'name_2', 'number': 2, 'page': '100'}, embedding: vector of size 768), Document(id=f3b69f5129585bbd1c529979ea301d812fcbab89e25b1a55d2b2c91afac0ab83, content: 'A Foo Document 1', meta: {'chapter': 'intro', 'date': '1969-07-21T20:17:40', 'name': 'name_1', 'number': 2, 'page': '100'}, embedding: vector of size 768)] == [Document(id=1a46c9a3f25735c2d8402227a895d59168b60004ea88bcdebaae45e4c64dc5db, content: 'A Foo Document 0', meta: {'name': 'name_0', 'page': '100', 'chapter': 'intro', 'number': 2, 'date': '1969-07-21T20:17:40'}, embedding: vector of size 768), Document(id=d919935b9a5b055ccc496154697127f40888631a1d5dc776b3c620fb843d7f09, content: 'A Foo Document 2', meta: {'name': 'name_2', 'page': '100', 'chapter': 'intro', 'number': 2, 'date': '1969-07-21T20:17:40'}, embedding: vector of size 768), Document(id=f3b69f5129585bbd1c529979ea301d812fcbab89e25b1a55d2b2c91afac0ab83, content: 'A Foo Document 1', meta: {'name': 'name_1', 'page': '100', 'chapter': 'intro', 'number': 2, 'date': '1969-07-21T20:17:40'}, embedding: vector of size 768)]

  At index 0 diff: Document(id=1a46c9a3f25735c2d8402227a895d59168b60004ea88bcdebaae45e4c64dc5db, content: 'A Foo Document 0', meta: {'chapter': 'intro', 'date': '1969-07-21T20:17:40', 'name': 'name_0', 'number': 2, 'page': '100'}, embedding: vector of size 768) != Document(id=1a46c9a3f25735c2d8402227a895d59168b60004ea88bcdebaae45e4c64dc5db, content: 'A Foo Document 0', meta: {'name': 'name_0', 'page': '100', 'chapter': 'intro', 'number': 2, 'date': '1969-07-21T20:17:40'}, embedding: vector of size 768)

  Full diff:
    [
  -     Document(id=1a46c9a3f25735c2d8402227a895d59168b60004ea88bcdebaae45e4c64dc5db, content: 'A Foo Document 0', meta: {'name': 'name_0', 'page': '100', 'chapter': 'intro', 'number': 2, 'date': '1969-07-21T20:17:40'}, embedding: vector of size 768),
  -     Document(id=d919935b9a5b055ccc496154697127f40888631a1d5dc776b3c620fb843d7f09, content: 'A Foo Document 2', meta: {'name': 'name_2', 'page': '100', 'chapter': 'intro', 'number': 2, 'date': '1969-07-21T20:17:40'}, embedding: vector of size 768),
  -     Document(id=f3b69f5129585bbd1c529979ea301d812fcbab89e25b1a55d2b2c91afac0ab83, content: 'A Foo Document 1', meta: {'name': 'name_1', 'page': '100', 'chapter': 'intro', 'number': 2, 'date': '1969-07-21T20:17:40'}, embedding: vector of size 768),
  +     Document(id=1a46c9a3f25735c2d8402227a895d59168b60004ea88bcdebaae45e4c64dc5db, content: 'A Foo Document 0', meta: {'chapter': 'intro', 'date': '1969-07-21T20:17:40', 'name': 'name_0', 'number': 2, 'page': '100'}, embedding: vector of size 768),
  +     Document(id=d919935b9a5b055ccc496154697127f40888631a1d5dc776b3c620fb843d7f09, content: 'A Foo Document 2', meta: {'chapter': 'intro', 'date': '1969-07-21T20:17:40', 'name': 'name_2', 'number': 2, 'page': '100'}, embedding: vector of size 768),
  +     Document(id=f3b69f5129585bbd1c529979ea301d812fcbab89e25b1a55d2b2c91afac0ab83, content: 'A Foo Document 1', meta: {'chapter': 'intro', 'date': '1969-07-21T20:17:40', 'name': 'name_1', 'number': 2, 'page': '100'}, embedding: vector of size 768),
    ]
FAILED tests/test_document_store_async.py::TestDocumentStoreAsync::test_filter_compound_async - AssertionError: assert [Document(id=8dc93d7bb8817ed727b8fb0fabfbed6a5c130f7c4d4df05c5615eb7ce6b7bcec, content: 'A Foo Document 0', meta: {'chapter': 'intro', 'date': '1969-07-21T20:17:40', 'name': 'name_0', 'number': 2, 'page': '100'}, embedding: vector of size 768)] == [Document(id=8dc93d7bb8817ed727b8fb0fabfbed6a5c130f7c4d4df05c5615eb7ce6b7bcec, content: 'A Foo Document 0', meta: {'name': 'name_0', 'page': '100', 'chapter': 'intro', 'number': 2, 'date': '1969-07-21T20:17:40'}, embedding: vector of size 768)]

  At index 0 diff: Document(id=8dc93d7bb8817ed727b8fb0fabfbed6a5c130f7c4d4df05c5615eb7ce6b7bcec, content: 'A Foo Document 0', meta: {'chapter': 'intro', 'date': '1969-07-21T20:17:40', 'name': 'name_0', 'number': 2, 'page': '100'}, embedding: vector of size 768) != Document(id=8dc93d7bb8817ed727b8fb0fabfbed6a5c130f7c4d4df05c5615eb7ce6b7bcec, content: 'A Foo Document 0', meta: {'name': 'name_0', 'page': '100', 'chapter': 'intro', 'number': 2, 'date': '1969-07-21T20:17:40'}, embedding: vector of size 768)

  Full diff:
    [
  -     Document(id=8dc93d7bb8817ed727b8fb0fabfbed6a5c130f7c4d4df05c5615eb7ce6b7bcec, content: 'A Foo Document 0', meta: {'name': 'name_0', 'page': '100', 'chapter': 'intro', 'number': 2, 'date': '1969-07-21T20:17:40'}, embedding: vector of size 768),
  +     Document(id=8dc93d7bb8817ed727b8fb0fabfbed6a5c130f7c4d4df05c5615eb7ce6b7bcec, content: 'A Foo Document 0', meta: {'chapter': 'intro', 'date': '1969-07-21T20:17:40', 'name': 'name_0', 'number': 2, 'page': '100'}, embedding: vector of size 768),
    ]

I get these failures. Can you run tests locally and fix them? Ping me when works.

Aftabbs added 2 commits May 7, 2026 08:13
…equal

Pinecone stores embedding vectors as float32 internally, so values round-tripped
through the store may differ from the original float64 values in filterable_docs.
Document.__eq__ uses to_dict() which includes the embedding list, causing
test_filter_simple_async and test_filter_compound_async to fail even when all
other fields match.

Fix: compare documents sorted by ID with embedding fields stripped, verifying
only that embedding presence (None vs non-None) matches. Exact float values are
not the invariant under test in filter tests.
@Aftabbs
Copy link
Copy Markdown
Contributor Author

Aftabbs commented May 7, 2026

Fixed the lint error (B905: zip() without strict= parameter) in commit 19cdde5. The assert_documents_are_equal override now also handles float32 precision loss (commit 6414430) — Pinecone stores embeddings as float32, so round-tripped values differ from the float64 originals; the override checks only embedding presence rather than exact values. Ready for re-review.

@anakin87 anakin87 self-assigned this May 7, 2026
Comment thread integrations/pinecone/tests/test_document_store_async.py
@anakin87 anakin87 changed the title refactor(pinecone): use async DocumentStore mixin tests refactor(pinecone): make get_metadata_field_min_max more consistent; use async DocumentStore mixin tests May 8, 2026
@anakin87
Copy link
Copy Markdown
Member

anakin87 commented May 8, 2026

I did some final improvements. Now running all tests locally before merging.

Copy link
Copy Markdown
Member

@anakin87 anakin87 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you

@anakin87 anakin87 merged commit 39364d2 into deepset-ai:main May 8, 2026
13 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

integration:pinecone type:documentation Improvements or additions to documentation

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Use async DocumentStore mixin tests in Pinecone

2 participants