feat: Implement AlloyDB integration with comprehensive test coverage #3229
davidsbatista merged 24 commits into deepset-ai:main
- Add `py.typed` file for type hinting support in AlloyDB document store.
- Create initial test suite for AlloyDB integration, including fixtures for document stores.
- Implement tests for document conversion functions between Haystack and PostgreSQL formats.
- Develop extensive unit tests for the AlloyDB document store, covering CRUD operations and metadata handling.
- Add filter tests to validate query capabilities of the AlloyDB document store.
- Implement embedding retrieval tests for both cosine similarity and inner product methods.
- Create keyword retrieval tests to ensure accurate document retrieval based on query strings.
- Ensure all tests handle various edge cases and validate expected outcomes.
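As a rough illustration of the two scoring methods named in the embedding retrieval bullet, here is a minimal pure-Python sketch. The function names (`inner_product`, `cosine_similarity`, `rank`) and the toy vectors are assumptions made for this example and are not taken from the integration, which is not shown here.

```python
import math
from typing import Callable, Dict, List, Sequence

Vector = Sequence[float]


def inner_product(a: Vector, b: Vector) -> float:
    """Sum of element-wise products; sensitive to vector length."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    return sum(x * y for x, y in zip(a, b))


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Inner product of the two vectors after normalising each to unit length."""
    norm_a = math.sqrt(inner_product(a, a))
    norm_b = math.sqrt(inner_product(b, b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return inner_product(a, b) / (norm_a * norm_b)


def rank(query: Vector, docs: Dict[str, Vector],
         score: Callable[[Vector, Vector], float]) -> List[str]:
    """Return document names ordered from highest to lowest score."""
    return sorted(docs, key=lambda name: score(query, docs[name]), reverse=True)


# Hypothetical toy data: "long" has a large magnitude, "aligned" points
# exactly in the query direction.
query = [1.0, 0.0]
docs = {"long": [3.0, 3.0], "aligned": [1.0, 0.0]}

by_inner_product = rank(query, docs, inner_product)
by_cosine = rank(query, docs, cosine_similarity)
```

The two methods can order the same documents differently: inner product rewards magnitude, while cosine similarity only looks at direction. That is one reason to test both retrieval paths separately.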
|
davidsbatista
left a comment
@garybadwal thanks for the contribution!
The PR already looks very good. I left a few comments; once they are addressed, I think we can merge.
|
@garybadwal did you have the chance to run the integration tests against a real Google Cloud AlloyDB instance? |
|
Not yet, @davidsbatista, but I will do it this Saturday, along with addressing the comments you mentioned. So far I have only written the code and run the test cases. I'm confident it will work, as I have worked with AlloyDB before. Still, I'll test the complete component against it. |
|
Let me know then when you have tested it against a Google Cloud AlloyDB instance. If all is good, we can merge it and make an official release. |
|
Sure, @davidsbatista, I will do these tests by tomorrow and update you. |
How I tested
The integration uses the GCP
Real bug found while running against an actual AlloyDB
The fix mirrors the pattern already used in the sister |
|
Hi @davidsbatista, did you get time to review the PR? |
|
@garybadwal It looks good. I've added a skip if env vars are not set and removed one test. |
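The skip mentioned above is not shown in the conversation. A minimal sketch of how such a guard could work, assuming the three environment variables named in the PR description, might look like this. The helper names `missing_env_vars` and `skip_reason` are hypothetical.

```python
import os
from typing import List, Mapping, Optional

# Variable names as listed in the PR description.
REQUIRED_ENV_VARS = ("ALLOYDB_INSTANCE_URI", "ALLOYDB_USER", "ALLOYDB_PASSWORD")


def missing_env_vars(environ: Optional[Mapping[str, str]] = None) -> List[str]:
    """Return the required variables that are unset or empty."""
    env = os.environ if environ is None else environ
    return [name for name in REQUIRED_ENV_VARS if not env.get(name)]


def skip_reason(environ: Optional[Mapping[str, str]] = None) -> Optional[str]:
    """Return a human-readable skip reason, or None if all variables are set."""
    missing = missing_env_vars(environ)
    if not missing:
        return None
    return "AlloyDB integration tests skipped; missing: " + ", ".join(missing)
```

With pytest, the result could feed a marker such as `pytest.mark.skipif(bool(missing_env_vars()), reason="AlloyDB env vars not set")`, so integration tests are reported as skipped rather than failed on machines without credentials.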
|
Thanks, @davidsbatista. Let me know if you require any further changes or if it's good to merge. Excited to share my contribution to Haystack. |
davidsbatista
left a comment
Looks good! Did a few adjustments!
Related Issues
Proposed Changes
Added a new `AlloyDBDocumentStore`, `AlloyDBEmbeddingRetriever`, and `AlloyDBKeywordRetriever` to support Google Cloud AlloyDB as a Haystack document store backend.
- `pgvector` extension (cosine similarity, inner product, L2 distance).
- `tsvector`/`tsquery`.
- (`enable_iam_auth=True`), with configurable IP type (`PRIVATE`, `PUBLIC`, `PSC`).
- `ALLOYDB_INSTANCE_URI`, `ALLOYDB_USER`, and `ALLOYDB_PASSWORD` environment variables via `Secret.from_env_var`.
- `pgvector` integration closely.
How did you test it?
- (`hatch run test:unit`).
- (`hatch run fmt-check && hatch run test:types`).
- `@pytest.mark.integration` (require a live AlloyDB instance with `ALLOYDB_INSTANCE_URI`, `ALLOYDB_USER`, `ALLOYDB_PASSWORD` set).
Notes for the reviewer
- `src/` and `tests/` structure is a direct mirror of `integrations/pgvector/`; reviewers familiar with that integration should find it straightforward.
- `Connector` object is lazily initialized and reused across calls; `close()`/`__del__` ensure the background refresh thread is stopped cleanly.
- `max-parallel: 1` and no `services` block; integration tests require a live GCP instance.
Checklist
- (`feat:` in this case)
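The reviewer note about the lazily initialized `Connector` describes a common lifecycle pattern. The sketch below illustrates that pattern with the standard library only. `FakeConnector` and `LazyStore` are hypothetical stand-ins invented for this example; they are not the real connector or the `AlloyDBDocumentStore` implementation.

```python
import threading
from typing import Optional


class FakeConnector:
    """Hypothetical connector that owns a background refresh thread."""

    def __init__(self) -> None:
        self._stop = threading.Event()
        self._thread = threading.Thread(target=self._refresh_loop, daemon=True)
        self._thread.start()

    def _refresh_loop(self) -> None:
        # Wake up periodically until close() sets the stop event.
        # A real connector would refresh credentials here.
        while not self._stop.wait(timeout=0.05):
            pass

    def close(self) -> None:
        self._stop.set()
        self._thread.join(timeout=1.0)

    @property
    def running(self) -> bool:
        return self._thread.is_alive()


class LazyStore:
    """Creates the connector on first use and reuses it afterwards."""

    def __init__(self) -> None:
        self._connector: Optional[FakeConnector] = None
        self._lock = threading.Lock()

    def _get_connector(self) -> FakeConnector:
        with self._lock:
            if self._connector is None:
                self._connector = FakeConnector()
            return self._connector

    def close(self) -> None:
        # Safe to call more than once.
        with self._lock:
            if self._connector is not None:
                self._connector.close()
                self._connector = None

    def __del__(self) -> None:
        # Best-effort cleanup; never raise during garbage collection.
        try:
            self.close()
        except Exception:
            pass
```

An explicit `close()` is the dependable path here. `__del__` is only a fallback, since Python does not guarantee when, or whether, finalizers run at interpreter shutdown.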