feat: add new falkordb integration #3158

Merged
bogdankostic merged 38 commits into deepset-ai:main from ghassenzaara:feature/falkordb-integration
May 4, 2026

Conversation

@ghassenzaara
Contributor

@ghassenzaara ghassenzaara commented Apr 13, 2026

Related Issues

Proposed Changes:

  • Added FalkorDBDocumentStore to connect Haystack with FalkorDB graph databases.
  • Added FalkorDBEmbeddingRetriever for standard vector searches.
  • Added FalkorDBCypherRetriever for running custom GraphRAG Cypher queries.
  • Ensured document metadata is flattened and stored directly on the graph nodes.
  • Fixed vector insertion by casting arrays with vecf32() in Cypher queries.

How did you test it?

  • Unit tests: Added basic component and serialization tests (hatch run test:unit).
  • Integration tests: Verified writes, vector searches, and duplicate policies against a live database (hatch run test:integration).
  • Linters: Passed all type-checking and formatting checks (hatch run test:types, hatch run fmt).

Notes for the reviewer

  • Note the vecf32() explicit cast in the UNWIND cypher queries. This is specifically required by FalkorDB to parse vector embeddings correctly.
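
The cast mentioned in the note can be sketched as follows. This is only an illustration: the helper name `build_upsert_query` and the `Document` node label are assumptions, and the query in the integration itself may look different. Only `vecf32()` and the `UNWIND`/`MERGE` pattern come from this PR.

```python
def build_upsert_query(node_label: str = "Document") -> str:
    """Build an UNWIND-based upsert that casts embeddings with vecf32().

    Hypothetical helper for illustration; not the integration's actual code.
    """
    return (
        "UNWIND $docs AS doc "
        f"MERGE (d:{node_label} {{id: doc.id}}) "
        "SET d.content = doc.content, "
        # The explicit cast lets FalkorDB parse the list as a vector embedding.
        "d.embedding = vecf32(doc.embedding)"
    )


# Values travel as query parameters; only the label is interpolated into the text.
params = {"docs": [{"id": "1", "content": "hello", "embedding": [0.1, 0.2]}]}
query = build_upsert_query()
```

The query string and `params` would then be handed to the graph client together, so the embedding values never appear in the query text.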

Checklist

@ghassenzaara ghassenzaara requested a review from a team as a code owner April 13, 2026 15:19
@ghassenzaara ghassenzaara requested review from davidsbatista and removed request for a team April 13, 2026 15:19
@CLAassistant

CLAassistant commented Apr 13, 2026

CLA assistant check
All committers have signed the CLA.

@github-actions github-actions Bot added topic:CI type:documentation Improvements or additions to documentation labels Apr 13, 2026
@ghassenzaara ghassenzaara force-pushed the feature/falkordb-integration branch from c580421 to d380366 Compare April 14, 2026 09:18
@julian-risch julian-risch removed the request for review from davidsbatista April 15, 2026 13:29
Contributor

@bogdankostic bogdankostic left a comment

Thanks so much for this PR, @ghassenzaara! Great work so far. You'll see quite a few comments below, but they are mostly just minor formatting improvements for the docstrings.

- "Test / dspy"
- "Test / elasticsearch"
- "Test / faiss"
- "Test / falkor_db"

For consistency (for example with arcadedb), let's use falkordb throughout this integration instead of falkor_db. That also means renaming, for example, integrations/falkor_db -> integrations/falkordb.

Comment thread .github/labeler.yml Outdated
- any-glob-to-any-file: ".github/workflows/faiss.yml"


integration:falkor-db:

Suggested change
- integration:falkor-db:
+ integration:falkordb:

documented_only: true
skip_empty_modules: true
renderer:
description: FalkorDB integration for Haystack — GraphRAG document store, embedding retriever, and Cypher retriever

Suggested change
description: FalkorDB integration for Haystack — GraphRAG document store, embedding retriever, and Cypher retriever
description: FalkorDB integration for Haystack

"""
Retrieve documents by executing an OpenCypher query.

If a ``query`` is provided here, it overrides the ``custom_cypher_query``

We use single backticks inside docstrings for inline code.

Suggested change
- If a ``query`` is provided here, it overrides the ``custom_cypher_query``
+ If a `query` is provided here, it overrides the `custom_cypher_query`

Comment on lines +600 to +612
Translate a Haystack filter dict into an OpenCypher ``WHERE`` sub-expression.

Supports the full Haystack filter DSL:

- Logical: ``AND``, ``OR``, ``NOT``
- Comparison: ``==``, ``!=``, ``>``, ``>=``, ``<``, ``<=``
- Membership: ``in``, ``not in``

All values are passed as named query parameters to prevent injection.

:param filters: A Haystack filter dictionary.
:returns: Tuple of ``(where_clause_string, params_dict)``.
:raises ValueError: If an unsupported operator or malformed filter is provided.

Suggested change
- Translate a Haystack filter dict into an OpenCypher ``WHERE`` sub-expression.
- Supports the full Haystack filter DSL:
- - Logical: ``AND``, ``OR``, ``NOT``
- - Comparison: ``==``, ``!=``, ``>``, ``>=``, ``<``, ``<=``
- - Membership: ``in``, ``not in``
- All values are passed as named query parameters to prevent injection.
- :param filters: A Haystack filter dictionary.
- :returns: Tuple of ``(where_clause_string, params_dict)``.
- :raises ValueError: If an unsupported operator or malformed filter is provided.
+ Translate a Haystack filter dict into an OpenCypher `WHERE` sub-expression.
+ Supports the full Haystack filter DSL:
+ - Logical: `AND`, `OR`, `NOT`
+ - Comparison: `==`, `!=`, `>`, `>=`, `<`, `<=`
+ - Membership: `in`, `not in`
+ All values are passed as named query parameters to prevent injection.
+ :param filters: A Haystack filter dictionary.
+ :returns: Tuple of `(where_clause_string, params_dict)`.
+ :raises ValueError: If an unsupported operator or malformed filter is provided.
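
For readers unfamiliar with this kind of translation, the behaviour described in the docstring can be sketched roughly as below. This is a simplified stand-in, not the method from this PR: the function name `to_where`, the node alias `d`, the `p0`, `p1`, ... parameter names, and the stripping of a `meta.` prefix (because metadata is flattened onto the node) are all assumptions, and null handling is left out.

```python
from typing import Any, Optional

# Haystack comparison operators mapped to their Cypher equivalents.
_COMPARISON = {"==": "=", "!=": "<>", ">": ">", ">=": ">=", "<": "<", "<=": "<="}


def to_where(
    filters: dict, params: Optional[dict] = None
) -> tuple:
    """Translate a Haystack filter dict into (where_clause, params)."""
    params = {} if params is None else params
    op = filters.get("operator")

    if op in ("AND", "OR", "NOT"):
        conditions = filters.get("conditions")
        if not conditions:
            raise ValueError(f"'{op}' requires a non-empty 'conditions' list")
        parts = [to_where(cond, params)[0] for cond in conditions]
        if op == "NOT":
            # NOT negates the conjunction of its conditions.
            return "NOT (" + " AND ".join(parts) + ")", params
        return "(" + f" {op} ".join(parts) + ")", params

    if "field" not in filters or "value" not in filters:
        raise ValueError(f"Malformed filter: {filters!r}")

    # Assumption: flattened metadata, so "meta.year" is stored as d.year.
    prop = str(filters["field"]).removeprefix("meta.")
    if not prop.isidentifier():
        # Property names are interpolated, so reject anything unusual.
        raise ValueError(f"Invalid field name: {prop!r}")

    name = f"p{len(params)}"  # named parameter keeps the value out of the query
    value: Any = filters["value"]
    params[name] = value

    if op in _COMPARISON:
        return f"d.{prop} {_COMPARISON[op]} ${name}", params
    if op == "in":
        return f"d.{prop} IN ${name}", params
    if op == "not in":
        return f"NOT d.{prop} IN ${name}", params
    raise ValueError(f"Unsupported operator: {op!r}")


clause, query_params = to_where(
    {
        "operator": "AND",
        "conditions": [
            {"field": "meta.year", "operator": ">=", "value": 2020},
            {"field": "meta.lang", "operator": "in", "value": ["en", "de"]},
        ],
    }
)
print(clause)        # (d.year >= $p0 AND d.lang IN $p1)
print(query_params)  # {'p0': 2020, 'p1': ['en', 'de']}
```

Values only ever reach the query through `$`-parameters, which is the injection safeguard the docstring refers to.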

Comment thread integrations/falkor_db/pyproject.toml Outdated
build-backend = "hatchling.build"

[project]
name = "falkor-db-haystack"

Suggested change
- name = "falkor-db-haystack"
+ name = "falkordb-haystack"

Comment thread CLAUDE.md

These changes should be reverted.

Let's use our DocumentStoreBaseTests for testing the document store as described in our docs.

Let's add a sentence here saying that a Docker container needs to be running in order to run the integration tests, similar to how we do it for OpenSearch, for example.

@ghassenzaara
Contributor Author

I re-requested a review by accident.
Thanks for the review; it's my first time contributing to open-source. I will pay attention to the documentation carefully and work on the requested changes.

@bogdankostic
Contributor

> I re-requested a review by accident. Thanks for the review; it's my first time contributing to open-source. I will pay attention to the documentation carefully and work on the requested changes.

No worries, let me know if there's anything you're unsure about.

… or falkor-db to falkordb for consistency, remove useless implementation, fix other small issues
@ghassenzaara
Contributor Author

Hi @bogdankostic, I've addressed all the feedback from the previous review. The changes should now align with Haystack's integration conventions and requirements. Please let me know if anything else needs to be adjusted. Happy to iterate further!

@socket-security

socket-security Bot commented Apr 23, 2026

Review the following changes in direct dependencies. Learn more about Socket for GitHub.

| Diff | Package | Supply Chain Security | Vulnerability | Quality | Maintenance | License |
| --- | --- | --- | --- | --- | --- | --- |
| Added | falkordb@1.6.0 | 100 | 100 | 100 | 100 | 100 |

View full report

@davidsbatista
Contributor

I've created a follow-up issue, #3219, to add the extended operations to FalkorDBDocumentStore.

@ghassenzaara
Contributor Author

Hey @davidsbatista,
I've pushed the requested changes and this is ready for another look! Regarding the follow-up issue #3219, I'd love to take that on next. Should I wait for this PR to be merged into main so I can start from a fresh branch, or would you prefer I start working on it now by branching off of this one?

Contributor

@bogdankostic bogdankostic left a comment

Thanks for addressing the comments, @ghassenzaara! I think the PR is on the right track; we should just remove files that shouldn't be included in the PR and fix the sorting and scaling of scores.

Regarding #3219, I'd say let's wait for this PR to be merged so that we can be sure there won't be any major changes.

This file is probably a residue from testing the integration and can be removed.


from haystack_integrations.document_stores.falkordb.document_store import (
FalkorDBDocumentStore,
SimilarityFunction,

I don't think we need to expose SimilarityFunction here.

UNWIND $docs AS doc
MERGE (d:{self.node_label} {{id: doc.id}})
ON CREATE SET d += doc
ON MATCH SET d += doc

Using += here keeps the properties from the overwritten document that are not present in the new document with the same ID, so we should use = here instead.

Suggested change
- ON MATCH SET d += doc
+ ON MATCH SET d = doc
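
The difference between the two `SET` forms can be mimicked with plain dictionaries. This is an analogy only, assuming node properties behave like a key/value map; the function names are made up for illustration and nothing here talks to FalkorDB.

```python
def set_merge(existing: dict, incoming: dict) -> dict:
    """Mimic `SET d += doc`: add and update keys, keep everything else."""
    return {**existing, **incoming}


def set_replace(existing: dict, incoming: dict) -> dict:
    """Mimic `SET d = doc`: the incoming map replaces all properties."""
    return dict(incoming)


old = {"id": "1", "content": "v1", "author": "alice"}
new = {"id": "1", "content": "v2"}

print(set_merge(old, new))    # the stale "author" property survives
print(set_replace(old, new))  # only the new document's properties remain
```

With an overwrite policy, the second behaviour is the one that matches what a caller expects: the stored node ends up identical to the document just written.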

Comment on lines +546 to +552
record = {
"id": doc.id,
"content": doc.content,
"embedding": doc.embedding,
}
if doc.meta:
record.update(doc.meta)

Meta fields can silently overwrite standard Document fields (id, content, embedding) due to the update order. Let's make sure that these are not overwritten by meta fields.

Suggested change
- record = {
- "id": doc.id,
- "content": doc.content,
- "embedding": doc.embedding,
- }
- if doc.meta:
- record.update(doc.meta)
+ record = {}
+ if doc.meta:
+ record.update(doc.meta)
+ record["id"] = doc.id
+ record["content"] = doc.content
+ record["embedding"] = doc.embedding
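
A small, self-contained check of why the order matters. The `Doc` dataclass below is a stand-in for Haystack's `Document` with only the fields used here, and `to_record` is a hypothetical name for the flattening step.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Doc:
    """Minimal stand-in for a Haystack Document (illustration only)."""

    id: str
    content: Optional[str] = None
    embedding: Optional[list] = None
    meta: dict = field(default_factory=dict)


def to_record(doc: Doc) -> dict:
    """Flatten a document, writing the standard fields last so they win."""
    record = dict(doc.meta)  # metadata first
    record["id"] = doc.id  # standard fields overwrite any clashing meta keys
    record["content"] = doc.content
    record["embedding"] = doc.embedding
    return record


doc = Doc(id="real-id", content="hello", meta={"id": "spoofed", "source": "wiki"})
record = to_record(doc)
print(record["id"])      # real-id
print(record["source"])  # wiki
```

A clashing meta key is still dropped silently in this sketch; raising an error or prefixing meta keys would be alternative designs, but neither is part of the suggestion above.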

YIELD node AS d, score
WHERE {where_clause}
RETURN d, score
ORDER BY score DESC

I just looked deeper into FalkorDB and it seems that the database is not returning similarity scores but embedding distances, so we should ORDER BY score ASC here - sorry for the wrong comment earlier.

:returns: Scaled score in `[0, 1]`.
"""
if self.similarity == "cosine":
return (score + 1) / 2

This formula assumes the raw score is cosine similarity in the range [-1, 1], but the raw score is cosine distance (1 - cos_sim, range [0, 2]). We should therefore adapt the scaling:

Suggested change
- return (score + 1) / 2
+ return 1 - (score / 2)
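
As a quick sanity check of the suggested formula, here is a sketch that assumes the raw score is the cosine distance `1 - cos_sim`, as stated in the comment; the function name is illustrative.

```python
def scale_cosine_distance(distance: float) -> float:
    """Map a cosine distance in [0, 2] to a score in [0, 1], higher is better."""
    return 1 - (distance / 2)


for cos_sim in (1.0, 0.0, -1.0):
    distance = 1 - cos_sim  # what the index is assumed to return
    scaled = scale_cosine_distance(distance)
    # Same value the old formula would give if it were fed the similarity itself.
    assert scaled == (cos_sim + 1) / 2
    print(cos_sim, distance, scaled)
```

Identical vectors (distance 0) map to 1.0, orthogonal vectors (distance 1) to 0.5, and opposite vectors (distance 2) to 0.0.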

"""
if self.similarity == "cosine":
return (score + 1) / 2
return float(1 / (1 + math.exp(-score / 100)))

The raw score is euclidean distance in range [0, ∞). Plugging into this sigmoid:

  • distance=0 (perfect match) → 0.5,
  • distance→∞ (terrible) → ~1.0

Bad matches get higher scaled scores than good ones, so the mapping is inverted.

Let's replace with a monotonically decreasing transform:

Suggested change
- return float(1 / (1 + math.exp(-score / 100)))
+ return 1 / (1 + score)
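
The inversion is easy to reproduce side by side. The two functions below copy the old and suggested expressions; their names are made up for this comparison.

```python
import math


def old_scale(distance: float) -> float:
    """Previous transform: a sigmoid that grows with distance."""
    return float(1 / (1 + math.exp(-distance / 100)))


def new_scale(distance: float) -> float:
    """Suggested transform: monotonically decreasing in distance."""
    return 1 / (1 + distance)


for distance in (0.0, 1.0, 10.0, 100.0):
    print(distance, round(old_scale(distance), 4), round(new_scale(distance), 4))

# A perfect match should score highest, and only the new transform does that.
assert old_scale(0.0) == 0.5 and old_scale(100.0) > old_scale(0.0)
assert new_scale(0.0) == 1.0 and new_scale(100.0) < new_scale(0.0)
```

For non-negative distances the new transform stays in (0, 1], with 1.0 reserved for an exact match.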

@bogdankostic bogdankostic self-assigned this Apr 27, 2026
Contributor

@bogdankostic bogdankostic left a comment

Thanks @ghassenzaara! The PR is almost good to go. I just added some minor comments about adapting the docstrings to the new scoring formulas, and it would be nice to also have an integration test for the retrievers.

Comment on lines +510 to +517
Scale a raw similarity score to the unit interval `[0, 2]`.

Uses the following formulas:
- Cosine: `1 - (score / 2)`
- Euclidean: sigmoid `1 / (1 + score)`

:param score: Raw score returned by the vector index.
:returns: Scaled score in `[0, 2]`.

Minor fix regarding the limits of the unit interval.

Suggested change
- Scale a raw similarity score to the unit interval `[0, 2]`.
- Uses the following formulas:
- - Cosine: `1 - (score / 2)`
- - Euclidean: sigmoid `1 / (1 + score)`
- :param score: Raw score returned by the vector index.
- :returns: Scaled score in `[0, 2]`.
+ Scale a raw similarity score to the unit interval `[0, 1]`.
+ Uses the following formulas:
+ - Cosine: `1 - (score / 2)`
+ - Euclidean: sigmoid `1 / (1 + score)`
+ :param score: Raw score returned by the vector index.
+ :returns: Scaled score in `[0, 1]`.

Comment on lines +435 to +437
Cosine scores are returned in `[-1, 1]`; when `scale_score=True` they are
scaled to `[0, 1]` using the formula:
`(score + 1) / 2`. Euclidean scores are transformed with a sigmoid.

This docstring should be adapted to reflect the adapted scaling.

Suggested change
- Cosine scores are returned in `[-1, 1]`; when `scale_score=True` they are
- scaled to `[0, 1]` using the formula:
- `(score + 1) / 2`. Euclidean scores are transformed with a sigmoid.
+ Cosine scores are returned as distances in `[0, 2]`; when `scale_score=True` they are
+ scaled to `[0, 1]` using the formula:
+ `1 - (score / 2)`. Euclidean scores are transformed with `1 / (1 + score)`.

YIELD node AS d, score
WHERE {where_clause}
RETURN d, score
ORDER BY score ASC

For consistency with the else-case, let's add id here as well, as a secondary ordering field.

Suggested change
- ORDER BY score ASC
+ ORDER BY score ASC, d.id ASC
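
The effect of the secondary key can be shown with an in-memory sort. The hits below are made-up `(id, distance)` pairs, not output from FalkorDB.

```python
# Made-up (id, distance) pairs; "a" and "b" tie on distance.
hits = [("b", 0.25), ("a", 0.25), ("c", 0.10)]

# Equivalent of ORDER BY score ASC, d.id ASC: smallest distance first,
# then id as a tie-breaker so the order is deterministic.
ranked = sorted(hits, key=lambda hit: (hit[1], hit[0]))
print(ranked)  # [('c', 0.1), ('a', 0.25), ('b', 0.25)]
```

Without the tie-breaker, the relative order of `a` and `b` would depend on whatever order the index happened to return them in.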

Comment thread integrations/falkordb/tests/test_retrievers.py

This file should be moved one level up to integrations/falkordb/src/haystack_integrations/components/retrievers.

@ghassenzaara ghassenzaara requested a review from bogdankostic May 4, 2026 08:45
Contributor

@bogdankostic bogdankostic left a comment

Thank you @ghassenzaara and congrats on your first contribution to Haystack 🚀

@bogdankostic bogdankostic merged commit 991b9fa into deepset-ai:main May 4, 2026
14 checks passed

Labels

integration:falkordb topic:CI type:documentation Improvements or additions to documentation

4 participants