Create Vector Functions #2770

Draft
Dakantz wants to merge 42 commits into ad-freiburg:master from Dakantz:master

Dakantz commented Mar 11, 2026

This pull request adds new Dense Vector functions according to the rdf-tensor spec. It is currently under construction, but basic operations like addition, subtraction, and cosine similarity work already.

TODOs

  • Test more extensively (currently only basic tests with building and query construction)
  • Add a custom AKNN index to speed up processing
  • Add aggregate functions. Any idea how to add IRI-based aggregate functions in the same way as 'normal' functions?
  • Benchmark

Member

hannahbast commented Mar 12, 2026

@Dakantz Thank you for this proof of concept. Can you provide a short high-level description of what you did? In particular:

  1. How are the embeddings stored in QLever and associated with entities? Via triples of the kind ?subject :has_embedding "... embedding ..."^^<suitable embedding type>?
  2. How are the embeddings computed and from which data?
  3. When are the embeddings computed?

There are many ways to do this, and when we integrate this into QLever, it's important to make the right decision here. Note that we are also currently working on this. Tagging @bastiscode

Author

Dakantz commented Mar 17, 2026

Hey!

The tensors are stored per the spec mentioned above, i.e.

"{\"type\": \"float32\", \"shape\": [3, 2], \"data\": [0.1, 1.2, 2.2, 3.2, 4.1, 5.4e2]}"^^tensor:DataTensor

The PR only supports float tensors, which are the usual choice, but the system could easily be extended to other data types using templating.
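
To make the literal format concrete, here is a minimal Python sketch of how such a lexical form could be parsed and validated. It is an illustration only, not the PR's C++ code: the function name `parse_data_tensor` is hypothetical, and accepting exactly `float32` and `float64` is an assumption based on the two examples in this thread.

```python
import json
from math import prod


def parse_data_tensor(lexical: str) -> tuple[list[int], list[float]]:
    """Parse the JSON lexical form of a DataTensor literal into (shape, flat data)."""
    obj = json.loads(lexical)

    # Assumption: only the two float types that appear in this thread are accepted.
    dtype = obj["type"]
    if dtype not in ("float32", "float64"):
        raise ValueError(f"unsupported element type: {dtype}")

    shape = [int(d) for d in obj["shape"]]
    data = [float(x) for x in obj["data"]]

    # The flat data array must contain exactly one value per tensor cell.
    if prod(shape) != len(data):
        raise ValueError(f"shape {shape} does not match {len(data)} data values")
    return shape, data


shape, data = parse_data_tensor(
    '{"type": "float32", "shape": [3, 2], "data": [0.1, 1.2, 2.2, 3.2, 4.1, 5.4e2]}'
)
```

The shape check is the part most worth keeping in any real implementation, since a mismatch between `shape` and `data` would otherwise only surface later as a wrong result.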

They are computed outside of QLever, which is the sensible option in my view: there is no need to make an opinionated choice on the model or compute framework within the database.
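
As a rough sketch of that workflow, an external script could turn each precomputed vector into one N-Triples line before indexing. The helper `tensor_triple` and the example IRIs are hypothetical, and the datatype IRI is assumed to be the expansion of `dt:DataTensor` under the `dt:` prefix used in the query in this comment. The sketch relies on JSON string escaping matching the escaping needed for the quotes inside the literal.

```python
import json

# Assumption: full IRI obtained by expanding dt:DataTensor with the dt: prefix.
DATA_TENSOR = "https://w3id.org/rdf-tensor/datatypes#DataTensor"


def tensor_triple(subject_iri: str, predicate_iri: str, vector: list[float]) -> str:
    """Serialise a 1-D embedding as a single N-Triples line with a DataTensor literal."""
    payload = json.dumps(
        {"type": "float32", "shape": [len(vector)], "data": [float(x) for x in vector]}
    )
    # A second json.dumps wraps the payload in double quotes and escapes the inner quotes.
    literal = json.dumps(payload)
    return f"<{subject_iri}> <{predicate_iri}> {literal}^^<{DATA_TENSOR}> ."


line = tensor_triple("http://example.org/e1", "http://example.org/p1", [1, 2, 3])
print(line)
```

The vector itself can come from any model or framework; the database only ever sees the resulting triples.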

And a query could look like this:

PREFIX dt: <https://w3id.org/rdf-tensor/datatypes#>
PREFIX dtf: <https://w3id.org/rdf-tensor/functions#>
PREFIX dta: <https://w3id.org/rdf-tensor/aggregates#>
SELECT ?s (dtf:cosineSimilarity("{\"data\":[1.0,2.0,3.0],\"shape\":[3],\"type\":\"float64\"}"^^dt:DataTensor, ?v) AS ?sim) ?v WHERE {
  ?s <p1> ?v .
}
ORDER BY DESC(?sim)
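
For reference, the computation this query asks for can be sketched in a few lines of Python: parse each bound `?v`, compute the cosine similarity against the query vector, and sort in descending order. This is an exact brute-force illustration under stated assumptions, not the PR's implementation: the function names are hypothetical, and raising an error for zero vectors or mismatched lengths is an assumption, since this thread does not say how those cases are handled.

```python
import json
from math import sqrt


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Return dot(a, b) / (|a| * |b|) for two vectors of equal length."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sqrt(sum(x * x for x in a))
    norm_b = sqrt(sum(y * y for y in b))
    # Assumption: a zero vector is treated as an error rather than returning 0 or NaN.
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


def rank_by_similarity(query: list[float], bindings: list[tuple[str, str]]):
    """Score (subject, tensor lexical form) pairs and sort them like ORDER BY DESC(?sim)."""
    scored = [
        (subject, cosine_similarity(query, json.loads(lexical)["data"]))
        for subject, lexical in bindings
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical bindings for ?s and ?v.
bindings = [
    ("a", '{"type": "float64", "shape": [3], "data": [2.0, 4.0, 6.0]}'),
    ("b", '{"type": "float64", "shape": [3], "data": [-1.0, -2.0, -3.0]}'),
    ("c", '{"type": "float64", "shape": [3], "data": [3.0, 2.0, 1.0]}'),
]
ranked = rank_by_similarity([1.0, 2.0, 3.0], bindings)
```

Every candidate is parsed and scored on each call, which is the per-query cost that an approximate nearest-neighbour index would aim to avoid.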

The vectors are currently parsed as they are processed, but I am working on an index implementation using Annoy that follows similar principles to the virtual SERVICE system already employed in other parts of QLever, such as the spatial joins.
