SQL vector support across JDBC/R2DBC with dialect-aware scoring #3637

Open
n0tl3ss wants to merge 109 commits into 5.0.x from support-vector-type-oracledb

Member

@n0tl3ss n0tl3ss commented Dec 8, 2025

  • Adds end-to-end vector capabilities across Oracle/PostgreSQL/MySQL in data-model, data-runtime, data-jdbc, and data-r2dbc, including vector types, converters, binders/mappers, and dialect-specific schema/index generation.
  • Extends query support with vector scoring/search integration and dialect-aware behavior, plus stricter validation around scoring-function usage and parameter handling.
  • Expands coverage and usage examples significantly (tests + jdbc-vector doc examples + docs updates) to validate CRUD/search/index flows and sparse/dense vector handling across supported dialects.

@dstepanov
Contributor

Can you try to implement it without changes in DataType and get/set mapping?

The idea would be to use DataType.OBJECT with a custom AttributeConverter whose context is JdbcConversionContext/R2dbcConversionContext. We should probably also add the current dialect to that context, so the converter can fetch a dialect-specific vector helper (see PostgresEnumsSpec for an example of this approach).
The vector data classes should be annotated with @TypeDef. We might want to introduce a new runtime interface, which an AttributeConverter can also implement, to tell the schema generator which column type to emit.

The idea is to implement this in an almost non-invasive way, so that other things such as geo types can easily be supported as well.

Contributor

Copilot AI left a comment

Pull request overview

Copilot reviewed 205 out of 223 changed files in this pull request and generated 2 comments.

Comments suppressed due to low confidence (1)

data-r2dbc/src/main/java/io/micronaut/data/r2dbc/config/R2dbcSchemaGenerator.java:148

  • generate() logs every CREATE TABLE statement at WARN level (LOG.warn("Create table..." )), which will spam logs during normal application startup/tests even when nothing is wrong. Please downgrade this to DEBUG/TRACE (or remove it) and rely on the existing DataSettings.QUERY_LOG debug logging for visibility.

@@ -0,0 +1,31 @@
/*
* Copyright 2017-2021 original authors
Contributor

Maybe update the copyright year: 2017-2021 may be confusing for a class that is new in 5.0.x (other classes in the package need the same update).

Copilot AI review requested due to automatic review settings April 1, 2026 09:11
Contributor

Copilot AI left a comment

Pull request overview

Copilot reviewed 205 out of 223 changed files in this pull request and generated 3 comments.

Comment on lines +35 to +45
public record FloatVector(float[] data) implements Vector {


    /**
     * Creates a float vector.
     *
     * @param data the backing values
     */
    public FloatVector {
        Objects.requireNonNull(data, "FloatVector data must not be null");
    }

Copilot AI Apr 1, 2026

These vector records expose the backing array directly via the record component accessor (data()), and the canonical constructor does not defensively copy the input array. That makes the type effectively mutable (callers can mutate the array after construction, or through data()), breaking the "immutable vector" contract and potentially invalidating equals/hashCode. Consider defensively copying in the canonical constructor and overriding data() to return a copy (or avoid using a record for array-backed immutables).
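The defensive-copy suggestion above can be sketched as follows. This is a minimal illustration, not the PR's code: `SafeFloatVector` is a hypothetical stand-in that omits the `Vector` interface, and the explicit `equals`/`hashCode` go slightly beyond the review comment (records compare array components by reference unless these methods are overridden).

```java
import java.util.Arrays;
import java.util.Objects;

// Hypothetical stand-in for the reviewed record; the real class implements Vector.
record SafeFloatVector(float[] data) {

    // Compact canonical constructor: validate, then copy, so later changes
    // to the caller's array cannot reach this instance.
    SafeFloatVector {
        Objects.requireNonNull(data, "data must not be null");
        data = data.clone();
    }

    // Hand out a copy so callers cannot mutate the internal array.
    @Override
    public float[] data() {
        return data.clone();
    }

    // Assumption beyond the review comment: compare by array contents,
    // because the generated equals would compare array references.
    @Override
    public boolean equals(Object o) {
        return o instanceof SafeFloatVector other && Arrays.equals(data, other.data);
    }

    @Override
    public int hashCode() {
        return Arrays.hashCode(data);
    }
}
```

The cost is one array copy per construction and per `data()` call, which may matter for large embeddings; the same pattern would apply to the double and byte variants.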

Comment on lines +35 to +45
public record DoubleVector(double[] data) implements Vector {


    /**
     * Creates a double vector.
     *
     * @param data the backing values
     */
    public DoubleVector {
        Objects.requireNonNull(data, "DoubleVector data must not be null");
    }

Copilot AI Apr 1, 2026

DoubleVector has the same mutability issue as FloatVector: the record component accessor returns the internal double[] and the constructor doesn't copy the input array. This allows external mutation and can break the immutability and equals/hashCode expectations. Defensively copy in the constructor and override data() to return a copy (or switch away from records for array-backed values).

Comment on lines +35 to +45
public record ByteVector(byte[] data) implements Vector {


    /**
     * Creates a byte vector.
     *
     * @param data the backing values
     */
    public ByteVector {
        Objects.requireNonNull(data, "ByteVector data must not be null");
    }

Copilot AI Apr 1, 2026

ByteVector is declared as immutable but, as a record, it exposes its internal byte[] via the generated data() accessor and does not defensively copy the array in the constructor. Callers can mutate the backing storage, which can break equality/hashCode and lead to subtle bugs. Defensively copy the array on construction and override data() to return a copy (or avoid using a record for array-backed types).

Copilot AI review requested due to automatic review settings April 1, 2026 11:38
Contributor

Copilot AI left a comment

Pull request overview

Copilot reviewed 205 out of 223 changed files in this pull request and generated no new comments.

Comments suppressed due to low confidence (7)

data-r2dbc/build.gradle:1

  • The dependency alias mnSql.mysql.connector.j looks truncated and is very likely not defined in the version catalog (previously mnSql.mysql.connector.java). If the alias doesn't exist, this will break Gradle resolution across multiple modules; please restore the correct alias or add/rename the catalog entry consistently.

data-r2dbc/build.gradle:1

  • The dependency alias mnSql.mysql.connector.j looks truncated and is very likely not defined in the version catalog (previously mnSql.mysql.connector.java). If the alias doesn't exist, this will break Gradle resolution across multiple modules; please restore the correct alias or add/rename the catalog entry consistently.

doc-examples/r2dbc-example-kotlin/src/main/resources/application.yml:1

  • The container tag container-registry.oracle.com/mysql/community-server:9.6.0 is likely not a valid MySQL Server version/tag and may fail pulls in CI. Consider using a known published MySQL tag (e.g., an 8.4.x LTS tag) or the existing mysql:<version> image used elsewhere, and avoid registries that require authentication unless that's explicitly handled.

data-runtime/src/main/java/io/micronaut/data/runtime/operations/internal/sql/SqlJsonColumnMapperProvider.java:1

  • This changes both the constructor signature and reduces its visibility (from public to package-private). Since the class itself is public, this is a source/binary compatibility break for any external code constructing it. If the intent is DI-only wiring, keep the constructor public (or add a public delegating constructor overload) while adding the new dependency.

data-processor/src/main/java/io/micronaut/data/processor/visitors/finders/FindersUtils.java:1

  • SearchResults<T> represents a multi-row result container, but this branch routes it through FindOne* interceptors (sync/reactive/async). That is very likely to only read/map a single row and silently drop the rest. A safer approach is to route SearchResults through an interceptor that iterates all rows (similar to FindAll*) and then wraps them into SearchResults.

data-runtime/src/main/java/io/micronaut/data/runtime/operations/internal/query/DefaultBindableParametersStoredQuery.java:1

  • This uses Java record deconstruction patterns (value instanceof Score(double score) / Similarity(double similarity)), which require Java 21+. If the project/toolchain baseline is below 21, this will not compile; use standard instanceof Score score and score.value() accessors instead (or ensure the build enforces Java 21+).

data-r2dbc/src/main/java/io/micronaut/data/r2dbc/mapper/ColumnNameR2dbcResultReader.java:1

  • getRequiredValue now returns null when conversion is not possible (convert(...).orElse(null)), which can mask type/codec issues and violates the usual expectation implied by 'Required'. Consider throwing a DataAccessException when conversion returns empty (or using a convertRequired-style path) so mapping failures are surfaced deterministically.
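For the record-deconstruction comment on DefaultBindableParametersStoredQuery, a type-pattern alternative that compiles on Java 16+ might look like the sketch below. The `Score` and `Similarity` records here are hypothetical stand-ins whose components are assumed from the pattern quoted in the comment; the project's real types and accessor names may differ, and `ScoreValues.numericValue` is an illustrative helper, not an existing method.

```java
// Hypothetical stand-ins; component names assumed from the quoted pattern.
record Score(double score) {}
record Similarity(double similarity) {}

final class ScoreValues {

    private ScoreValues() {
    }

    // Unwraps the numeric value using type patterns (Java 16+) plus accessors,
    // instead of record deconstruction patterns (Java 21+).
    static double numericValue(Object value) {
        if (value instanceof Score score) {
            return score.score();
        }
        if (value instanceof Similarity similarity) {
            return similarity.similarity();
        }
        throw new IllegalArgumentException("Unsupported scoring value: " + value);
    }
}
```

Both forms behave the same for these two types; the type-pattern version only trades a little brevity for a lower language-level requirement.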

Copilot AI review requested due to automatic review settings April 2, 2026 10:00
Contributor

Copilot AI left a comment

Pull request overview

Copilot reviewed 203 out of 221 changed files in this pull request and generated 8 comments.

Comment on lines 639 to 643

// LocalDate
conversionService.addConverter(LocalDate.class, java.sql.Date.class, java.sql.Date::valueOf);
conversionService.addConverter(LocalDate.class, Date.class, localDate -> Date.from(localDate.atStartOfDay(ZoneId.systemDefault()).toInstant()));


Copilot AI Apr 2, 2026

There is already a ChronoLocalDate -> java.util.Date converter registered earlier in this factory. Adding a second LocalDate -> Date converter is redundant (LocalDate implements ChronoLocalDate) and can make converter selection harder to reason about. Prefer relying on the existing ChronoLocalDate converter, or document why the more specific converter is necessary.
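The subtype relationship behind this comment can be shown in plain Java. The sketch below uses `java.util.function.Function` rather than Micronaut's conversion service, so it says nothing about how that service selects converters; `DateConverters` and its method are hypothetical names, and the zone is passed explicitly as an assumption to keep the result deterministic.

```java
import java.time.LocalDate;
import java.time.ZoneId;
import java.time.chrono.ChronoLocalDate;
import java.util.Date;
import java.util.function.Function;

final class DateConverters {

    private DateConverters() {
    }

    // Hypothetical helper: a converter typed on the ChronoLocalDate supertype.
    static Function<ChronoLocalDate, Date> chronoLocalDateToDate(ZoneId zone) {
        return date -> Date.from(LocalDate.from(date).atStartOfDay(zone).toInstant());
    }

    public static void main(String[] args) {
        Function<ChronoLocalDate, Date> converter = chronoLocalDateToDate(ZoneId.of("UTC"));
        // LocalDate implements ChronoLocalDate, so it is accepted without
        // a second, LocalDate-specific converter.
        System.out.println(converter.apply(LocalDate.of(2026, 4, 2)));
    }
}
```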

Comment on lines +41 to +46
// Ensure pgvector extension and demo table for vector tests
try (CallableStatement st = connection.prepareCall("CREATE EXTENSION IF NOT EXISTS vector;")) {
    st.execute();
} catch (SQLException e) {
    // Ignore if not available or already exists
}

Copilot AI Apr 2, 2026

This initializer is meant to ensure the vector extension exists for vector specs, but it currently swallows any SQLException from CREATE EXTENSION .... If the extension is missing or permissions prevent creation, tests will fail later with harder-to-diagnose errors. Consider letting the exception propagate (or at least logging) for this vector-specific init path.
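One way to surface the failure, as the comment suggests, is to rethrow with context instead of swallowing the exception. The sketch below is an assumption about how that could look, not the PR's initializer: `VectorExtensionInit` and `ensureVectorExtension` are hypothetical names, and it swaps the original `prepareCall` for a plain `Statement`, which is the more usual choice for DDL.

```java
import java.sql.Connection;
import java.sql.SQLException;
import java.sql.Statement;

final class VectorExtensionInit {

    private VectorExtensionInit() {
    }

    // Creates the pgvector extension if missing; any failure is rethrown
    // with context so the root cause appears at initialization time.
    static void ensureVectorExtension(Connection connection) {
        try (Statement st = connection.createStatement()) {
            // IF NOT EXISTS already covers the "extension already present" case.
            st.execute("CREATE EXTENSION IF NOT EXISTS vector");
        } catch (SQLException e) {
            throw new IllegalStateException(
                "Could not ensure the 'vector' extension; vector tests cannot run", e);
        }
    }
}
```

If failing fast is too strict for shared, non-vector test setups, logging the exception at WARN would be the softer variant the comment also mentions.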

Copilot AI review requested due to automatic review settings April 3, 2026 09:58
@sonarqubecloud

sonarqubecloud bot commented Apr 3, 2026

Contributor

Copilot AI left a comment

Pull request overview

Copilot reviewed 204 out of 222 changed files in this pull request and generated no new comments.

@graemerocher graemerocher moved this to Ready for review in 5.0.0 Release Apr 3, 2026
@graemerocher graemerocher added the type: enhancement New feature or request label Apr 3, 2026
Labels

type: enhancement New feature or request

Projects

Status: Ready for review

6 participants