Conversation
📝 Walkthrough

Adds per-index row-change tracking, SDK and API hooks to update that counter on vector insert/delete, a background reindexer that triggers Lance reindexing when thresholds are exceeded, Lance reindex utilities, config flags, and DB-cleaner updates to include vector collections.
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant Client
    participant SDK
    participant BucketSpaceRPC as BucketSpace RPC
    participant Server
    participant Store as SystemStore
    participant BGWorker
    participant VUtils as VectorUtils
    Client->>SDK: put_vectors(params)
    SDK->>VUtils: put_vectors(...)
    VUtils-->>SDK: result
    SDK->>BucketSpaceRPC: add_rows_since_reindex({vector_bucket_name, vector_index_name, delta})
    BucketSpaceRPC->>Server: update_rows_since_index(op: ADD, value: delta)
    Server->>Store: locate vector_index & $inc rows_since_index
    Store-->>Server: persisted
    Server-->>BucketSpaceRPC: ack
    BucketSpaceRPC-->>SDK: ack
    SDK-->>Client: return result
    par Periodic reindex check
        BGWorker->>Store: scan vector_indices
        alt rows_since_index >= VECTOR_INDEX_ROWS_REINDEX
            BGWorker->>VUtils: reindex(vector_bucket, vector_index)
            VUtils->>VUtils: LanceConn.reindex() recreate index
            VUtils-->>BGWorker: done
            BGWorker->>Server: update_rows_since_index(op: SET, value: 0)
        end
    end
```
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~20 minutes
🚥 Pre-merge checks: ✅ 4 passed | ❌ 1 failed (1 warning)
Actionable comments posted: 9
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/api/bucket_api.js`:
- Around line 1215-1232: The handler metadata for update_rows_since_index
currently forces admin-only RPC which causes post-write bookkeeping
(add_rows_since_reindex called after put_vectors/delete_vectors) to fail for
non-admin writers; either remove or relax the auth block on
update_rows_since_index so user-authenticated writers can call it (and enforce
bucket/index authorization inside the update_rows_since_index handler), or keep
the admin restriction but change the callers (put_vectors/delete_vectors) to
call add_rows_since_reindex using an internal/admin token; update the code paths
referencing update_rows_since_index and add_rows_since_reindex and ensure
authorization checks occur in the handler (validate
vector_bucket_name/vector_index_name) if you choose to allow user auth.
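If the first option is chosen (relaxing the RPC auth block and authorizing inside the handler), the check could look roughly like the sketch below. The account shape, `is_admin` flag, and `allowed_buckets` lookup are hypothetical stand-ins, not NooBaa's actual account model:

```javascript
// Hypothetical in-handler authorization for update_rows_since_index.
// Validates the target index params, then requires either admin rights
// or write access to the named vector bucket.
function authorize_rows_update(account, params) {
    if (!params.vector_bucket_name || !params.vector_index_name) {
        throw new Error('BAD_REQUEST: missing vector bucket/index name');
    }
    if (account.is_admin) return true; // admins always allowed
    const allowed = account.allowed_buckets || [];
    if (!allowed.includes(params.vector_bucket_name)) {
        throw new Error('UNAUTHORIZED: no write access to ' + params.vector_bucket_name);
    }
    return true;
}
```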
In `@src/endpoint/endpoint.js`:
- Around line 262-267: Guard registration of the VectorBucketsReindexer on the
presence of internal_rpc_client as well as the config flag: change the condition
that calls background_scheduler.register_bg_worker(new
VectorBucketsReindexer(...)) to require both internal_rpc_client and
config.REINDEX_VECTOR_BUCKETS (i.e., if (internal_rpc_client &&
config.REINDEX_VECTOR_BUCKETS) { ... }), ensuring the VectorBucketsReindexer is
never instantiated with client: internal_rpc_client when that client is
undefined.
In `@src/sdk/bucketspace_fs.js`:
- Around line 1467-1469: Implement the previously-noop method
add_rows_since_reindex to persistently increment the per-index
"rows_since_index" counter in the BucketSpaceFS backend: load the index metadata
for the given vector_bucket_name and vector_index_name (the same storage object
used by the existing FS config read/write helpers), validate and coerce the
incoming delta, add it to the stored rows_since_index (defaulting to 0 if
missing), then write the updated metadata back to disk; perform the
read-modify-write under the same serialized lock or atomic update pattern used
by config_fs to avoid races, and ensure this method is used by
VectorSDK.put_vectors and delete_vectors to trigger threshold-based reindexing.
In `@src/sdk/vector_sdk.js`:
- Around line 120-127: In put_vectors, move the bookkeeping call to
bs.add_rows_since_reindex so it happens before the actual vector mutation to
avoid reporting a failed write after Lance has accepted it: call
bs.add_rows_since_reindex(...) prior to invoking
vector_utils.put_vectors(this.req.vector_bucket, this.req.vector_index,
params.vectors); similarly apply the same change to the delete path that
currently calls add_rows_since_reindex after vector_utils.delete_vectors (refer
to the delete_vectors call and its add_rows_since_reindex usage) so counters are
incremented before performing the mutation.
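A toy synchronous model of the ordering point, with stand-ins for the SDK and Lance calls (not the real signatures): bumping the counter first means a failed write only over-counts, which at worst triggers reindexing a bit early, whereas the reverse order can accept rows that are never counted.

```javascript
// Illustration of "bookkeeping before mutation" from the comment above.
let rows_since_index = 0; // stand-in for the persisted per-index counter

function put_vectors(vectors, lance_put) {
    rows_since_index += vectors.length; // bookkeeping first
    lance_put(vectors);                 // mutation second (may throw)
}
```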
In `@src/server/bg_services/db_cleaner.js`:
- Around line 132-151: The early-skip precheck omits vector_indices so deleted
vector indices never trigger cleanup; add a call to
system_store.count_total_docs('vector_indices') (e.g. const
total_vector_indices_count) alongside
total_accounts_count/total_buckets_count/total_vector_buckets_count/total_pools_count,
include total_vector_indices_count in the combined if condition that compares to
config.DB_CLEANER_MAX_TOTAL_DOCS, and update the dbg.log0 message (DB_CLEANER
skip message) to report total_vector_indices_count so the log reflects the full
set of stores being checked; note that system_store.find_deleted_docs already
requests 'vector_indices' so no change there.
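The widened precheck amounts to something like the sketch below, assuming db_cleaner skips when the total exceeds `DB_CLEANER_MAX_TOTAL_DOCS` (as the existing precheck does); the counts object is a stand-in for the `system_store.count_total_docs` results:

```javascript
// Combined early-skip condition, now including vector_indices.
function db_cleaner_should_skip(counts, max_total_docs) {
    const total =
        counts.accounts + counts.buckets + counts.vector_buckets +
        counts.vector_indices + counts.pools;
    return total > max_total_docs; // too many docs: skip the cleanup pass
}
```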
In `@src/server/system_services/bucket_server.js`:
- Around line 2549-2568: In update_rows_since_index, avoid blindly using SET on
vector_indices[0].rows_since_index (which can overwrite concurrent ADDs);
instead implement a compare-and-set or use a subtracting increment so concurrent
increments are preserved: when req.rpc_params.op === 'SET' treat the value as
the number of processed rows and either (a) perform a conditional update that
sets rows_since_index only if the current stored value equals the
observed/expected value provided in req.rpc_params (CAS), or (b) convert the
reset into an $inc by -processedCount (so rows_since_index = rows_since_index -
processedCount) and call system_store.make_changes with that conditional/$inc
change; update the change object construction in update_rows_since_index
accordingly and ensure req.rpc_params carries the observed/expected value if
using CAS.
In `@src/server/system_services/schemas/vector_index_schema.js`:
- Around line 52-54: The rows_since_index schema allows negative values; update
the vector_index_schema.js definition for the rows_since_index field to enforce
non-negativity by adding a minimum constraint (and optionally a default of 0) so
the JSON/schema validator rejects negative counters—locate the rows_since_index
property in the exported schema and add a "minimum: 0" (and set "default: 0" if
desired) to ensure the counter cannot be negative.
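The tightened field could look like this JSON-Schema-style fragment; the `integer` type and the surrounding schema shape are assumed from context, not copied from the actual file:

```javascript
// Sketch of the constrained counter field for vector_index_schema.js.
const rows_since_index = {
    type: 'integer',
    minimum: 0, // reject negative counters
    default: 0, // optional: new indexes start at zero
};
```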
In `@src/util/vectors_util.js`:
- Around line 282-287: In reindex, guard against missing or too-small Lance
tables: when calling get_table(table_name) inside reindex(vector_bucket,
vector_index), check for undefined and return/skip early if no table; when
calling table.createIndex('vector', {replace: true}), wrap it in a try/catch and
detect the known "Not enough rows" (or equivalent) error coming from
createIndex/put_vectors and treat it as non-fatal by logging/debugging and
returning without throwing so the background reindexer won't retry; keep error
handling generic enough to rethrow unexpected errors.
- Around line 362-366: The code dereferences nsr.nsfs_config.fs_root_path
without checking that nsr exists; when system_store lookup returns undefined
(stale SDK info or buckets created without namespace_resource) this causes a
TypeError—update the logic around vector_bucket.namespace_resource/resource and
nsr resolution (the nsr variable and lance_path assignment) to explicitly guard
that nsr is defined and has nsfs_config before computing lance_path, and if not
throw a clear configuration error (e.g., "Missing namespace_resource or
nsfs_config for vector bucket: <identifier>") so callers get a meaningful
message instead of a generic crash.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: f62823dc-2990-47b9-98fe-70fc9fc4c60a
📒 Files selected for processing (12)
- config.js
- src/api/bucket_api.js
- src/endpoint/endpoint.js
- src/sdk/bucketspace_fs.js
- src/sdk/bucketspace_nb.js
- src/sdk/nb.d.ts
- src/sdk/vector_sdk.js
- src/server/bg_services/db_cleaner.js
- src/server/bg_workers.js
- src/server/system_services/bucket_server.js
- src/server/system_services/schemas/vector_index_schema.js
- src/util/vectors_util.js
```js
async add_rows_since_reindex(/*{vector_bucket_name, vector_index_name, delta}*/) {
    //TODO
}
```
Implement the FS reindex counter instead of silently no-oping.
VectorSDK.put_vectors() and delete_vectors() now rely on this method, so the TODO makes BucketSpaceFS-backed vector indexes never accumulate rows_since_index and therefore never trigger threshold-based reindexing.
I can help wire this through config_fs with a serialized read/update/write if you want.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/sdk/bucketspace_fs.js` around lines 1467 - 1469, Implement the
previously-noop method add_rows_since_reindex to persistently increment the
per-index "rows_since_index" counter in the BucketSpaceFS backend: load the
index metadata for the given vector_bucket_name and vector_index_name (the same
storage object used by the existing FS config read/write helpers), validate and
coerce the incoming delta, add it to the stored rows_since_index (defaulting to
0 if missing), then write the updated metadata back to disk; perform the
read-modify-write under the same serialized lock or atomic update pattern used
by config_fs to avoid races, and ensure this method is used by
VectorSDK.put_vectors and delete_vectors to trigger threshold-based reindexing.
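A minimal in-memory sketch of that serialized read/update/write, with a promise chain standing in for config_fs's lock and a Map standing in for the on-disk index metadata (all names hypothetical):

```javascript
// In-memory model of the FS counter update: queue each read-modify-write
// on a promise chain so concurrent deltas are applied one at a time.
const disk = new Map(); // key: `${bucket}/${index}` -> metadata object

let chain = Promise.resolve(); // serialization point (config_fs lock stand-in)

function add_rows_since_reindex({ vector_bucket_name, vector_index_name, delta }) {
    const d = Number(delta);
    if (!Number.isFinite(d)) throw new Error('invalid delta');
    chain = chain.then(() => {
        const key = `${vector_bucket_name}/${vector_index_name}`;
        const meta = disk.get(key) || {};                         // "read" metadata
        meta.rows_since_index = (meta.rows_since_index || 0) + d; // default 0, add delta
        disk.set(key, meta);                                      // "write" back
    });
    return chain;
}
```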
♻️ Duplicate comments (3)
src/endpoint/endpoint.js (1)
262-267: ⚠️ Potential issue | 🟠 Major — Guard reindexer registration on `internal_rpc_client`.

Line 265 can pass `undefined` when `init_request_sdk` is supplied externally or in deployments without an internal RPC client. Match the NamespaceMonitor guard above.

Proposed fix:

```diff
-    if (config.REINDEX_VECTOR_BUCKETS) {
+    if (internal_rpc_client && config.REINDEX_VECTOR_BUCKETS) {
         background_scheduler.register_bg_worker(new VectorBucketsReindexer({
             name: 'Vector_buckets_reindexer',
             client: internal_rpc_client,
         }));
     }
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/endpoint/endpoint.js` around lines 262 - 267, The VectorBucketsReindexer is being registered unconditionally when config.REINDEX_VECTOR_BUCKETS is true, but internal_rpc_client may be undefined; update the block around background_scheduler.register_bg_worker(new VectorBucketsReindexer(...)) to mirror the NamespaceMonitor guard by only registering the reindexer when internal_rpc_client is truthy (and config.REINDEX_VECTOR_BUCKETS is true), so that VectorBucketsReindexer is not constructed/passed an undefined client; reference the symbols VectorBucketsReindexer, background_scheduler.register_bg_worker, internal_rpc_client, and config.REINDEX_VECTOR_BUCKETS when making the check.
src/util/vectors_util.js (2)
362-366: ⚠️ Potential issue | 🟡 Minor — Guard namespace-resource resolution before deriving `lance_path`.

A stale/missing `namespace_resource`, unloaded `systems[0]`, or unresolved resource name currently turns into a generic TypeError at Line 366. Return a clear configuration error before dereferencing `nsr.nsfs_config`.

Proposed guard:

```diff
     //for vc from system store, nsr is already an object.
     //for vc from sdk info, lookup nsr name in sysem store.namespace_resources_by_name
-    const nsr = typeof vector_bucket.namespace_resource.resource === 'object' ? vector_bucket.namespace_resource.resource :
-        system_store.data.systems[0].namespace_resources_by_name[vector_bucket.namespace_resource.resource];
+    const resource = vector_bucket.namespace_resource?.resource;
+    const nsr = typeof resource === 'object' ? resource :
+        system_store.data.systems[0]?.namespace_resources_by_name?.[resource];
+    if (!nsr?.nsfs_config?.fs_root_path) {
+        throw new Error(`Missing namespace_resource or nsfs_config for vector bucket ${vector_bucket.name.unwrap()}`);
+    }
     lance_path = nsr.nsfs_config.fs_root_path;
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/util/vectors_util.js` around lines 362 - 366, The code dereferences nsr.nsfs_config without validating namespace resolution; in the block that computes nsr from vector_bucket.namespace_resource.resource and system_store.data.systems[0].namespace_resources_by_name, add guards to ensure system_store.data.systems exists and has index 0, that vector_bucket.namespace_resource and its resource are present, and that the resolved nsr is an object with nsfs_config and fs_root_path; if any check fails, throw/return a clear configuration error (e.g., "Missing or unresolved namespace_resource for vector_bucket" or similar) before assigning lance_path so the error is explicit instead of a TypeError.
282-287: ⚠️ Potential issue | 🟠 Major — Keep background reindex from crashing on absent or too-small Lance tables.

`get_table()` may return `undefined`, and `createIndex()` can hit the same "Not enough rows" path handled elsewhere. Skip these cases so the background worker does not fail-loop.

Proposed guard:

```diff
     async reindex(vector_bucket, vector_index) {
         dbg.log0("reindex vector_bucket =", vector_bucket.name.unwrap(), ", vector_index =", vector_index.name.unwrap());
         const table_name = vector_bucket.name.unwrap() + "_" + vector_index.name.unwrap();
         const table = await this.get_table(table_name);
-        await table.createIndex('vector', {replace: true});
+        if (!table) {
+            dbg.log1("reindex skipped, table not found =", table_name);
+            return;
+        }
+        try {
+            await table.createIndex('vector', {replace: true});
+        } catch (err) {
+            if (err.message?.includes('Not enough rows')) {
+                dbg.log1("reindex skipped, not enough rows for table =", table_name);
+                return;
+            }
+            throw err;
+        }
     }
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/util/vectors_util.js` around lines 282 - 287, The reindex method can throw when get_table() returns undefined or when table.createIndex('vector') fails for small tables; modify reindex (in src/util/vectors_util.js) to first guard if table is falsy and simply return (or log and return), then wrap the await table.createIndex('vector', {replace: true}) call in a try/catch that catches the "Not enough rows" condition (match by error message or specific error class if available) and treats it as a no-op (log debug and return) while rethrowing unexpected errors; reference the reindex function, get_table call, and createIndex invocation when making the change.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Duplicate comments:
In `@src/endpoint/endpoint.js`:
- Around line 262-267: The VectorBucketsReindexer is being registered
unconditionally when config.REINDEX_VECTOR_BUCKETS is true, but
internal_rpc_client may be undefined; update the block around
background_scheduler.register_bg_worker(new VectorBucketsReindexer(...)) to
mirror the NamespaceMonitor guard by only registering the reindexer when
internal_rpc_client is truthy (and config.REINDEX_VECTOR_BUCKETS is true), so
that VectorBucketsReindexer is not constructed/passed an undefined client;
reference the symbols VectorBucketsReindexer,
background_scheduler.register_bg_worker, internal_rpc_client, and
config.REINDEX_VECTOR_BUCKETS when making the check.
In `@src/util/vectors_util.js`:
- Around line 362-366: The code dereferences nsr.nsfs_config without validating
namespace resolution; in the block that computes nsr from
vector_bucket.namespace_resource.resource and
system_store.data.systems[0].namespace_resources_by_name, add guards to ensure
system_store.data.systems exists and has index 0, that
vector_bucket.namespace_resource and its resource are present, and that the
resolved nsr is an object with nsfs_config and fs_root_path; if any check fails,
throw/return a clear configuration error (e.g., "Missing or unresolved
namespace_resource for vector_bucket" or similar) before assigning lance_path so
the error is explicit instead of a TypeError.
- Around line 282-287: The reindex method can throw when get_table() returns
undefined or when table.createIndex('vector') fails for small tables; modify
reindex (in src/util/vectors_util.js) to first guard if table is falsy and
simply return (or log and return), then wrap the await
table.createIndex('vector', {replace: true}) call in a try/catch that catches
the "Not enough rows" condition (match by error message or specific error class
if available) and treats it as a no-op (log debug and return) while rethrowing
unexpected errors; reference the reindex function, get_table call, and
createIndex invocation when making the change.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 171beb2f-4687-4d5c-bde7-99b04252115e
📒 Files selected for processing (12)
- config.js
- src/api/bucket_api.js
- src/endpoint/endpoint.js
- src/sdk/bucketspace_fs.js
- src/sdk/bucketspace_nb.js
- src/sdk/nb.d.ts
- src/sdk/vector_sdk.js
- src/server/bg_services/db_cleaner.js
- src/server/bg_workers.js
- src/server/system_services/bucket_server.js
- src/server/system_services/schemas/vector_index_schema.js
- src/util/vectors_util.js
✅ Files skipped from review due to trivial changes (2)
- src/server/bg_workers.js
- src/sdk/vector_sdk.js
🚧 Files skipped from review as they are similar to previous changes (6)
- src/server/system_services/schemas/vector_index_schema.js
- src/sdk/bucketspace_fs.js
- src/sdk/bucketspace_nb.js
- src/server/system_services/bucket_server.js
- src/api/bucket_api.js
- src/server/bg_services/db_cleaner.js
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/server/bg_services/vector_buckets_reindexer.js`:
- Around line 29-30: In vector_buckets_reindexer.js replace the inconsistent
dbg.log call with dbg.log0 so logging uses the same verbosity level as the rest
of the module; locate the return path that currently calls dbg.log("Vector
Bucket Reindexer: no vector indices.") and change it to call dbg.log0 with the
same message to maintain consistent logging behavior in the Vector Bucket
Reindexer flow.
- Around line 33-53: The loop currently races and can lose progress: compute a
safe rows counter (e.g., const rows = vi.rows_since_index || 0) and only
consider reindex when rows >= config.VECTOR_INDEX_ROWS_REINDEX; wrap each
iteration in a try/catch so one failure doesn't stop the rest; call await
vector_utils.reindex(vi.vector_bucket, vi) (ensure you await the async reindex)
before updating/resetting the counter; when resetting use a decrement/INC
operation with the observed rows (call
this.client.bucket.update_rows_since_index with op: 'INC' and value: -rows) or
otherwise subtract the observed value instead of unconditionally SET 0 to avoid
dropping concurrent writes; use auth_server.make_auth_token and
system_store.data.systems[0] as before for the update call.
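The loop shape the comment asks for can be modeled as a small synchronous sketch; `reindex` and `update_counter` below are stand-ins for the real `vector_utils.reindex` and `this.client.bucket.update_rows_since_index` calls, and the negative increment mirrors the suggested `op: 'INC', value: -rows`:

```javascript
// Per-index try/catch, observed-rows snapshot, and decrement-by-observed
// instead of unconditional SET 0, as the review comment suggests.
function run_reindexer(indices, threshold, reindex, update_counter) {
    const errors = [];
    for (const vi of indices) {
        try {
            const rows = vi.rows_since_index || 0;   // safe snapshot
            if (rows < threshold) continue;          // below reindex threshold
            reindex(vi);                             // may throw; caught below
            update_counter(vi, -rows);               // INC by -observed rows
        } catch (err) {
            errors.push({ index: vi.name, err });    // one failure doesn't stop the rest
        }
    }
    return errors;
}
```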
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: ee8e48b2-3a88-40b7-aa0d-a7da9612ca80
📒 Files selected for processing (13)
- config.js
- src/api/bucket_api.js
- src/endpoint/endpoint.js
- src/sdk/bucketspace_fs.js
- src/sdk/bucketspace_nb.js
- src/sdk/nb.d.ts
- src/sdk/vector_sdk.js
- src/server/bg_services/db_cleaner.js
- src/server/bg_services/vector_buckets_reindexer.js
- src/server/bg_workers.js
- src/server/system_services/bucket_server.js
- src/server/system_services/schemas/vector_index_schema.js
- src/util/vectors_util.js
✅ Files skipped from review due to trivial changes (2)
- src/server/bg_workers.js
- src/server/system_services/schemas/vector_index_schema.js
🚧 Files skipped from review as they are similar to previous changes (7)
- src/sdk/nb.d.ts
- src/sdk/bucketspace_nb.js
- src/endpoint/endpoint.js
- src/sdk/bucketspace_fs.js
- config.js
- src/sdk/vector_sdk.js
- src/server/bg_services/db_cleaner.js
Signed-off-by: Amit Prinz Setter <alphaprinz@gmail.com>
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
src/server/system_services/bucket_server.js (1)
2357-2360: ⚠️ Potential issue | 🟠 Major — Complete the REST error mapping for `NO_SUCH_VECTOR_INDEX`.

The server now throws `NO_SUCH_VECTOR_INDEX`, but the REST endpoint mapping is incomplete. Three changes are needed:

- Add `VectorError.NoSuchVectorIndex` to `src/endpoint/vector/vector_errors.js`
- Map `NO_SUCH_VECTOR_INDEX` to the new `VectorError.NoSuchVectorIndex` in `RPC_ERRORS_TO_VECTOR` in `src/endpoint/vector/vector_rest.js` (lines 16-40)
- Without these mappings, clients currently receive a fallback `InternalFailure` (HTTP 500) error instead of the proper vector index not found error.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/server/system_services/bucket_server.js` around lines 2357 - 2360, Add a new VectorError variant named NoSuchVectorIndex in the vector error definitions (VectorError.NoSuchVectorIndex) and export it from the vector_errors module, then update the RPC_ERRORS_TO_VECTOR mapping in the vector_rest module to map the RPC error string "NO_SUCH_VECTOR_INDEX" to VectorError.NoSuchVectorIndex so REST endpoints convert the RpcError('NO_SUCH_VECTOR_INDEX', ...) into the correct HTTP/vector error instead of falling back to InternalFailure.
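The mapping change amounts to something like the sketch below; the `VectorError` class shape and the 404 status are assumptions modeled on the pattern described above, not copied from the actual vector_errors.js:

```javascript
// Hypothetical RPC→REST error mapping with an InternalFailure fallback.
class VectorError extends Error {
    constructor({ code, http_code }) {
        super(code);
        this.code = code;
        this.http_code = http_code;
    }
}

const NoSuchVectorIndex = { code: 'NoSuchVectorIndex', http_code: 404 };
const InternalFailure = { code: 'InternalFailure', http_code: 500 };

const RPC_ERRORS_TO_VECTOR = Object.freeze({
    NO_SUCH_VECTOR_INDEX: NoSuchVectorIndex,
});

function map_rpc_error(rpc_code) {
    return new VectorError(RPC_ERRORS_TO_VECTOR[rpc_code] || InternalFailure);
}
```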
♻️ Duplicate comments (3)
src/util/vectors_util.js (2)
282-287: ⚠️ Potential issue | 🟠 Major — Handle non-indexable tables in background `reindex()`.

`get_table()` can return `undefined`, and `createIndex()` can fail for small tables; this should be treated as a skip, not a hard failure.

Suggested guard:

```diff
     async reindex(vector_bucket, vector_index) {
         dbg.log0("reindex vector_bucket =", vector_bucket.name.unwrap(), ", vector_index =", vector_index.name.unwrap());
         const table_name = vector_bucket.name.unwrap() + "_" + vector_index.name.unwrap();
         const table = await this.get_table(table_name);
-        await table.createIndex('vector', {replace: true});
+        if (!table) {
+            dbg.log1("reindex skipped, table not found =", table_name);
+            return;
+        }
+        try {
+            await table.createIndex('vector', { replace: true });
+        } catch (err) {
+            if (err.message?.includes('Not enough rows')) {
+                dbg.log1("reindex skipped, not enough rows =", table_name);
+                return;
+            }
+            throw err;
+        }
     }
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/util/vectors_util.js` around lines 282 - 287, The reindex method should treat non-indexable or tiny tables as skippable: check the result of get_table(table_name) and if it's undefined, log a debug/info message and return without throwing; also wrap table.createIndex('vector', {replace: true}) in a try/catch that catches errors from non-indexable/small tables, logs a debug/info-level skip message (including table_name and the error) and does not propagate the exception. Update the async reindex(vector_bucket, vector_index) function to perform these guards around get_table and createIndex.
364-366: ⚠️ Potential issue | 🟡 Minor — Guard unresolved namespace resource before dereference.

If lookup fails, current code throws a generic TypeError instead of a clear configuration error.

Suggested guard:

```diff
     const nsr = typeof vector_bucket.namespace_resource.resource === 'object' ? vector_bucket.namespace_resource.resource :
         system_store.data.systems[0].namespace_resources_by_name[vector_bucket.namespace_resource.resource];
+    if (!nsr?.nsfs_config?.fs_root_path) {
+        throw new Error(`Missing namespace_resource/nsfs_config for vector bucket ${vector_bucket.name}`);
+    }
     lance_path = nsr.nsfs_config.fs_root_path;
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/util/vectors_util.js` around lines 364 - 366, The code dereferences nsr.nsfs_config.fs_root_path without verifying nsr exists; update the lookup around vector_bucket.namespace_resource/resource to validate that nsr is not null/undefined (after computing nsr from system_store.data.systems[0].namespace_resources_by_name or the inline object) and if it's missing throw or return a clear configuration error mentioning the missing namespace resource and the offending vector_bucket.namespace_resource.resource value; reference the symbols vector_bucket, namespace_resource, nsr, system_store, namespace_resources_by_name, nsfs_config, and fs_root_path when adding the guard and error to make the failure explicit.
src/server/system_services/bucket_server.js (1)
2560-2565: ⚠️ Potential issue | 🟠 Major — Prevent lost updates in the `SET` path for `rows_since_index`.

`SET` can overwrite concurrent `ADD` updates, so row changes can be dropped and reindex timing drifts.

Suggested fix (CAS-style update):

```diff
 async function update_rows_since_index(req) {
 @@
-    if (req.rpc_params.op === 'SET') {
-        change.update.vector_indices[0].rows_since_index = req.rpc_params.value;
-    } else {
+    if (req.rpc_params.op === 'SET') {
+        if (typeof req.rpc_params.expected !== 'number') {
+            throw new RpcError('BAD_REQUEST', 'expected rows_since_index is required for SET');
+        }
+        change.update.vector_indices[0].$find = {
+            _id: vector_index._id,
+            rows_since_index: req.rpc_params.expected,
+        };
+        change.update.vector_indices[0].$set = {
+            rows_since_index: req.rpc_params.value,
+        };
+    } else if (req.rpc_params.op === 'ADD') {
         change.update.vector_indices[0].$inc = { rows_since_index: req.rpc_params.value };
+    } else {
+        throw new RpcError('BAD_REQUEST', `Unsupported op ${req.rpc_params.op}`);
     }
```

As per coding guidelines: `**/*.{js,ts,tsx}: Review code for race conditions in async code`.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/server/system_services/bucket_server.js` around lines 2560 - 2565, The current SET branch directly assigns change.update.vector_indices[0].rows_since_index = req.rpc_params.value which can overwrite concurrent ADD ($inc) updates; replace this with a CAS-style update: when req.rpc_params.op === 'SET' read the current rows_since_index for that vector index, then issue an atomic update that includes the current value in the update filter (e.g. update ... where rows_since_index == current) to $set it to req.rpc_params.value and retry the read/update if the conditional update fails; alternatively, require/use an expected value from req.rpc_params (e.g. req.rpc_params.expected) and include it in the update filter so change.update.vector_indices[0].rows_since_index is only set if the stored value equals expected, falling back to a retry loop if needed. Ensure you modify the code that now sets change.update.vector_indices[0].rows_since_index and the surrounding update execution logic to perform the conditional update instead of a blind assignment.
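One way to picture the CAS-with-retry option from the comment, modeled on an in-memory counter rather than `system_store.make_changes` (all names here are illustrative): a concurrent ADD that lands between the read and the reset is observed on retry instead of being silently dropped.

```javascript
// In-memory model of a compare-and-set counter reset.
function make_counter_store(initial) {
    let value = initial;
    return {
        read: () => value,
        add: d => { value += d; },
        // compare-and-set: refuses the write when another writer won the race
        cas: (expected, next) => {
            if (value !== expected) return false;
            value = next;
            return true;
        },
    };
}

function reset_after_reindex(store) {
    for (;;) { // retry loop, as the comment suggests
        const observed = store.read();
        if (store.cas(observed, 0)) return observed; // rows actually processed
    }
}
```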
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Outside diff comments:
In `@src/server/system_services/bucket_server.js`:
- Around line 2357-2360: Add a new VectorError variant named NoSuchVectorIndex
in the vector error definitions (VectorError.NoSuchVectorIndex) and export it
from the vector_errors module, then update the RPC_ERRORS_TO_VECTOR mapping in
the vector_rest module to map the RPC error string "NO_SUCH_VECTOR_INDEX" to
VectorError.NoSuchVectorIndex so REST endpoints convert the
RpcError('NO_SUCH_VECTOR_INDEX', ...) into the correct HTTP/vector error instead
of falling back to InternalFailure.
---
Duplicate comments:
In `@src/server/system_services/bucket_server.js`:
- Around line 2560-2565: The current SET branch directly assigns
change.update.vector_indices[0].rows_since_index = req.rpc_params.value which
can overwrite concurrent ADD ($inc) updates; replace this with a CAS-style
update: when req.rpc_params.op === 'SET' read the current rows_since_index for
that vector index, then issue an atomic update that includes the current value
in the update filter (e.g. update ... where rows_since_index == current) to $set
it to req.rpc_params.value and retry the read/update if the conditional update
fails; alternatively, require/use an expected value from req.rpc_params (e.g.
req.rpc_params.expected) and include it in the update filter so
change.update.vector_indices[0].rows_since_index is only set if the stored value
equals expected, falling back to a retry loop if needed. Ensure you modify the
code that now sets change.update.vector_indices[0].rows_since_index and the
surrounding update execution logic to perform the conditional update instead of
a blind assignment.
In `@src/util/vectors_util.js`:
- Around line 282-287: The reindex method should treat non-indexable or tiny
tables as skippable: check the result of get_table(table_name) and if it's
undefined, log a debug/info message and return without throwing; also wrap
table.createIndex('vector', {replace: true}) in a try/catch that catches errors
from non-indexable/small tables, logs a debug/info-level skip message (including
table_name and the error) and does not propagate the exception. Update the async
reindex(vector_bucket, vector_index) function to perform these guards around
get_table and createIndex.
- Around line 364-366: The code dereferences nsr.nsfs_config.fs_root_path
without verifying nsr exists; update the lookup around
vector_bucket.namespace_resource/resource to validate that nsr is not
null/undefined (after computing nsr from
system_store.data.systems[0].namespace_resources_by_name or the inline object)
and if it's missing throw or return a clear configuration error mentioning the
missing namespace resource and the offending
vector_bucket.namespace_resource.resource value; reference the symbols
vector_bucket, namespace_resource, nsr, system_store,
namespace_resources_by_name, nsfs_config, and fs_root_path when adding the guard
and error to make the failure explicit.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 8e69b9c5-d7d3-4f41-a71e-5f050f5c5e2d
📒 Files selected for processing (13)
- config.js
- src/api/bucket_api.js
- src/endpoint/endpoint.js
- src/sdk/bucketspace_fs.js
- src/sdk/bucketspace_nb.js
- src/sdk/nb.d.ts
- src/sdk/vector_sdk.js
- src/server/bg_services/db_cleaner.js
- src/server/bg_services/vector_buckets_reindexer.js
- src/server/bg_workers.js
- src/server/system_services/bucket_server.js
- src/server/system_services/schemas/vector_index_schema.js
- src/util/vectors_util.js
✅ Files skipped from review due to trivial changes (3)
- src/server/system_services/schemas/vector_index_schema.js
- src/server/bg_workers.js
- src/api/bucket_api.js
🚧 Files skipped from review as they are similar to previous changes (5)
- src/sdk/bucketspace_fs.js
- src/sdk/bucketspace_nb.js
- src/sdk/vector_sdk.js
- src/server/bg_services/vector_buckets_reindexer.js
- src/server/bg_services/db_cleaner.js
Describe the Problem
Background maintenance for vectors in containerized deployments.
Explain the Changes
Issues: Fixed #xxx / Gap #xxx
Testing Instructions:
Summary by CodeRabbit
New Features
Bug Fixes / Maintenance