Optimised RAG by VyasGuru · Pull Request #434 · RunanywhereAI/runanywhere-sdks

VyasGuru · 2026-02-27T20:54:12Z

Optimised RAG for faster chunking and batch processing.
Implemented hybrid search (still testing so I can tweak the config).

Greptile Summary

This PR implements major optimizations to the RAG system with hybrid search capabilities. The rewrite introduces BM25 sparse keyword search alongside dense vector search, using Reciprocal Rank Fusion (RRF) to merge results. Document chunking has been completely rewritten with a recursive algorithm using hierarchical separators for better boundary detection. Batch embedding processing improves throughput significantly.
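
As a rough illustration of recursive chunking with hierarchical separators, the sketch below tries paragraph breaks first, then line breaks, then spaces. It falls back to a hard character cut only when no separator produces a piece that fits. It is not the PR's `rag_chunker.cpp`: the function names `recursive_split` and `split_keep`, the separator list, and the character-based `chunk_size` are assumptions, and chunk overlap is omitted for brevity.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Split `text` on every occurrence of `sep`, keeping the separator attached
// to the end of each piece so no characters are lost. `sep` must be non-empty.
static std::vector<std::string> split_keep(const std::string& text, const std::string& sep) {
    assert(!sep.empty());
    std::vector<std::string> pieces;
    std::size_t start = 0;
    while (true) {
        std::size_t pos = text.find(sep, start);
        if (pos == std::string::npos) {
            if (start < text.size()) pieces.push_back(text.substr(start));
            break;
        }
        pieces.push_back(text.substr(start, pos + sep.size() - start));
        start = pos + sep.size();
    }
    return pieces;
}

// Recursively split `text` into chunks of at most `chunk_size` characters,
// trying coarse separators (paragraphs) before finer ones (lines, words).
std::vector<std::string> recursive_split(const std::string& text,
                                         const std::vector<std::string>& seps,
                                         std::size_t chunk_size,
                                         std::size_t level = 0) {
    assert(chunk_size > 0);
    if (text.size() <= chunk_size) return {text};
    if (level >= seps.size()) {
        // No separators left: fall back to a hard character cut.
        std::vector<std::string> out;
        for (std::size_t i = 0; i < text.size(); i += chunk_size)
            out.push_back(text.substr(i, chunk_size));
        return out;
    }
    std::vector<std::string> out;
    std::string current;
    for (const std::string& piece : split_keep(text, seps[level])) {
        if (piece.size() > chunk_size) {
            // Flush the pending chunk, then split the oversized piece more finely.
            if (!current.empty()) { out.push_back(current); current.clear(); }
            for (auto& sub : recursive_split(piece, seps, chunk_size, level + 1))
                out.push_back(sub);
        } else if (current.size() + piece.size() > chunk_size) {
            out.push_back(current);  // current chunk is full
            current = piece;
        } else {
            current += piece;        // greedily merge small pieces
        }
    }
    if (!current.empty()) out.push_back(current);
    return out;
}
```

For example, splitting `"aaa bbb ccc"` with separators `{"\n\n", "\n", " "}` and `chunk_size` 8 yields `"aaa bbb "` and `"ccc"`: neither paragraph nor line breaks exist, so the splitter drops down to word boundaries instead of cutting mid-word.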

Key improvements:

Hybrid search with BM25 + dense embeddings for better retrieval of exact keywords, acronyms, and rare terms
Recursive chunking with configurable separators improves document splitting quality
Batch embedding operations reduce overhead for multi-document ingestion
i8 quantization in vector store reduces memory footprint
SIMD tokenizer optimizations for ARM NEON architectures
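
To make the i8 quantization point concrete, here is a minimal sketch of symmetric per-vector int8 scalar quantization, with cosine similarity computed directly on the codes. USearch ships its own i8 support, and this sketch neither uses nor reproduces the USearch API. `QuantizedVector`, `quantize_i8` and `cosine_i8` are hypothetical names. The sketch requires C++17 (`std::clamp`).

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// Symmetric int8 quantization: map [-max|x|, +max|x|] onto [-127, 127].
struct QuantizedVector {
    std::vector<std::int8_t> codes;  // one byte per dimension instead of four
    float scale;                     // original value ~= code * scale
};

QuantizedVector quantize_i8(const std::vector<float>& v) {
    float max_abs = 0.0f;
    for (float x : v) max_abs = std::max(max_abs, std::fabs(x));
    QuantizedVector q{std::vector<std::int8_t>(v.size()),
                      max_abs > 0.0f ? max_abs / 127.0f : 1.0f};
    for (std::size_t i = 0; i < v.size(); ++i) {
        float r = std::round(v[i] / q.scale);
        q.codes[i] = static_cast<std::int8_t>(std::clamp(r, -127.0f, 127.0f));
    }
    return q;
}

// Cosine similarity on the int8 codes. The per-vector scales cancel out in
// the normalisation, so they are not needed for this metric.
float cosine_i8(const QuantizedVector& a, const QuantizedVector& b) {
    assert(a.codes.size() == b.codes.size());
    std::int64_t dot = 0, na = 0, nb = 0;
    for (std::size_t i = 0; i < a.codes.size(); ++i) {
        dot += std::int32_t(a.codes[i]) * b.codes[i];
        na += std::int32_t(a.codes[i]) * a.codes[i];
        nb += std::int32_t(b.codes[i]) * b.codes[i];
    }
    if (na == 0 || nb == 0) return 0.0f;
    return static_cast<float>(dot / std::sqrt(double(na) * double(nb)));
}
```

Storing one byte per dimension (plus one float scale per vector) instead of four bytes is where the memory saving comes from. The trade-off is a small rounding error in each similarity score.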

Critical issue found:

Compilation error in rac_rag_pipeline_create_standalone: undeclared variable embed_handle (line 124) will prevent the code from building

Confidence Score: 2/5

Cannot merge due to compilation error that prevents building
The undeclared embed_handle variable in rac_rag_pipeline.cpp line 124 is a critical compilation error that will cause the build to fail. While the architectural changes are sound and the hybrid search implementation is well-designed, the code cannot be built or tested in its current state.
Pay close attention to sdk/runanywhere-commons/src/features/rag/rac_rag_pipeline.cpp which contains a critical compilation error

Important Files Changed

Filename	Overview
sdk/runanywhere-commons/src/features/rag/rac_rag_pipeline.cpp	Critical compilation error: undeclared `embed_handle` variable prevents build
sdk/runanywhere-commons/src/features/rag/bm25_index.cpp	Solid BM25 implementation with standard parameters (k1=1.2, b=0.75), thread-safe operations
sdk/runanywhere-commons/src/features/rag/rag_chunker.cpp	Recursive chunking implementation with hierarchical separators improves document splitting
sdk/runanywhere-commons/src/features/rag/rag_backend.cpp	Major refactor: hybrid search with RRF fusion, batch embedding support, improved architecture
sdk/runanywhere-commons/src/features/rag/vector_store_usearch.cpp	Optimized vector store with i8 quantization and batch operations for memory efficiency
sdk/runanywhere-commons/src/features/rag/onnx_embedding_provider.cpp	Tokenizer with SIMD optimizations for ARM NEON, LRU cache for token lookups
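
For reference, here is a minimal BM25 scorer using the parameters quoted for `bm25_index.cpp` (k1=1.2, b=0.75). It is a sketch under stated assumptions, not the PR's implementation. It uses whitespace tokenisation and the Lucene-style IDF variant `log(1 + (N - df + 0.5) / (df + 0.5))`. It also has no locking, so unlike the reviewed code it is not thread-safe. `Bm25Index` is a hypothetical name.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <sstream>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Minimal in-memory BM25 index (whitespace tokenisation, no stemming, no locking).
class Bm25Index {
public:
    explicit Bm25Index(double k1 = 1.2, double b = 0.75) : k1_(k1), b_(b) {}

    void add_document(const std::string& text) {
        std::unordered_map<std::string, int> tf;
        std::istringstream in(text);
        std::string tok;
        std::size_t len = 0;
        while (in >> tok) { ++tf[tok]; ++len; }
        for (const auto& kv : tf) ++doc_freq_[kv.first];
        docs_.push_back({std::move(tf), len});
        total_len_ += len;
    }

    // BM25 score of document `doc_id` for a whitespace-separated query.
    double score(const std::string& query, std::size_t doc_id) const {
        assert(doc_id < docs_.size());
        const Doc& d = docs_[doc_id];
        const double n_docs = static_cast<double>(docs_.size());
        const double avgdl = static_cast<double>(total_len_) / n_docs;
        std::istringstream in(query);
        std::string term;
        double s = 0.0;
        while (in >> term) {
            auto tf_it = d.tf.find(term);
            if (tf_it == d.tf.end()) continue;
            const double df = doc_freq_.at(term);
            // Lucene-style IDF; the 1 + ... keeps it positive for very common terms.
            const double idf = std::log(1.0 + (n_docs - df + 0.5) / (df + 0.5));
            const double f = tf_it->second;
            // Length normalisation: b controls how much long documents are penalised.
            const double norm = k1_ * (1.0 - b_ + b_ * d.length / avgdl);
            s += idf * f * (k1_ + 1.0) / (f + norm);
        }
        return s;
    }

private:
    struct Doc {
        std::unordered_map<std::string, int> tf;  // term -> count in this doc
        std::size_t length;                        // number of tokens
    };
    double k1_, b_;
    std::vector<Doc> docs_;
    std::unordered_map<std::string, int> doc_freq_;  // term -> number of docs containing it
    std::size_t total_len_ = 0;
};
```

k1 caps how much repeated occurrences of a term can add, and b=0.75 applies partial length normalisation. Together they let an exact keyword or acronym match score highly even when its embedding similarity is weak.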

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[User Query] --> B[RAG Backend]
    B --> C[Embed Query Text]
    C --> D[Parallel Search]
    D --> E[Dense Vector Search<br/>USearch with i8 quantization]
    D --> F[BM25 Keyword Search<br/>Inverted index]
    E --> G[Dense Results<br/>Cosine similarity]
    F --> H[BM25 Results<br/>TF-IDF scoring]
    G --> I[Reciprocal Rank Fusion<br/>k=60]
    H --> I
    I --> J[Top-K Fused Results]
    J --> K[Build Context<br/>Token budget: 2048]
    K --> L[Format Prompt<br/>with context]
    L --> M[LLM Service]
    M --> N[Generated Answer]
    
    style B fill:#e1f5ff
    style I fill:#fff4e1
    style D fill:#f0f0f0
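
The fusion step in the flowchart (Reciprocal Rank Fusion with k=60) can be sketched as follows. Each retriever supplies a ranked list of document ids, and a document's fused score is the sum of 1/(k + rank) over the lists it appears in. The function name `rrf_fuse` and the use of string ids are assumptions for illustration, not the `rag_backend.cpp` API.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Reciprocal Rank Fusion: each ranked list contributes 1 / (k + rank) for every
// document it contains (rank is 1-based). Raw scores are ignored, so cosine
// similarities and BM25 scores never need to be put on a common scale.
std::vector<std::pair<std::string, double>> rrf_fuse(
        const std::vector<std::vector<std::string>>& ranked_lists,
        std::size_t top_k, double k = 60.0) {
    assert(k > 0.0);
    std::unordered_map<std::string, double> fused;
    for (const auto& list : ranked_lists)
        for (std::size_t i = 0; i < list.size(); ++i)
            fused[list[i]] += 1.0 / (k + static_cast<double>(i + 1));

    std::vector<std::pair<std::string, double>> out(fused.begin(), fused.end());
    // Highest fused score first; break ties by id so the output is deterministic.
    std::sort(out.begin(), out.end(), [](const auto& a, const auto& b) {
        return a.second != b.second ? a.second > b.second : a.first < b.first;
    });
    if (out.size() > top_k) out.resize(top_k);
    return out;
}
```

With dense results `{a, b, c}` and BM25 results `{c, a, d}`, `rrf_fuse({dense, sparse}, 3)` returns `a`, `c`, `b`. Documents found by both retrievers rise to the top, and the two retrievers' scores never need calibrating against each other.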

Last reviewed commit: 388aca0

* feat(lora): add LoRA adapter support across SDK + demo app
  Implement LoRA (Low-Rank Adaptation) adapter hot-swapping for the llama.cpp backend across all 6 SDK layers (C++ -> C API -> Component -> JNI -> Kotlin Bridge -> Kotlin Public API).
  - Add load/remove/clear/query LoRA adapter operations
  - Use vtable dispatch in the component layer to decouple librac_commons from librac_backend_llamacpp (fixes linker errors)
  - Add LoRA vtable entries to rac_llm_service_ops_t
  - Fix AttachCurrentThread cast for Android NDK C++ JNI build
  - Add RunAnyWhereLora Android demo app with Material 3 Q&A UI
  - Add comprehensive implementation docs with C/C++ API reference
* feat(ci): add selectable build targets to Build All workflow + fix Swift concurrency errors
  Rewrite build-all-test.yml with 9 boolean checkbox inputs so each build target can be toggled independently from the GitHub Actions UI:
  - C++ Android Backends (arm64-v8a, armeabi-v7a, x86_64 matrix)
  - C++ iOS Backends (XCFramework)
  - Kotlin SDK (JVM + Android)
  - Swift SDK (iOS/macOS)
  - Web SDK (TypeScript)
  - Flutter SDK (Dart analyze via Melos)
  - React Native SDK (TypeScript via Lerna)
  - Android Example Apps (RunAnywhereAI + RunAnyWhereLora)
  - IntelliJ Plugin
  Fix two Swift strict-concurrency errors that fail the Swift SDK build:
  - LiveTranscriptionSession: add @unchecked Sendable (safe because the class is @MainActor, so all access is serialized)
  - RunAnywhere+VisionLanguage: add Sendable conformance to rac_vlm_image_t so the C struct can cross the Task boundary in the streaming builder; simplify StreamingCollector to start timing at init
* fix(swift): resolve strict concurrency errors in LiveTranscriptionSession and VLM streaming
  LiveTranscriptionSession.swift:
  - Replace [weak self] captures with a strong `let session = self` before closures to avoid a captured var in @Sendable/Task contexts (the class is @MainActor @unchecked Sendable, so a strong reference is safe, bounded by the stream lifecycle)
  - Wrap the deprecated startStreamingTranscription call in an @available helper to silence the deprecation warning until migration to the transcribeStream API
  RunAnywhere+VisionLanguage.swift:
  - Add `let capturedCImage = cImage` before the AsyncThrowingStream closure so the Task captures an immutable let instead of a mutable var
  - Add `extension rac_vlm_image_t: @unchecked Sendable {}` for the C struct to cross Task concurrency boundaries safely
  - Simplify StreamingCollector to initialize startTime at init instead of requiring a separate async start() call
* fix(jni): address CodeRabbit review findings in LoRA JNI functions
  - Replace raw -1 returns with RAC_ERROR_INVALID_HANDLE/RAC_ERROR_INVALID_ARGUMENT to match codebase error-handling conventions
  - Use the getCString() helper instead of raw GetStringUTFChars/ReleaseStringUTFChars
  - Add missing result logging to racLlmComponentRemoveLora and racLlmComponentClearLora
  - Use rac_free() instead of free() in racLlmComponentGetLoraInfo for consistency
  - Clarify LoRA adapter memory ownership comments (adapters are freed automatically with the model per the llama.cpp b8011 API; llama_adapter_lora_free is deprecated)

coderabbitai · 2026-02-27T20:54:22Z

Important

Review skipped

Auto reviews are disabled on base/target branches other than the default branch.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

shubhammalhotra28 · 2026-02-27T21:01:37Z

@coderabbitai please do a detailed review. Thanks!

shubhammalhotra28 · 2026-02-27T21:01:42Z

@greptile please do a detailed review. Thanks!

coderabbitai · 2026-02-27T21:01:58Z

@shubhammalhotra28: I'll conduct a detailed review of the PR focusing on the RAG optimizations, chunking improvements, batch processing implementation, and the hybrid search feature. Let me analyze the changes.

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

shubhammalhotra28 · 2026-02-27T21:05:52Z

https://github.com/RunanywhereAI/runanywhere-sdks/blob/main/Package.swift#L267-L379

We need to fix this as well; it needs cleaning up. @VyasGuru

greptile-apps

153 files reviewed, 1 comment

greptile-apps · 2026-02-27T21:06:32Z

sdk/runanywhere-commons/src/features/rag/rac_rag_pipeline.cpp

+    rac_handle_t llm_handle = nullptr;
+
+    try {
+        // Create embeddings service via registry


embed_handle not declared

Suggested change

// Create embeddings service via registry
rac_handle_t embed_handle = nullptr;
rac_result_t result = rac_embeddings_create(config->embedding_model_path, &embed_handle);

shubhammalhotra28 · 2026-02-27T22:13:49Z

Package.swift

-                    url: "https://github.com/RunanywhereAI/runanywhere-sdks/releases/download/v\(sdkVersion)/RABackendRAG-v\(sdkVersion).zip",
-                    checksum: "0000000000000000000000000000000000000000000000000000000000000000" // Replace with actual checksum
-                )
+        targets.append(


This also needs to be removed.

shubhammalhotra28 · 2026-03-01T21:42:01Z

This doesn't seem to address the comment I added, so I'm not sure whether this was ever tested or addressed. @VyasGuru

greptile-apps · 2026-03-01T21:42:07Z

Too many files changed for review. (153 files found, 100 file limit)

ellipsis-dev · 2026-03-01T21:42:21Z

⚠️ This PR is too big for Ellipsis, but support for larger PRs is coming soon. If you want us to prioritize this feature, let us know at help@ellipsis.dev

shubhammalhotra28 · 2026-03-01T21:51:26Z

Merged this branch manually into shubham-rag-fix and resolved all merge conflicts. The changes are now in commit df98f9a0. Closing this PR — thanks @VyasGuru for the optimised RAG work.

shubhammalhotra28 · 2026-03-01T21:51:53Z

If anything remains, please let me know ASAP.

Siddhesh2377 and others added 10 commits February 21, 2026 15:06

Add lora ios (RunanywhereAI#407)

2e45ec0

* ios initial changes
* minimal sample needed to test lora
* updating docs
* addressed the comments

Merge branch 'main' into dev

abda61a

Prototype for Optimised RAG

4cb8532

First version for Optimised RAG. Not polished yet, Once tested, I'll microoptimise, bench, and finish.

Merge branch 'RunanywhereAI:main' into RAG-OPTIS

aa7236c

Merge branch 'main' into dev

bc33fef

Merge pull request RunanywhereAI#428 from VyasGuru/RAG-OPTIS

9e4f2df

Optimised RAG Prototype

RAG rewrite

90fc4ce

Refactor RAG terminology to "pipeline" across scripts and source files for consistency.

9340646

Update comments and logging messages to reflect the change from "backend" to "pipeline". Remove unused React Native package files related to RAG.

Optimised RAG + implement a hybrid search

d2527d0

fixed tnc block error.

388aca0

greptile-apps bot reviewed Feb 27, 2026

shubhammalhotra28 reviewed Feb 27, 2026

Changed batching parameters, similarity threshold, and optimised embedding memory+speed output

053bc27

shubhammalhotra28 changed the base branch from dev to shubham-rag-fix March 1, 2026 21:29

shubhammalhotra28 mentioned this pull request Mar 1, 2026

Complete RAG Flutter implementation (full state) #419

Merged

shubhammalhotra28 marked this pull request as ready for review March 1, 2026 21:42

shubhammalhotra28 merged commit df98f9a into RunanywhereAI:shubham-rag-fix Mar 1, 2026
4 of 6 checks passed

4 participants
