
Optimised RAG #434

Merged
shubhammalhotra28 merged 12 commits into RunanywhereAI:shubham-rag-fix from VyasGuru:smonga/rag_refact
Mar 1, 2026


@VyasGuru (Collaborator) commented Feb 27, 2026

Optimised RAG for faster chunking and batch processing.
Implemented hybrid search (still testing it so I can tweak the config).

Greptile Summary

This PR implements major optimizations to the RAG system with hybrid search capabilities. The rewrite introduces BM25 sparse keyword search alongside dense vector search, using Reciprocal Rank Fusion (RRF) to merge results. Document chunking has been completely rewritten with a recursive algorithm using hierarchical separators for better boundary detection. Batch embedding processing improves throughput significantly.
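The recursive chunking idea can be illustrated with a minimal C++ sketch. It is not the PR's `rag_chunker.cpp`; the function name `recursive_chunk`, the character-based size limit, and the separator order are assumptions. Each level splits on one separator, greedily packs adjacent pieces up to `max_chars`, and recurses to the next, finer separator only for pieces that are still too long; when no separators remain it falls back to fixed-size windows.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical recursive character chunker (illustrative, not rag_chunker.cpp).
// separators: ordered coarsest to finest, each non-empty; max_chars must be > 0.
std::vector<std::string> recursive_chunk(const std::string& text,
                                         const std::vector<std::string>& separators,
                                         std::size_t max_chars,
                                         std::size_t sep_index = 0) {
    if (text.empty()) return {};
    if (text.size() <= max_chars) return {text};
    if (sep_index >= separators.size()) {
        // No separators left: fall back to fixed-size windows.
        std::vector<std::string> windows;
        for (std::size_t i = 0; i < text.size(); i += max_chars)
            windows.push_back(text.substr(i, max_chars));
        return windows;
    }
    // Split on the current separator, keeping it attached to the preceding piece.
    const std::string& sep = separators[sep_index];
    std::vector<std::string> pieces;
    std::size_t start = 0;
    std::size_t pos = text.find(sep, start);
    while (pos != std::string::npos) {
        pieces.push_back(text.substr(start, pos + sep.size() - start));
        start = pos + sep.size();
        pos = text.find(sep, start);
    }
    if (start < text.size()) pieces.push_back(text.substr(start));

    // Greedily pack adjacent pieces up to max_chars; recurse on oversized pieces.
    std::vector<std::string> chunks;
    std::string current;
    for (const std::string& piece : pieces) {
        if (piece.size() > max_chars) {
            if (!current.empty()) { chunks.push_back(current); current.clear(); }
            for (const std::string& sub : recursive_chunk(piece, separators, max_chars, sep_index + 1))
                chunks.push_back(sub);
        } else if (current.size() + piece.size() > max_chars) {
            chunks.push_back(current);
            current = piece;
        } else {
            current += piece;
        }
    }
    if (!current.empty()) chunks.push_back(current);
    return chunks;
}
```

For example, passing separators `{"\n\n", ". ", " "}` prefers paragraph boundaries, then sentence boundaries, then word boundaries. Because separators stay attached to the preceding piece, concatenating the chunks reproduces the input exactly; a production chunker might also add overlap between chunks and measure size in tokens rather than characters.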

Key improvements:

  • Hybrid search with BM25 + dense embeddings for better retrieval of exact keywords, acronyms, and rare terms
  • Recursive chunking with configurable separators improves document splitting quality
  • Batch embedding operations reduce overhead for multi-document ingestion
  • i8 quantization in vector store reduces memory footprint
  • SIMD tokenizer optimizations for ARM NEON architectures
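To make the i8 quantization bullet concrete, here is a generic symmetric per-vector scalar quantization sketch in C++. It is an illustrative assumption, not the PR's `vector_store_usearch.cpp`: USearch handles its i8 storage internally and its exact scaling may differ, and `QuantizedVector`, `quantize_i8`, and `dequantize` are hypothetical names. Each 4-byte float becomes a 1-byte code plus one shared float scale per vector, roughly a 4x memory reduction for typical embedding dimensions.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical int8 representation: codes plus one float scale per vector.
struct QuantizedVector {
    std::vector<std::int8_t> codes;
    float scale;  // original value is approximately code * scale
};

// Symmetric per-vector quantization: the largest |x| maps to +/-127.
QuantizedVector quantize_i8(const std::vector<float>& v) {
    float max_abs = 0.0f;
    for (float x : v) max_abs = std::max(max_abs, std::fabs(x));
    QuantizedVector q{{}, max_abs > 0.0f ? max_abs / 127.0f : 1.0f};
    q.codes.reserve(v.size());
    for (float x : v) {
        long code = std::lround(x / q.scale);  // round to nearest integer
        code = std::clamp(code, -127L, 127L);  // guard against float rounding overshoot
        q.codes.push_back(static_cast<std::int8_t>(code));
    }
    return q;
}

// Approximate reconstruction of component i.
float dequantize(const QuantizedVector& q, std::size_t i) {
    return static_cast<float>(q.codes[i]) * q.scale;
}
```

The per-component reconstruction error is bounded by about half the scale, which is the intuition behind storing embeddings as int8 for similarity search; the actual recall impact should be measured on the target embedding model.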

Critical issue found:

  • Compilation error in `rac_rag_pipeline_create_standalone`: the undeclared variable `embed_handle` (line 124) will prevent the code from building

Confidence Score: 2/5

  • Cannot merge due to compilation error that prevents building
  • The undeclared `embed_handle` variable in rac_rag_pipeline.cpp line 124 is a critical compilation error (an undeclared identifier, not a syntax error) that will cause the build to fail. While the architectural changes are sound and the hybrid search implementation is well-designed, the code cannot be built or tested in its current state.
  • Pay close attention to sdk/runanywhere-commons/src/features/rag/rac_rag_pipeline.cpp which contains a critical compilation error

Important Files Changed

| Filename | Overview |
| --- | --- |
| sdk/runanywhere-commons/src/features/rag/rac_rag_pipeline.cpp | Critical compilation error: undeclared `embed_handle` variable prevents build |
| sdk/runanywhere-commons/src/features/rag/bm25_index.cpp | Solid BM25 implementation with standard parameters (k1=1.2, b=0.75), thread-safe operations |
| sdk/runanywhere-commons/src/features/rag/rag_chunker.cpp | Recursive chunking implementation with hierarchical separators improves document splitting |
| sdk/runanywhere-commons/src/features/rag/rag_backend.cpp | Major refactor: hybrid search with RRF fusion, batch embedding support, improved architecture |
| sdk/runanywhere-commons/src/features/rag/vector_store_usearch.cpp | Optimized vector store with i8 quantization and batch operations for memory efficiency |
| sdk/runanywhere-commons/src/features/rag/onnx_embedding_provider.cpp | Tokenizer with SIMD optimizations for ARM NEON, LRU cache for token lookups |
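The BM25 parameters listed for `bm25_index.cpp` (k1=1.2, b=0.75) plug into the standard Okapi BM25 formula. The C++ sketch below is a minimal single-threaded scorer over pre-tokenized documents, not the PR's implementation: the class name `Bm25Sketch` is hypothetical, there is no locking or inverted index, and it assumes the Lucene-style IDF `ln(1 + (N - df + 0.5) / (df + 0.5))`, which stays non-negative; the PR may use a different IDF variant.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical minimal Okapi BM25 scorer (single-threaded, no inverted index).
class Bm25Sketch {
public:
    explicit Bm25Sketch(double k1 = 1.2, double b = 0.75) : k1_(k1), b_(b) {}

    void add_document(const std::vector<std::string>& tokens) {
        std::unordered_map<std::string, int> tf;
        for (const std::string& t : tokens) ++tf[t];
        for (const auto& entry : tf) ++doc_freq_[entry.first];  // document frequency
        term_freqs_.push_back(std::move(tf));
        doc_lengths_.push_back(tokens.size());
        total_length_ += static_cast<double>(tokens.size());
    }

    // BM25 score of document `doc` for a tokenized query.
    double score(std::size_t doc, const std::vector<std::string>& query) const {
        const double n_docs = static_cast<double>(doc_lengths_.size());
        const double avgdl = total_length_ / n_docs;
        const double len_norm = 1.0 - b_ + b_ * static_cast<double>(doc_lengths_[doc]) / avgdl;
        double s = 0.0;
        for (const std::string& term : query) {
            auto df_it = doc_freq_.find(term);
            auto tf_it = term_freqs_[doc].find(term);
            if (df_it == doc_freq_.end() || tf_it == term_freqs_[doc].end()) continue;
            const double df = df_it->second;
            const double tf = tf_it->second;
            const double idf = std::log(1.0 + (n_docs - df + 0.5) / (df + 0.5));
            s += idf * tf * (k1_ + 1.0) / (tf + k1_ * len_norm);
        }
        return s;
    }

private:
    double k1_;
    double b_;
    std::vector<std::unordered_map<std::string, int>> term_freqs_;
    std::unordered_map<std::string, int> doc_freq_;
    std::vector<std::size_t> doc_lengths_;
    double total_length_ = 0.0;
};
```

`k1` controls how quickly repeated occurrences of a term saturate, and `b` controls how strongly documents longer than average are penalized. Rare terms such as acronyms get a larger IDF, which is why BM25 complements dense search on exact keywords.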

Flowchart

```mermaid
%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[User Query] --> B[RAG Backend]
    B --> C[Embed Query Text]
    C --> D[Parallel Search]
    D --> E[Dense Vector Search<br/>USearch with i8 quantization]
    D --> F[BM25 Keyword Search<br/>Inverted index]
    E --> G[Dense Results<br/>Cosine similarity]
    F --> H[BM25 Results<br/>TF-IDF scoring]
    G --> I[Reciprocal Rank Fusion<br/>k=60]
    H --> I
    I --> J[Top-K Fused Results]
    J --> K[Build Context<br/>Token budget: 2048]
    K --> L[Format Prompt<br/>with context]
    L --> M[LLM Service]
    M --> N[Generated Answer]

    style B fill:#e1f5ff
    style I fill:#fff4e1
    style D fill:#f0f0f0
```
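The fusion step in the flowchart (Reciprocal Rank Fusion with k=60) scores each chunk as the sum of `1 / (k + rank)` over every result list it appears in, using 1-based ranks. A minimal C++ sketch, assuming both searches return chunk IDs ordered best-first; the function name `reciprocal_rank_fusion` and the string ID type are assumptions, not the PR's API:

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Reciprocal Rank Fusion over ranked ID lists (each list ordered best-first).
std::vector<std::pair<std::string, double>> reciprocal_rank_fusion(
    const std::vector<std::vector<std::string>>& ranked_lists,
    std::size_t top_k, double k = 60.0) {
    std::unordered_map<std::string, double> fused;
    for (const auto& list : ranked_lists) {
        for (std::size_t i = 0; i < list.size(); ++i) {
            fused[list[i]] += 1.0 / (k + static_cast<double>(i + 1));  // ranks are 1-based
        }
    }
    std::vector<std::pair<std::string, double>> results(fused.begin(), fused.end());
    // Sort by fused score descending; tie-break on ID for deterministic output.
    std::sort(results.begin(), results.end(), [](const auto& a, const auto& b) {
        return a.second != b.second ? a.second > b.second : a.first < b.first;
    });
    if (results.size() > top_k) results.resize(top_k);
    return results;
}
```

Because RRF uses only ranks, dense cosine scores and BM25 scores never need to be normalized onto a common scale. A chunk ranked well by both retrievers rises to the top, while the constant k damps the advantage of any single first-place rank.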

Last reviewed commit: 388aca0

Siddhesh2377 and others added 10 commits February 21, 2026 15:06
* feat(lora): add LoRA adapter support across SDK + demo app

  Implement LoRA (Low-Rank Adaptation) adapter hot-swapping for llama.cpp
  backend across all 6 SDK layers (C++ -> C API -> Component -> JNI ->
  Kotlin Bridge -> Kotlin Public API).

  - Add load/remove/clear/query LoRA adapter operations
  - Use vtable dispatch in component layer to decouple librac_commons
    from librac_backend_llamacpp (fixes linker errors)
  - Add LoRA vtable entries to rac_llm_service_ops_t
  - Fix AttachCurrentThread cast for Android NDK C++ JNI build
  - Add RunAnyWhereLora Android demo app with Material 3 Q&A UI
  - Add comprehensive implementation docs with C/C++ API reference
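The vtable dispatch described in this commit can be sketched in C++ as a struct of function pointers that the backend fills in and the component layer calls through, so the component library never references backend symbols at link time. Everything here is hypothetical: `LlmServiceOpsSketch`, `SketchResult`, and the fake backend stand in for `rac_llm_service_ops_t` and the real llama.cpp backend, whose actual fields and signatures are not shown in this PR.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical result codes (not the SDK's rac_result_t values).
enum class SketchResult { Ok, NotSupported, InvalidArgument };

// Function-pointer table a backend fills in; the component layer depends only on this.
struct LlmServiceOpsSketch {
    SketchResult (*load_lora)(void* impl, const char* path, float scale);
    SketchResult (*clear_lora)(void* impl);
};

// A service instance: shared ops table plus backend-private state.
struct LlmServiceSketch {
    const LlmServiceOpsSketch* ops;
    void* impl;
};

// Component-layer wrappers dispatch through the table and never name backend symbols.
SketchResult component_load_lora(LlmServiceSketch& svc, const char* path, float scale) {
    if (path == nullptr) return SketchResult::InvalidArgument;
    if (svc.ops == nullptr || svc.ops->load_lora == nullptr) return SketchResult::NotSupported;
    return svc.ops->load_lora(svc.impl, path, scale);
}

SketchResult component_clear_lora(LlmServiceSketch& svc) {
    if (svc.ops == nullptr || svc.ops->clear_lora == nullptr) return SketchResult::NotSupported;
    return svc.ops->clear_lora(svc.impl);
}

// Toy backend standing in for the llama.cpp backend: it only records adapter paths.
struct FakeBackend {
    std::vector<std::string> adapters;
};

SketchResult fake_load_lora(void* impl, const char* path, float /*scale*/) {
    static_cast<FakeBackend*>(impl)->adapters.emplace_back(path);
    return SketchResult::Ok;
}

SketchResult fake_clear_lora(void* impl) {
    static_cast<FakeBackend*>(impl)->adapters.clear();
    return SketchResult::Ok;
}

// The backend registers its table; the component only ever sees the pointer.
const LlmServiceOpsSketch kFakeBackendOps = {fake_load_lora, fake_clear_lora};
```

A backend without LoRA support can leave the entries null, and the component then reports "not supported" at runtime instead of the build failing at link time.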

* feat(ci): add selectable build targets to Build All workflow + fix Swift concurrency errors

  Rewrite build-all-test.yml with 9 boolean checkbox inputs so each build
  target can be toggled independently from the GitHub Actions UI:
  - C++ Android Backends (arm64-v8a, armeabi-v7a, x86_64 matrix)
  - C++ iOS Backends (XCFramework)
  - Kotlin SDK (JVM + Android)
  - Swift SDK (iOS/macOS)
  - Web SDK (TypeScript)
  - Flutter SDK (Dart analyze via Melos)
  - React Native SDK (TypeScript via Lerna)
  - Android Example Apps (RunAnywhereAI + RunAnyWhereLora)
  - IntelliJ Plugin

  Fix two Swift strict-concurrency errors that fail the Swift SDK build:
  - LiveTranscriptionSession: add @unchecked Sendable (safe because class
    is @MainActor, all access serialized)
  - RunAnywhere+VisionLanguage: add Sendable conformance to rac_vlm_image_t
    so the C struct can cross the Task boundary in the streaming builder;
    simplify StreamingCollector to start timing at init

* fix(swift): resolve strict concurrency errors in LiveTranscriptionSession and VLM streaming

  LiveTranscriptionSession.swift:
  - Replace [weak self] captures with strong `let session = self` before
    closures to avoid captured var in @Sendable/Task contexts (class is
    @MainActor @unchecked Sendable so strong ref is safe, bounded by
    stream lifecycle)
  - Wrap deprecated startStreamingTranscription call in @available helper
    to silence deprecation warning until migration to transcribeStream API

  RunAnywhere+VisionLanguage.swift:
  - Add `let capturedCImage = cImage` before AsyncThrowingStream closure
    so the Task captures an immutable let instead of a mutable var
  - Add `extension rac_vlm_image_t: @unchecked Sendable {}` for the C
    struct to cross Task concurrency boundaries safely
  - Simplify StreamingCollector to initialize startTime at init instead
    of requiring a separate async start() call

* fix(jni): address CodeRabbit review findings in LoRA JNI functions

  - Replace raw -1 returns with RAC_ERROR_INVALID_HANDLE/RAC_ERROR_INVALID_ARGUMENT
    to match codebase error handling conventions
  - Use getCString() helper instead of raw GetStringUTFChars/ReleaseStringUTFChars
  - Add missing result logging to racLlmComponentRemoveLora and racLlmComponentClearLora
  - Use rac_free() instead of free() in racLlmComponentGetLoraInfo for consistency
  - Clarify LoRA adapter memory ownership comments (adapters freed automatically
    with model per llama.cpp b8011 API — llama_adapter_lora_free is deprecated)

* ios initial changes

* minimal sample needed to test lora

* updating docs

* addressed the comments
First version of Optimised RAG. Not polished yet; once tested, I'll micro-optimise, benchmark, and finish.
…s for consistency. Update comments and logging messages to reflect the change from "backend" to "pipeline". Remove unused React Native package files related to RAG.
coderabbitai bot commented Feb 27, 2026

Important

Review skipped

Auto reviews are disabled on base/target branches other than the default branch.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.


@shubhammalhotra28 (Contributor)

@coderabbitai please do a detailed review. Thanks!

@shubhammalhotra28 (Contributor)

@greptile please do a detailed review. Thanks!

coderabbitai bot commented Feb 27, 2026

@shubhammalhotra28: I'll conduct a detailed review of the PR focusing on the RAG optimizations, chunking improvements, batch processing implementation, and the hybrid search feature. Let me analyze the changes.

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

@shubhammalhotra28 (Contributor)

https://github.com/RunanywhereAI/runanywhere-sdks/blob/main/Package.swift#L267-L379

We need to fix this as well; it needs cleaning up.
@VyasGuru


@greptile-apps greptile-apps bot left a comment


153 files reviewed, 1 comment


```cpp
rac_handle_t llm_handle = nullptr;

try {
    // Create embeddings service via registry
```

`embed_handle` is not declared.

Suggested change
```cpp
// Create embeddings service via registry
rac_handle_t embed_handle = nullptr;
rac_result_t result = rac_embeddings_create(config->embedding_model_path, &embed_handle);
```

```swift
    url: "https://github.com/RunanywhereAI/runanywhere-sdks/releases/download/v\(sdkVersion)/RABackendRAG-v\(sdkVersion).zip",
    checksum: "0000000000000000000000000000000000000000000000000000000000000000" // Replace with actual checksum
)
targets.append(
```

This also needs to be removed.

@shubhammalhotra28 shubhammalhotra28 changed the base branch from dev to shubham-rag-fix March 1, 2026 21:29
@shubhammalhotra28 (Contributor)

This doesn't seem to address the comment I added, so I'm not sure whether it was ever tested or addressed. @VyasGuru

@shubhammalhotra28 shubhammalhotra28 marked this pull request as ready for review March 1, 2026 21:42

greptile-apps bot commented Mar 1, 2026

Too many files changed for review. (153 files found, 100 file limit)

ellipsis-dev bot commented Mar 1, 2026

⚠️ This PR is too big for Ellipsis, but support for larger PRs is coming soon. If you want us to prioritize this feature, let us know at help@ellipsis.dev



@shubhammalhotra28 shubhammalhotra28 merged commit df98f9a into RunanywhereAI:shubham-rag-fix Mar 1, 2026
4 of 6 checks passed
@shubhammalhotra28 (Contributor)

Merged this branch manually into shubham-rag-fix and resolved all merge conflicts. The changes are now in commit df98f9a0. Closing this PR — thanks @VyasGuru for the optimised RAG work.

@shubhammalhotra28 (Contributor)

If anything is remaining, please let me know ASAP.


Labels: None yet
Projects: None yet


4 participants