Optimised RAG #434
* feat(lora): add LoRA adapter support across SDK + demo app
Implement LoRA (Low-Rank Adaptation) adapter hot-swapping for llama.cpp
backend across all 6 SDK layers (C++ -> C API -> Component -> JNI ->
Kotlin Bridge -> Kotlin Public API).
- Add load/remove/clear/query LoRA adapter operations
- Use vtable dispatch in component layer to decouple librac_commons
from librac_backend_llamacpp (fixes linker errors)
- Add LoRA vtable entries to rac_llm_service_ops_t
- Fix AttachCurrentThread cast for Android NDK C++ JNI build
- Add RunAnyWhereLora Android demo app with Material 3 Q&A UI
- Add comprehensive implementation docs with C/C++ API reference
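The vtable dispatch mentioned in this commit can be illustrated roughly as follows. This is a minimal sketch, not the SDK's code: `demo_llm_ops_t`, `demo_service_t`, the `demo_component_*` functions, the `DEMO_*` result codes, and the fake backend are all hypothetical names invented for illustration. Only the idea comes from the commit message: the component layer calls LoRA operations through function pointers, so it never needs to link against backend symbols directly.

```cpp
#include <string>
#include <vector>

// Hypothetical result codes for this sketch only.
constexpr int DEMO_OK = 0;
constexpr int DEMO_ERR_NOT_SUPPORTED = -1;
constexpr int DEMO_ERR_INVALID_ARGUMENT = -2;

// Hypothetical ops table: the component layer sees only these function
// pointers, never the backend's own symbols.
struct demo_llm_ops_t {
    int (*load_lora)(void* impl, const char* path, float scale);
    int (*clear_lora)(void* impl);
};

// Hypothetical service handle: opaque backend state plus its ops table.
struct demo_service_t {
    void* impl;
    const demo_llm_ops_t* ops;
};

// Component-layer entry points: dispatch through the table and report
// "not supported" when a backend leaves an entry empty.
int demo_component_load_lora(demo_service_t* svc, const char* path, float scale) {
    if (svc == nullptr || path == nullptr) return DEMO_ERR_INVALID_ARGUMENT;
    if (svc->ops == nullptr || svc->ops->load_lora == nullptr) return DEMO_ERR_NOT_SUPPORTED;
    return svc->ops->load_lora(svc->impl, path, scale);
}

int demo_component_clear_lora(demo_service_t* svc) {
    if (svc == nullptr) return DEMO_ERR_INVALID_ARGUMENT;
    if (svc->ops == nullptr || svc->ops->clear_lora == nullptr) return DEMO_ERR_NOT_SUPPORTED;
    return svc->ops->clear_lora(svc->impl);
}

// Stand-in backend that only records adapter paths. A real backend would
// live in its own library and register its table at startup.
struct fake_backend_t {
    std::vector<std::string> adapters;
};

static int fake_load_lora(void* impl, const char* path, float /*scale*/) {
    static_cast<fake_backend_t*>(impl)->adapters.emplace_back(path);
    return DEMO_OK;
}

static int fake_clear_lora(void* impl) {
    static_cast<fake_backend_t*>(impl)->adapters.clear();
    return DEMO_OK;
}

static const demo_llm_ops_t kFakeOps = {fake_load_lora, fake_clear_lora};
```

With this shape, a backend that lacks LoRA support can leave the entries null and callers get a clean error instead of a link-time failure.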
* feat(ci): add selectable build targets to Build All workflow + fix Swift concurrency errors
Rewrite build-all-test.yml with 9 boolean checkbox inputs so each build
target can be toggled independently from the GitHub Actions UI:
- C++ Android Backends (arm64-v8a, armeabi-v7a, x86_64 matrix)
- C++ iOS Backends (XCFramework)
- Kotlin SDK (JVM + Android)
- Swift SDK (iOS/macOS)
- Web SDK (TypeScript)
- Flutter SDK (Dart analyze via Melos)
- React Native SDK (TypeScript via Lerna)
- Android Example Apps (RunAnywhereAI + RunAnyWhereLora)
- IntelliJ Plugin
Fix two Swift strict-concurrency errors that fail the Swift SDK build:
- LiveTranscriptionSession: add @unchecked Sendable (safe because class
is @MainActor, all access serialized)
- RunAnywhere+VisionLanguage: add Sendable conformance to rac_vlm_image_t
so the C struct can cross the Task boundary in the streaming builder;
simplify StreamingCollector to start timing at init
* fix(swift): resolve strict concurrency errors in LiveTranscriptionSession and VLM streaming
LiveTranscriptionSession.swift:
- Replace [weak self] captures with strong `let session = self` before
closures to avoid captured var in @Sendable/Task contexts (class is
@MainActor @unchecked Sendable so strong ref is safe, bounded by
stream lifecycle)
- Wrap deprecated startStreamingTranscription call in @available helper
to silence deprecation warning until migration to transcribeStream API
RunAnywhere+VisionLanguage.swift:
- Add `let capturedCImage = cImage` before AsyncThrowingStream closure
so the Task captures an immutable let instead of a mutable var
- Add `extension rac_vlm_image_t: @unchecked Sendable {}` for the C
struct to cross Task concurrency boundaries safely
- Simplify StreamingCollector to initialize startTime at init instead
of requiring a separate async start() call
* fix(jni): address CodeRabbit review findings in LoRA JNI functions
- Replace raw -1 returns with RAC_ERROR_INVALID_HANDLE/RAC_ERROR_INVALID_ARGUMENT
to match codebase error handling conventions
- Use getCString() helper instead of raw GetStringUTFChars/ReleaseStringUTFChars
- Add missing result logging to racLlmComponentRemoveLora and racLlmComponentClearLora
- Use rac_free() instead of free() in racLlmComponentGetLoraInfo for consistency
- Clarify LoRA adapter memory ownership comments (adapters freed automatically
with model per llama.cpp b8011 API — llama_adapter_lora_free is deprecated)
* ios initial changes
* minimal sample needed to test lora
* updating docs
* addressed the comments
First version of Optimised RAG. Not polished yet; once tested, I'll micro-optimise, benchmark, and finish.
Optimised RAG Prototype
…s for consistency. Update comments and logging messages to reflect the change from "backend" to "pipeline". Remove unused React Native package files related to RAG.
@coderabbitai please do a detailed review. Thanks!
@greptile please do a detailed review. Thanks!
✅ Actions performed: Review triggered.
https://github.com/RunanywhereAI/runanywhere-sdks/blob/main/Package.swift#L267-L379 We need to fix this as well; it needs cleaning up.
rac_handle_t llm_handle = nullptr;

try {
// Create embeddings service via registry
`embed_handle` not declared
// Create embeddings service via registry
rac_handle_t embed_handle = nullptr;
rac_result_t result = rac_embeddings_create(config->embedding_model_path, &embed_handle);
url: "https://github.com/RunanywhereAI/runanywhere-sdks/releases/download/v\(sdkVersion)/RABackendRAG-v\(sdkVersion).zip",
checksum: "0000000000000000000000000000000000000000000000000000000000000000" // Replace with actual checksum
)
targets.append(
this also needs to be removed
…dding memory+speed output
This doesn't seem to address the comment I added, so I'm not sure whether this was ever tested or addressed. @VyasGuru
df98f9a into RunanywhereAI:shubham-rag-fix
Merged this branch manually into
If anything is remaining, please lmk asap
Optimised RAG for faster chunking and batch processing.
Implemented hybrid search (testing it out so I can tweak the config).
Greptile Summary
This PR implements major optimizations to the RAG system with hybrid search capabilities. The rewrite introduces BM25 sparse keyword search alongside dense vector search, using Reciprocal Rank Fusion (RRF) to merge results. Document chunking has been completely rewritten with a recursive algorithm using hierarchical separators for better boundary detection. Batch embedding processing improves throughput significantly.
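The recursive chunking with hierarchical separators described above could look roughly like the sketch below. It is an illustration under stated assumptions, not the PR's implementation: the function names `split_recursive` and `chunk_text`, the separator order (paragraph, line, sentence, word), and the byte-based size budget are all assumptions made here.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Recursively split `text` into chunks of at most `max_chars` bytes.
// Coarse separators are tried first; a finer one is used only for pieces
// that are still too large. Separators between emitted chunks are dropped.
static void split_recursive(const std::string& text, std::size_t max_chars,
                            const std::vector<std::string>& seps, std::size_t level,
                            std::vector<std::string>& out) {
    if (text.empty()) return;
    if (text.size() <= max_chars) {
        out.push_back(text);
        return;
    }
    if (level >= seps.size() || seps[level].empty()) {
        // No separator left: hard cut (may split a multi-byte UTF-8 character).
        for (std::size_t i = 0; i < text.size(); i += max_chars) {
            out.push_back(text.substr(i, max_chars));
        }
        return;
    }
    const std::string& sep = seps[level];
    std::string current;  // adjacent small pieces merged up to the budget
    std::size_t start = 0;
    while (start <= text.size()) {
        std::size_t pos = text.find(sep, start);
        if (pos == std::string::npos) pos = text.size();
        const std::string piece = text.substr(start, pos - start);
        start = pos + sep.size();
        if (piece.empty()) continue;
        if (piece.size() > max_chars) {
            // Flush what we have, then descend to the next finer separator.
            if (!current.empty()) {
                out.push_back(current);
                current.clear();
            }
            split_recursive(piece, max_chars, seps, level + 1, out);
        } else if (current.empty()) {
            current = piece;
        } else if (current.size() + sep.size() + piece.size() <= max_chars) {
            current += sep;
            current += piece;
        } else {
            out.push_back(current);
            current = piece;
        }
    }
    if (!current.empty()) out.push_back(current);
}

// Convenience wrapper with an assumed separator hierarchy.
std::vector<std::string> chunk_text(const std::string& text, std::size_t max_chars) {
    static const std::vector<std::string> kSeparators = {"\n\n", "\n", ". ", " "};
    std::vector<std::string> chunks;
    if (max_chars == 0) return chunks;
    split_recursive(text, max_chars, kSeparators, 0, chunks);
    return chunks;
}
```

For example, with a 20-byte budget, `chunk_text("Alpha beta gamma.\n\nDelta epsilon zeta eta theta.\n\nIota.", 20)` keeps the first and last paragraphs whole and splits only the middle one at word boundaries.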
Key improvements:
Critical issue found:
- `rac_rag_pipeline_create_standalone`: undeclared variable `embed_handle` (line 124) will prevent the code from building
Confidence Score: 2/5
- `embed_handle` variable in `rac_rag_pipeline.cpp` line 124 is a critical syntax error that will cause compilation failure. While the architectural changes are sound and the hybrid search implementation is well-designed, the code cannot be built or tested in its current state.
- `sdk/runanywhere-commons/src/features/rag/rac_rag_pipeline.cpp` which contains a critical compilation error
Important Files Changed
- `embed_handle` variable prevents build
Flowchart
%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[User Query] --> B[RAG Backend]
    B --> C[Embed Query Text]
    C --> D[Parallel Search]
    D --> E[Dense Vector Search<br/>USearch with i8 quantization]
    D --> F[BM25 Keyword Search<br/>Inverted index]
    E --> G[Dense Results<br/>Cosine similarity]
    F --> H[BM25 Results<br/>TF-IDF scoring]
    G --> I[Reciprocal Rank Fusion<br/>k=60]
    H --> I
    I --> J[Top-K Fused Results]
    J --> K[Build Context<br/>Token budget: 2048]
    K --> L[Format Prompt<br/>with context]
    L --> M[LLM Service]
    M --> N[Generated Answer]
    style B fill:#e1f5ff
    style I fill:#fff4e1
    style D fill:#f0f0f0
Last reviewed commit: 388aca0
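The Reciprocal Rank Fusion step in the flowchart (k=60) can be sketched as below. This is a generic RRF illustration, not the PR's code: the function name `rrf_fuse`, the use of string chunk IDs, and the tie-break rule are assumptions. Each chunk scores the sum of 1 / (k + rank) over the ranked lists it appears in, with rank starting at 1, so the dense and BM25 scores never need to be on the same scale.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

using ScoredChunk = std::pair<std::string, double>;  // (chunk id, fused score)

// Fuse ranked lists of chunk IDs (best first) with Reciprocal Rank Fusion.
// Assumes each ID appears at most once per list.
std::vector<ScoredChunk> rrf_fuse(const std::vector<std::vector<std::string>>& rankings,
                                  std::size_t top_k, double k = 60.0) {
    std::unordered_map<std::string, double> scores;
    for (const std::vector<std::string>& ranking : rankings) {
        for (std::size_t i = 0; i < ranking.size(); ++i) {
            // i is zero-based, RRF ranks are one-based.
            scores[ranking[i]] += 1.0 / (k + static_cast<double>(i + 1));
        }
    }
    std::vector<ScoredChunk> fused(scores.begin(), scores.end());
    std::sort(fused.begin(), fused.end(), [](const ScoredChunk& a, const ScoredChunk& b) {
        if (a.second != b.second) return a.second > b.second;  // higher score first
        return a.first < b.first;                              // deterministic tie-break
    });
    if (fused.size() > top_k) fused.resize(top_k);
    return fused;
}
```

As a usage example, fusing a dense ranking `{"c1", "c2", "c3"}` with a BM25 ranking `{"c3", "c1", "c4"}` puts `c1` first (1/61 + 1/62), then `c3` (1/63 + 1/61), then `c2` (1/62): chunks found by both retrievers outrank chunks found by only one.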