feat(search): Implement Reciprocal Rank Fusion (RRF) for hybrid search #896
Milofax wants to merge 26 commits into coleam00:main from
Conversation
- Implement Ollama integration for LLM and embeddings
- Add OllamaConfigurationPanel for UI configuration
- Add model discovery service for automatic model detection
- Update RAG settings to support Ollama embeddings
- Add documentation (INFRASTRUCTURE.md, PLAN.md, QUICKSTART.md)
- Add E2E tests for Ollama functionality

🤖 Generated with Claude Code
Co-Authored-By: Claude <noreply@anthropic.com>
…if block
- Add test for Ollama with auth token
- Add test for Ollama without auth token (local setup)
- Update fallback test to expect 'required-but-ignored' placeholder
- Add RAG settings mock to multi-provider sequence test
Security fixes:
- Remove hardcoded JWT token from INFRASTRUCTURE.md (replace with placeholder)
- Remove real Ollama auth tokens from PLAN.md (replace with placeholders)
- Fix secrets leak: change RAG settings logging from info to debug level, log only keys

Functional fixes:
- Fix URL normalization for auth-token mapping (incoming and configured URLs now use the same normalization)

Best practices:
- Add test-results/ to .gitignore
- Remove test-results from version control

All 22 Python tests pass, server healthy.
PR coleam00#875: Ollama Support with Local Model Discovery
- Security fixes: JWT tokens, auth tokens, RAG settings logging
- Bug fixes: URL normalization, None-handling
- Best practices: test-results gitignore

All 7 items marked as DONE (2025-11-21)
This reverts commit d7b5ea6.
- Add auth_token parameter to validate_provider_instance() function
- Load auth tokens from RAG settings in /validate endpoint
- Match incoming URLs against configured LLM_BASE_URL and OLLAMA_EMBEDDING_URL
- Apply correct auth token (chat or embedding) based on URL matching
- Add tests for auth token flow through all layers
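The URL-matching step described in this commit can be sketched as a small pure function. The function names and the exact normalization (dropping a `/v1` suffix and trailing slashes) are illustrative assumptions, not the PR's literal code:

```python
def normalize_ollama_url(url: str) -> str:
    """Normalize a URL the same way for incoming and configured values:
    drop the '/v1' suffix and any trailing slash so they compare equal."""
    return (url or "").replace("/v1", "").rstrip("/")


def resolve_auth_token(incoming_url, llm_base_url, embedding_url, chat_token, embedding_token):
    """Pick the chat or embedding token based on which configured URL matches."""
    target = normalize_ollama_url(incoming_url)
    if chat_token and target == normalize_ollama_url(llm_base_url):
        return chat_token
    if embedding_token and target == normalize_ollama_url(embedding_url):
        return embedding_token
    return None


# An OpenAI-compatible '/v1' URL and the bare configured URL now match:
token = resolve_auth_token(
    "http://localhost:11434/v1",
    llm_base_url="http://localhost:11434",
    embedding_url="http://localhost:11435",
    chat_token="chat-secret",
    embedding_token="embed-secret",
)
```

The point of normalizing both sides is that a token configured against `http://localhost:11434` is still found when a request arrives for `http://localhost:11434/v1/`.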
When "Use same host for embedding instance" is checked, the sync now also copies useAuth and authToken from the LLM config to the embedding config, not just the URL and name.
- Extract syncEmbeddingFromLLM() into a testable utility function
- Add 6 tests covering: full sync, default name, custom default, auth settings copy, empty token handling, immutability
- Refactor RAGSettings to use the utility function
Merges OpenRouter embedding provider feature from upstream. Resolves conflict in llm_provider_service.py by keeping the auth_token parameter.
Addresses CodeRabbit review: CHAT_MODEL was undefined because MODEL_CHOICE is the actual key used. Now uses helper functions that correctly resolve the displayed model name.
- Add useAuth/authToken to instance config resets (RAGSettings.tsx)
- Clear auth tokens from ragSettings when deleting instances
- Fix type hint: dict[str, any] -> dict[str, Any] (llm_provider_service.py)
- ArchonChatPanel: Remove BookOpen, Search
- BugReportModal: Remove ExternalLink
- ErrorBoundaryWithBugReport: Remove React (use named imports)
- CodeViewerModal: Remove FileText
- DisconnectScreenOverlay: Remove Wifi, WifiOff
- APIKeysSection: Remove Key, Input
- Add pymupdf4llm>=0.2.5 as primary PDF extraction library
- Produces clean Markdown with proper word separation
- Falls back to pdfplumber, then PyPDF2 if needed
- Validated: pymupdf4llm extracts "Since 1996, Bloomberg" vs pdfplumber's "Since1996,Bloomberg" (no spaces)

Phase 0 validated, Phase 1 complete. Phase 2 (OCR) pending.
…de block protection

- Add heading boundary (\n#) as highest-priority breakpoint
- Protect code blocks from being split mid-block
- Add is_inside_code_block() helper to check whether a position is within a fenced block
- Only break at paragraphs/sentences when outside code blocks

This improves RAG quality for Markdown documents (especially from pymupdf4llm) by preserving semantic structure during chunking.
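A minimal sketch of the fence check behind a helper like `is_inside_code_block()`. The parity-count approach shown here is an assumption for illustration; the actual implementation may track fences differently:

```python
FENCE = "`" * 3  # a markdown code fence


def is_inside_code_block(text: str, position: int) -> bool:
    """A position is inside a fenced code block when an odd number of
    fences occur before it (assumes fences are not nested)."""
    return text.count(FENCE, 0, position) % 2 == 1


doc = "Intro paragraph.\n" + FENCE + "\nprint('hello')\n" + FENCE + "\nClosing paragraph."
inside = is_inside_code_block(doc, doc.index("print"))     # inside the fence
outside = is_inside_code_block(doc, doc.index("Closing"))  # after the closing fence
```

A chunker can call this before accepting a paragraph or sentence boundary, so a breakpoint that lands mid-block is rejected and the whole fenced region stays in one chunk.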
…dels

BREAKING: Search now supports all embedding dimensions (384, 768, 1024, 1536, 3072)

Previously search was hardcoded to use 1536-dim functions, which meant:
- snowflake-arctic-embed2 (1024-dim) → NO RESULTS
- nomic-embed-text (768-dim) → NO RESULTS
- Only OpenAI text-embedding-3-small (1536-dim) worked

Fix:
- Detect dimension from query_embedding: len(query_embedding)
- Use _multi RPC functions with embedding_dimension parameter
- base_search_strategy.py: match_archon_*_multi
- hybrid_search_strategy.py: hybrid_search_*_multi
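The dimension-detection step above can be sketched as a tiny router. `build_rpc_call` is a hypothetical name for illustration; only the `len(query_embedding)` detection and the `_multi` suffix come from the commit:

```python
SUPPORTED_DIMS = {384, 768, 1024, 1536, 3072}


def build_rpc_call(query_embedding, table_rpc="match_archon_crawled_pages"):
    """Detect the embedding dimension and route to the dimension-aware
    '_multi' RPC, mirroring the fix described above."""
    dim = len(query_embedding)
    if dim not in SUPPORTED_DIMS:
        raise ValueError(f"Unsupported embedding dimension: {dim}")
    return f"{table_rpc}_multi", {"embedding_dimension": dim}


# A 768-dim vector (e.g. from nomic-embed-text) now routes to the multi function:
name, params = build_rpc_call([0.0] * 768)
```

Because the dimension is read off the query vector itself, no per-model configuration is needed on the search path.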
Adds automatic OCR extraction for scanned/image-based PDFs that don't contain extractable text layers.

New files:
- ocr_processing.py: Tesseract OCR engine wrapper
- extract_text_with_ocr(): Convert PDF pages to images, run OCR
- is_ocr_available(): Check for pytesseract/pdf2image
- get_supported_languages(): List installed Tesseract languages

Changes:
- document_processing.py: OCR fallback when text extraction fails
- pyproject.toml: Add pytesseract>=0.3.10, pdf2image>=1.17.0

Flow: pymupdf4llm → pdfplumber → PyPDF2 → Tesseract OCR

Requires system dependencies:
- macOS: brew install tesseract poppler
- Linux: apt install tesseract-ocr poppler-utils
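The fallback flow above can be sketched as a generic chain. The helper name, the dummy extractors, and the 100-character threshold are illustrative assumptions (the threshold echoes the one discussed in the review below):

```python
def extract_with_fallbacks(file_content: bytes, extractors, min_chars: int = 100):
    """Try extractors in order; fall through when one raises or returns
    too little text (as with scanned PDFs that have no text layer)."""
    for name, extract in extractors:
        try:
            text = extract(file_content)
        except Exception:
            continue  # broken extractor: try the next one
        if text and len(text.strip()) > min_chars:
            return name, text
    return None, ""


# Dummy stand-ins for the pymupdf4llm -> pdfplumber -> PyPDF2 -> OCR chain:
chain = [
    ("pymupdf4llm", lambda b: ""),                   # scanned PDF: no text layer
    ("ocr", lambda b: "recognized text " * 20),      # OCR recovers the content
]
winner, text = extract_with_fallbacks(b"%PDF-...", chain)
```

Keeping OCR last matters: it is by far the slowest step, so it only runs when every text-layer extractor has come back empty or near-empty.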
Replace direct score comparison with RRF scoring for proper fusion of vector and text search results. The previous implementation incorrectly compared cosine similarity (0-1) with ts_rank_cd scores (a different scale).

Changes:
- SQL: Use ROW_NUMBER() + RRF formula (1/(rank+60)) instead of raw scores
- SQL: Add embedding_model_filter parameter for future model switching
- SQL: Fetch 3x candidates for better fusion quality before limiting
- Test: Update RPC name assertion to _multi version

RRF is scale-independent and works across domains without tuning (k=60).
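The formula from this commit is small enough to show in full. This is a pure-Python analogue of the SQL ROW_NUMBER() + 1/(rank+60) scoring, under a hypothetical `rrf_fuse` name:

```python
def rrf_fuse(vector_ranked, text_ranked, k=60):
    """Reciprocal Rank Fusion: each ranked list contributes 1/(rank + k)
    per item (1-based rank). Scale-independent: only ranks matter,
    so cosine similarity and ts_rank_cd never need to share a scale."""
    scores = {}
    for ranked in (vector_ranked, text_ranked):
        for rank, doc_id in enumerate(ranked, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (rank + k)
    return sorted(scores, key=scores.get, reverse=True)


# 'b' wins: ranked #2 by vector search and #1 by text search.
fused = rrf_fuse(["a", "b", "c"], ["b", "d", "a"])  # ['b', 'a', 'd', 'c']
```

With k=60 the contribution curve is flat enough that a document appearing in both lists reliably beats one appearing in only one list, which is exactly the behavior the raw-score comparison failed to deliver.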
Note: Reviews paused. It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review.
📝 Walkthrough

Adds per-instance Ollama authentication (frontend + backend), RRF-based multi-dimensional hybrid search and migrations, OCR-backed PDF extraction, parallelized embedding sub-batch processing, Playwright E2E tests and infra docs, plus various UI/type/test/compose/.gitignore cleanups.
Sequence Diagram(s)

```mermaid
sequenceDiagram
    actor User
    participant Frontend as RAGSettings/UI
    participant API as Backend API
    participant Creds as Credentials Service
    participant Ollama as Ollama Instance
    User->>Frontend: configure instance URL + enable auth + enter token
    Frontend->>API: save rag_settings (includes per-instance token)
    API->>Creds: read rag_strategy / per-URL tokens
    Creds-->>API: return token map
    rect rgba(230,244,234,0.5)
        API->>Ollama: GET /api/tags or /health (Authorization: Bearer {token})
        Ollama-->>API: models / health
        API-->>Frontend: report health & models
    end
    rect rgba(240,245,255,0.5)
        Frontend->>API: inference request (chat or embedding)
        API->>Creds: resolve operation-specific token
        Creds-->>API: return token
        API->>Ollama: POST /api/chat or /api/embeddings (Authorization: Bearer {token})
        Ollama-->>API: inference response
        API-->>Frontend: inference result
    end
```
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~75 minutes
🚥 Pre-merge checks: ✅ 2 passed | ❌ 1 failed (inconclusive)
Actionable comments posted: 9
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (7)
archon-ui-main/src/components/settings/APIKeysSection.tsx (3)
96-96: Implicit `any` type violates TypeScript strict mode.

Line 96 declares `value: any`, which violates the "no implicit any" requirement per coding guidelines. Replace with a specific union type matching `CustomCredential` field values.

```diff
-const updateCredential = (index: number, field: keyof CustomCredential, value: any) => {
+const updateCredential = (index: number, field: keyof CustomCredential, value: string | boolean) => {
```

Note: If other field types are needed (e.g., `undefined`), adjust the union accordingly.
267-271: Avoid dynamic Tailwind class construction in frontend components.

Lines 267–271 construct Tailwind classes conditionally using ternary operators. Per coding guidelines, use a classnames utility or extract static classes to avoid tooling issues.

```diff
-className={`w-full px-3 py-2 pr-20 rounded-md border text-sm ${
-  cred.isFromBackend && cred.is_encrypted && cred.value === '[ENCRYPTED]'
-    ? 'bg-gray-100 dark:bg-gray-800 border-gray-200 dark:border-gray-600 text-gray-500 dark:text-gray-400'
-    : 'bg-white dark:bg-gray-900 border-gray-300 dark:border-gray-700'
-}`}
+className={cred.isFromBackend && cred.is_encrypted && cred.value === '[ENCRYPTED]'
+  ? 'w-full px-3 py-2 pr-20 rounded-md border text-sm bg-gray-100 dark:bg-gray-800 border-gray-200 dark:border-gray-600 text-gray-500 dark:text-gray-400'
+  : 'w-full px-3 py-2 pr-20 rounded-md border text-sm bg-white dark:bg-gray-900 border-gray-300 dark:border-gray-700'
+}
```

Alternatively, extract the condition into a helper variable for clarity.
305-312: Avoid dynamic Tailwind class construction in encryption toggle button.

Lines 305–312 construct Tailwind classes dynamically via ternary operators. Extract static classes or use a classnames utility to comply with the coding guideline against dynamic Tailwind construction.

```diff
-className={`
-  absolute right-2 top-1/2 -translate-y-1/2 p-1.5 rounded transition-colors
-  ${cred.isFromBackend && cred.is_encrypted && cred.value === '[ENCRYPTED]'
-    ? 'cursor-not-allowed opacity-50 text-pink-400'
-    : cred.is_encrypted
-      ? 'text-pink-600 dark:text-pink-400 hover:bg-pink-100 dark:hover:bg-pink-900/20'
-      : 'text-gray-400 hover:bg-gray-200 dark:hover:bg-gray-700'
-  }
-`}
+className={
+  cred.isFromBackend && cred.is_encrypted && cred.value === '[ENCRYPTED]'
+    ? 'absolute right-2 top-1/2 -translate-y-1/2 p-1.5 rounded transition-colors cursor-not-allowed opacity-50 text-pink-400'
+    : cred.is_encrypted
+      ? 'absolute right-2 top-1/2 -translate-y-1/2 p-1.5 rounded transition-colors text-pink-600 dark:text-pink-400 hover:bg-pink-100 dark:hover:bg-pink-900/20'
+      : 'absolute right-2 top-1/2 -translate-y-1/2 p-1.5 rounded transition-colors text-gray-400 hover:bg-gray-200 dark:hover:bg-gray-700'
+}
```

Or extract the condition into a helper constant for better maintainability.
python/src/server/api_routes/ollama_api.py (1)
247-256: Bug: incorrect timestamp value (uses module name instead of an actual timestamp).

Line 255 sets `timestamp` to `model_discovery_service.check_instance_health.__module__`, which returns the module name string (e.g., `"src.server.services.ollama.model_discovery_service"`), not a timestamp.

Apply this diff to fix the timestamp:

```diff
 return {
     "summary": {
         "total_instances": len(instance_urls),
         "healthy_instances": healthy_count,
         "unhealthy_instances": len(instance_urls) - healthy_count,
         "average_response_time_ms": avg_response_time
     },
     "instance_status": health_results,
-    "timestamp": model_discovery_service.check_instance_health.__module__  # Use current timestamp
+    "timestamp": datetime.now().isoformat()
 }
```

You'll also need to import `datetime` at the top of the file if it isn't already imported for this endpoint.

python/src/server/services/ollama/model_discovery_service.py (1)
975-983: Cache key doesn't account for auth token; may return stale results for different auth contexts.

The health check cache uses only the `instance_url` as the cache key (line 976). If the same URL is accessed with different auth tokens (or with/without tokens), the cached result from one context may incorrectly be returned for another.

Consider including a hash or presence indicator of the auth token in the cache key:

```diff
-cache_key = f"health_{instance_url}"
+# Include token presence in cache key to avoid cross-context cache hits
+token_indicator = "auth" if auth_token else "noauth"
+cache_key = f"health_{instance_url}_{token_indicator}"
```

archon-ui-main/src/components/settings/RAGSettings.tsx (1)
142-174: Extend the interface to include auth token fields.

The `RAGSettingsProps` interface defines `CHAT_MODEL` but omits the auth token fields (`OLLAMA_CHAT_AUTH_TOKEN`, `OLLAMA_EMBEDDING_AUTH_TOKEN`, `OLLAMA_API_MODE`) that are accessed elsewhere via `as any` casts. Adding them would provide type safety. As per coding guidelines, TypeScript strict mode with no implicit any is expected.

```diff
 OLLAMA_EMBEDDING_INSTANCE_NAME?: string;
+OLLAMA_CHAT_AUTH_TOKEN?: string;
+OLLAMA_EMBEDDING_AUTH_TOKEN?: string;
+OLLAMA_API_MODE?: 'native' | 'openai-compatible';
 // Crawling Performance Settings
```

python/src/server/utils/document_processing.py (1)
297-319: Inconsistent fallback threshold between pdfplumber and PyPDF2.

The pdfplumber fallback (line 289) requires `>100` characters to consider extraction successful, but PyPDF2 (line 312) accepts any non-empty result. This means PyPDF2 could return just a few characters instead of falling through to OCR.

Consider applying the same threshold for consistency:

```diff
 if text_content:
     combined_text = "\n\n".join(text_content)
+    if len(combined_text.strip()) <= 100:
+        logfire.warning("PyPDF2 returned insufficient text, trying OCR")
+    else:
-    logger.info(f"PDF extracted with PyPDF2: {len(combined_text)} chars")
-    return _preserve_code_blocks_across_pages(combined_text)
+        logger.info(f"PDF extracted with PyPDF2: {len(combined_text)} chars")
+        return _preserve_code_blocks_across_pages(combined_text)
 # If no text, fall through to OCR
```
🧹 Nitpick comments (27)
python/src/agent_work_orders/utils/state_reconciliation.py (2)
184-200: Prefer `logger.exception()` in except blocks.

Both exception handlers use `logger.error(..., exc_info=True)`. To align with the logging guidelines and make intent clearer, consider switching to `logger.exception(...)` and dropping the explicit `exc_info` argument, e.g.:

```diff
 except Exception as e:
     actions.append(f"Failed to delete {orphan_path}: {e}")
-    logger.error("orphaned_worktree_delete_failed", path=orphan_path, error=str(e), exc_info=True)
+    logger.exception("orphaned_worktree_delete_failed", path=orphan_path, error=str(e))
 ...
 except Exception as e:
     actions.append(f"Failed to update {work_order_id}: {e}")
-    logger.error("dangling_state_update_failed", work_order_id=work_order_id, error=str(e), exc_info=True)
+    logger.exception("dangling_state_update_failed", work_order_id=work_order_id, error=str(e))
```

As per coding guidelines, `logger.exception()` is the preferred pattern in `except` blocks.
106-125: Tighten `reconcile_state` return type instead of using `Any`.

`reconcile_state` currently returns `dict[str, Any]`, which conflicts with the "no `Any` without explicit justification" rule for `agent_work_orders`. Consider defining a TypedDict or dataclass for the report shape (e.g., fields for `orphaned_worktrees`, `dangling_state`, `fix_applied`, `actions_taken`) and using that as the return type so mypy can fully validate callers. This can be a follow-up to this formatting fix.

As per coding guidelines, strict typing without `Any` is expected in this package.

QUICKSTART.md (1)
23-28: Minor: Consider wrapping URLs in angle brackets for markdown lint compliance.

Static analysis flagged bare URLs (MD034). While functional, wrapping URLs improves markdown portability.

```diff
 | Service | URL | Beschreibung |
 |---------|-----|-------------|
-| **Archon UI** | http://localhost:3737 | Hauptanwendung |
-| **Supabase Studio** | http://localhost:54323 | Datenbank-UI |
-| **API Server** | http://localhost:8181 | Backend API |
-| **MCP Server** | http://localhost:8051/mcp | Model Context Protocol |
+| **Archon UI** | <http://localhost:3737> | Hauptanwendung |
+| **Supabase Studio** | <http://localhost:54323> | Datenbank-UI |
+| **API Server** | <http://localhost:8181> | Backend API |
+| **MCP Server** | <http://localhost:8051/mcp> | Model Context Protocol |
```

INFRASTRUCTURE.md (2)
78-88: Replace absolute paths with generic examples.

The documentation contains absolute paths specific to a developer's local machine (e.g., `/Volumes/DATEN/Coding/archon`). These should be replaced with generic placeholders or relative paths to make the documentation useful for all contributors.

Consider using placeholders like:
```shell
cd <YOUR_PROJECT_ROOT>/supabase
supabase start
```

Or establish a convention like
`$ARCHON_ROOT` as an environment variable throughout the documentation.
1-544: Consider adding an English version or translation.

This comprehensive infrastructure documentation is written entirely in German, which may limit accessibility for international contributors. While maintaining the German version for your team, consider adding an English version or at least a brief English summary at the top linking to translated versions.
This would improve the project's accessibility without removing the valuable German documentation.
archon-ui-main/src/components/settings/utils/__tests__/instanceConfigSync.test.ts (1)
1-90: Comprehensive test coverage for sync utility.

The test suite provides excellent coverage of the `syncEmbeddingFromLLM` function, including:
- Full property copying
- Default name fallback behavior
- Custom default names
- Auth settings edge cases
- Immutability verification
Consider adding one additional test case for defensive programming:
```typescript
it('should handle null or undefined llmConfig gracefully', () => {
  // If the function should handle invalid inputs
  expect(() => syncEmbeddingFromLLM(null as any)).toThrow();
});
```

However, this is optional if the function is only called with validated inputs.
archon-ui-main/src/components/settings/OllamaConfigurationPanel.tsx (1)
734-754: Clean authentication UI implementation.

The Authentication Settings section is well-structured:
- Checkbox for toggling authentication
- Password input conditionally rendered only when authentication is enabled
- Accessible label with proper cursor pointer
- Good use of conditional rendering
Minor enhancement suggestion: Consider adding a visual indicator or help text explaining what the auth token is for, especially for users new to Ollama authentication.
archon-ui-main/tests/e2e/ollama-api-mode.spec.ts (2)
19-20: Consider using data-testid attributes for more stable selectors.

The tests rely heavily on text-based selectors like `button:has-text("Ollama")` and `text=RAG Settings`. While functional, these are fragile if UI text changes (e.g., for i18n or copy updates).

Consider adding `data-testid` attributes to key UI elements:

```typescript
// In the component:
<button data-testid="ollama-card">Ollama</button>

// In the test:
await page.locator('[data-testid="ollama-card"]').click();
```

This provides a stable contract between tests and components that won't break due to text changes.
Also applies to: 24-25, 49-50, 54-55, 78-79, 83-84, 123-124, 128-129, 156-157, 174-175, 178-179
5-5: Externalize the base URL to test configuration.

The URL `http://localhost:3737` is hardcoded, which makes it difficult to run tests in different environments.

Use Playwright's base URL configuration:

```typescript
// In playwright.config.ts:
export default {
  use: {
    baseURL: process.env.BASE_URL || 'http://localhost:3737',
  },
};

// In the test:
await page.goto('/settings');
```

python/tests/test_ollama_auth_token.py (1)
58-64: Consider simplifying the auth_token assertion.

The assertion on lines 63-64 checks for the auth_token in both kwargs and positional args, which seems overly complex.

Since the function signature uses `auth_token` as a keyword argument (from the relevant code snippet), you can simplify this to:

```python
# Verify auth_token was passed to check_instance_health
mock_model_discovery_service.check_instance_health.assert_called_once_with(
    "http://localhost:11434", auth_token="test-auth-token"
)
```

This is more readable and explicitly validates the expected call signature.
archon-ui-main/src/components/settings/utils/instanceConfigSync.ts (1)
18-28: Clean and well-implemented utility function.

The `syncEmbeddingFromLLM` function is straightforward and correctly implements the config synchronization:
- Returns a new object (immutability)
- Sensible default parameter for name fallback
- Clear documentation
- All properties properly copied
The comprehensive test coverage in `__tests__/instanceConfigSync.test.ts` validates the behavior.

Optional: Consider adding JSDoc `@throws` if you decide to add validation in the future, or document that the function assumes valid inputs.

python/src/server/api_routes/ollama_api.py (3)
98-131: Consider extracting duplicated auth token resolution logic into a helper function.

The auth token retrieval and URL normalization pattern is repeated in three places: `discover_models_endpoint` (lines 99-131), `health_check_endpoint` (lines 181-216), and `validate_instance_endpoint` (lines 277-299). This duplication increases maintenance burden and risk of inconsistencies.

Consider extracting this logic:

```python
async def _get_ollama_auth_tokens_map(instance_urls: list[str]) -> dict[str, str]:
    """Get auth token mapping for Ollama instance URLs."""
    from ..services.credential_service import credential_service

    rag_settings = await credential_service.get_credentials_by_category("rag_strategy")
    llm_base_url = (rag_settings.get("LLM_BASE_URL") or "").replace("/v1", "").rstrip("/")
    embedding_base_url = (rag_settings.get("OLLAMA_EMBEDDING_URL") or "").replace("/v1", "").rstrip("/")
    chat_auth_token = rag_settings.get("OLLAMA_CHAT_AUTH_TOKEN") or ""
    embedding_auth_token = rag_settings.get("OLLAMA_EMBEDDING_AUTH_TOKEN") or ""

    auth_tokens_map = {}
    for url in instance_urls:
        normalized_url = url.replace("/v1", "").rstrip("/")
        if normalized_url == llm_base_url and chat_auth_token:
            auth_tokens_map[normalized_url] = chat_auth_token
        elif normalized_url == embedding_base_url and embedding_auth_token:
            auth_tokens_map[normalized_url] = embedding_auth_token
    return auth_tokens_map
```
549-549: Deprecated Pydantic method: `.dict()` is deprecated in Pydantic v2.

Line 549 uses `model.dict()`, which is deprecated in Pydantic v2. Use `model.model_dump()` instead.

```diff
 models_data = {
-    "models": [model.dict() for model in stored_models],
+    "models": [model.model_dump() for model in stored_models],
```
1169-1169: Deprecated Pydantic method: `.dict()` is deprecated in Pydantic v2.

Line 1169 uses `stored_model.dict()`, which is deprecated in Pydantic v2.

```diff
-model_dict = stored_model.dict()
+model_dict = stored_model.model_dump()
```

python/src/server/services/llm_provider_service.py (1)
415-440: Consider extracting duplicated Ollama auth token resolution logic.

The auth token retrieval logic from rag_strategy settings is duplicated between the OpenAI fallback path (lines 415-440) and the direct Ollama path (lines 455-479). Both blocks:

- Check/populate the rag_settings cache
- Select the token based on `use_embedding_provider` or `instance_type`
- Default to "required-but-ignored" if no token

Consider extracting to a helper function:

```python
async def _get_ollama_auth_token(use_embedding: bool = False) -> str:
    """Get Ollama auth token from RAG settings, defaulting to placeholder."""
    cache_key = "rag_strategy_settings"
    rag_settings = _get_cached_settings(cache_key)
    if rag_settings is None:
        rag_settings = await credential_service.get_credentials_by_category("rag_strategy")
        _set_cached_settings(cache_key, rag_settings)

    token_key = "OLLAMA_EMBEDDING_AUTH_TOKEN" if use_embedding else "OLLAMA_CHAT_AUTH_TOKEN"
    token = rag_settings.get(token_key, "")
    if not token:
        logger.warning(f"No Ollama auth token found for {token_key}, using placeholder")
        return "required-but-ignored"
    return token
```

Also applies to: 455-479
PLAN.md (2)
1-3: Consider whether this planning document should be committed to the repository.

This appears to be an internal implementation plan with deployment-specific details (local paths, placeholder tokens). If this is meant for end-users or contributors, consider:
- Moving deployment-specific paths and tokens to environment examples
- Translating to English for broader accessibility
If this is a temporary development artifact, consider adding it to `.gitignore` or removing it before merge.
124-124: Convert bare URL to a proper Markdown link.

Static analysis flagged this as a bare URL. For better documentation formatting:

```diff
-1. **UI öffnen**: http://localhost:3737
+1. **UI öffnen**: [http://localhost:3737](http://localhost:3737)
```

archon-ui-main/src/components/settings/RAGSettings.tsx (3)
228-249: Replace `as any` type casts with a properly typed interface.

The `(ragSettings as any).OLLAMA_CHAT_AUTH_TOKEN` pattern bypasses TypeScript's type checking. This is used in multiple locations (lines 231, 249, 254, 272) and should be addressed by extending the `RAGSettingsProps` interface as suggested above.

Additionally, the comparison on line 243 checks `newConfig.authToken !== prev.authToken` but doesn't compare `useAuth` state, which could lead to missed updates if the checkbox changes without the token changing.

```diff
-if (newConfig.url !== prev.url || newConfig.name !== prev.name || newConfig.authToken !== prev.authToken) {
+if (newConfig.url !== prev.url || newConfig.name !== prev.name || newConfig.useAuth !== prev.useAuth || newConfig.authToken !== prev.authToken) {
```
641-679: Consider typing the model response instead of using `any`.

The model mapping on lines 660 and 667 uses `(m: any) => m.name`. Consider defining a type for the model response to improve type safety:

```typescript
interface OllamaModel {
  name: string;
  instance_url?: string;
  // ... other fields
}
```

Then use it: `(m: OllamaModel) => m.name`

The error handling is good - silently failing and leaving the model list empty is appropriate for non-critical functionality.
681-692: Add `loadAvailableModels` to useEffect dependencies or wrap with `useCallback`.

The `loadAvailableModels` function is called in these effects but not included in the dependency arrays. This triggers the `react-hooks/exhaustive-deps` ESLint rule (which appears suppressed elsewhere). While currently safe because `loadAvailableModels` doesn't capture any changing values, this pattern is fragile.

Additionally, the comparison against the hardcoded default URL is duplicated. Consider extracting to a constant:

```diff
+const isDefaultOllamaUrl = (url: string) => url === DEFAULT_OLLAMA_URL;
+
 useEffect(() => {
-  if (llmInstanceConfig.url && llmInstanceConfig.url !== 'http://host.docker.internal:11434/v1') {
+  if (llmInstanceConfig.url && !isDefaultOllamaUrl(llmInstanceConfig.url)) {
     loadAvailableModels(llmInstanceConfig.url, 'chat');
   }
-}, [llmInstanceConfig.url]);
+}, [llmInstanceConfig.url, loadAvailableModels]);
```

python/src/server/utils/ocr_processing.py (3)
14-15: Remove unused imports.

`tempfile` and `os` are imported but never used in this module. Ruff would flag these as unused imports.

```diff
 import io
-import tempfile
-import os
 from typing import Optional
```
122-124: Error logging should preserve the full stack trace.

Per coding guidelines, error logs should include `exc_info=True` to preserve full stack traces. The current f-string only captures the exception message, losing the traceback.

```diff
 except Exception as e:
-    logfire.error(f"OCR extraction failed: {e}")
+    logfire.error(f"OCR extraction failed: {e}", exc_info=True)
     return None
```
79-89: Consider adding safeguards for large PDFs.

`convert_from_bytes` loads all pages as images into memory simultaneously. At 300 DPI, each page could be 10-50MB. For very large scanned documents, this could cause memory exhaustion.

Consider adding a page limit or processing pages in batches:

```python
MAX_OCR_PAGES = 50  # Configurable limit

images = convert_from_bytes(file_content, dpi=dpi)
if len(images) > MAX_OCR_PAGES:
    logger.warning(f"PDF has {len(images)} pages, processing first {MAX_OCR_PAGES} for OCR")
    images = images[:MAX_OCR_PAGES]
```

Alternatively, use `convert_from_bytes` with `first_page` and `last_page` parameters to process in chunks.

python/tests/test_rag_simple.py (1)
94-98: Strengthen RPC assertion to also validate payload.

The name assertion is good, but you can cheaply guard the new multi-endpoint wiring by also checking the payload, e.g. `embedding_dimension`:

```diff
 # Verify RPC was called correctly (uses _multi version for dimension support)
 mock_supabase.rpc.assert_called_once()
-call_args = mock_supabase.rpc.call_args[0]
-assert call_args[0] == "match_archon_crawled_pages_multi"
+fn_name, params = mock_supabase.rpc.call_args[0]
+assert fn_name == "match_archon_crawled_pages_multi"
+assert params["embedding_dimension"] == len(query_embedding)
```

python/src/server/services/search/base_search_strategy.py (1)
50-76: Multi-RPC wiring looks correct; consider pre-validation and richer error logging.

The shift to computing `embedding_dimension = len(query_embedding)` and routing to `f"{table_rpc}_multi"` with `embedding_dimension` in `rpc_params` is consistent with the new `match_archon_*_multi` SQL signatures and should work across dimensions.

Two small follow-ups you might consider:

- Guard obvious bad inputs before going to the DB (e.g., empty `query_embedding` or clearly unsupported dimensions) and log a clear reason instead of relying on the SQL `RAISE EXCEPTION` path.
- In the `except` block below, switch to `logger.exception("Vector search failed", extra={...})` or `logger.error(..., exc_info=True)` so you preserve stack traces, in line with the backend logging guidelines.

python/src/server/services/search/hybrid_search_strategy.py (1)
57-62: Hybrid `_multi` RPC integration is consistent with the SQL contracts.

Deriving `embedding_dimension = len(query_embedding)` and routing to `hybrid_search_archon_crawled_pages_multi` / `hybrid_search_archon_code_examples_multi` with `query_embedding`, `embedding_dimension`, `query_text`, `match_count`, `filter`, and `source_filter` matches the new function signatures and RRF expectations. The added debug logs with dimensionality are also helpful.

If you want to reduce duplication, you could extract a tiny helper to compute `embedding_dimension` and log the "X D embedding" message shared between document and code hybrid searches, but that's purely cosmetic.

Also applies to: 149-159
migration/0.1.0/013_hybrid_search_rrf.sql (1)
17-129: RRF-based hybrid functions look correct and parameter-safe

The two new `hybrid_search_archon_*_multi` functions are well-structured:

- The `embedding_dimension` → `embedding_column` mapping via `CASE` matches the defined `embedding_384` … `embedding_3072` columns and raises cleanly on unsupported dimensions.
- Dynamic SQL uses `%I` for column identifiers and positional parameters for all data (`$1`..`$8`), avoiding SQL injection while correctly wiring `query_embedding`, `candidate_count`, `match_count`, `filter`, `source_filter`, `query_text`, `embedding_model_filter`, and `rrf_k`.
- RRF scoring `1.0 / (rank + rrf_k)` summed over vector/text ranks with `rrf_k := 60` is implemented correctly and matches the comments; the `match_type` labels (`hybrid`/`vector`/`keyword`) will be useful to the application layer.
- Limiting to `candidate_count := LEAST(match_count * 3, 100)` is a reasonable trade-off between fusion quality and performance.

Main suggestion is process-level rather than code-level: ensure this definition stays in sync with the corresponding section in `complete_setup.sql` so new installs and incremental migrations don't drift over time.

Also applies to: 132-242
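For reference, the fusion the SQL performs can be sketched outside the database in a few lines of Python. This is a minimal illustration, not the migration's code: `rrf_fuse` is a hypothetical helper, and the input id lists are assumed ranked best-first, as `ROW_NUMBER()` would produce.

```python
def rrf_fuse(
    vector_results: list[str],
    text_results: list[str],
    rrf_k: int = 60,
    match_count: int = 10,
) -> list[tuple[str, float, str]]:
    """Fuse two ranked id lists with Reciprocal Rank Fusion.

    Each list contributes 1.0 / (rank + rrf_k) per item (rank is 1-based);
    ids found by both searches sum both contributions, mirroring the SQL's
    vector-rank and text-rank CTEs.
    """
    scores: dict[str, float] = {}
    sources: dict[str, set[str]] = {}
    for source, ranked in (("vector", vector_results), ("keyword", text_results)):
        for rank, doc_id in enumerate(ranked, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (rank + rrf_k)
            sources.setdefault(doc_id, set()).add(source)

    # Highest fused score first, truncated to match_count, with a match_type
    # label analogous to the SQL's hybrid/vector/keyword labeling.
    fused = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    return [
        (doc_id, score, "hybrid" if len(sources[doc_id]) == 2 else next(iter(sources[doc_id])))
        for doc_id, score in fused[:match_count]
    ]
```

With `rrf_k = 60`, an id ranked second by vector search and first by keyword search scores `1/62 + 1/61`, which beats any id found by only one search, illustrating why fused results tend to surface documents both retrievers agree on.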
    baseUrl: string;
    isEnabled: boolean;
    isPrimary: boolean;
    authToken?: string;
The authToken field is added to OllamaInstance but not persisted or loaded.
The authToken field is added to the interface (line 68) but the getOllamaInstances and setOllamaInstances methods don't handle this field. When instances are saved, the auth token won't be persisted, and when loaded, it won't be restored.
Add authToken handling in both methods:
In setOllamaInstances (around line 507):
const fields: Record<string, any> = {
name: instance.name,
baseUrl: instance.baseUrl,
isEnabled: instance.isEnabled,
isPrimary: instance.isPrimary,
instanceType: instance.instanceType || 'both',
loadBalancingWeight: instance.loadBalancingWeight || 100
};
+
+ if (instance.authToken) {
+ fields.authToken = instance.authToken;
+ }

In getOllamaInstances, ensure the field parsing handles authToken (it may already work since strings pass through, but verify the field name mapping).
Also applies to: 506-528
🤖 Prompt for AI Agents
In archon-ui-main/src/services/credentialsService.ts around line 68 and lines
~506-528, the new authToken field on OllamaInstance is not persisted or
restored: update setOllamaInstances to include authToken when serializing
instances to storage (ensure undefined/null are handled and avoid dropping the
field), and update getOllamaInstances to parse/assign authToken back onto each
OllamaInstance when reading from storage (respect typing and provide a default
undefined if missing); ensure the stored shape and runtime mapping use the exact
field name "authToken" so tokens are saved and reloaded correctly.
  # Helper: check if position is inside a code block
  def is_inside_code_block(text_segment: str, pos: int) -> bool:
      return text_segment[:pos].count("```") % 2 == 1

- # If no code block, try paragraph break
- elif "\n\n" in chunk:
+ # Store original end for checking if we found a break point
+ original_end = end

+ # First, try to break at a heading boundary (# ## ### etc.)
+ heading_pos = chunk.rfind("\n#")
+ if heading_pos != -1 and heading_pos > chunk_size * 0.3:
+     if not is_inside_code_block(chunk, heading_pos):
+         end = start + heading_pos

+ # If no heading, try paragraph break (but not inside code blocks)
+ if end == original_end and "\n\n" in chunk:
      last_break = chunk.rfind("\n\n")
-     if last_break > chunk_size * 0.3:
+     if last_break > chunk_size * 0.3 and not is_inside_code_block(chunk, last_break):
          end = start + last_break

- # If no paragraph break, try sentence break
- elif ". " in chunk:
+ # If no paragraph break, try sentence break (but not inside code blocks)
+ if end == original_end and ". " in chunk:
      last_period = chunk.rfind(". ")
-     if last_period > chunk_size * 0.3:
+     if last_period > chunk_size * 0.3 and not is_inside_code_block(chunk, last_period):
          end = start + last_period + 1
Code‑block detection can be wrong across chunk boundaries
is_inside_code_block only inspects the current chunk, so if a fenced code block starts before start and continues into this chunk, positions are misclassified as “outside code,” and the heading/paragraph/sentence logic can still split inside a code block (e.g., on \n# in a Python comment within a large fenced block). This violates the intent of preserving code blocks when possible.
I’d strongly recommend basing the check on the full text and the absolute position (start + local_pos) so fences that began in earlier chunks are honored.
One way to fix this in place:
- # Helper: check if position is inside a code block
- def is_inside_code_block(text_segment: str, pos: int) -> bool:
- return text_segment[:pos].count("```") % 2 == 1
+ # Helper: check if position is inside a code block in the full text
+ def is_inside_code_block(pos: int) -> bool:
+ # Use absolute position so we respect code fences that started before this chunk.
+ return text[:pos].count("```") % 2 == 1
@@
- heading_pos = chunk.rfind("\n#")
- if heading_pos != -1 and heading_pos > chunk_size * 0.3:
- if not is_inside_code_block(chunk, heading_pos):
- end = start + heading_pos
+ heading_pos = chunk.rfind("\n#")
+ if heading_pos != -1 and heading_pos > chunk_size * 0.3:
+ if not is_inside_code_block(start + heading_pos):
+ end = start + heading_pos
@@
- if end == original_end and "\n\n" in chunk:
- last_break = chunk.rfind("\n\n")
- if last_break > chunk_size * 0.3 and not is_inside_code_block(chunk, last_break):
- end = start + last_break
+ if end == original_end and "\n\n" in chunk:
+ last_break = chunk.rfind("\n\n")
+ if last_break > chunk_size * 0.3 and not is_inside_code_block(start + last_break):
+ end = start + last_break
@@
- if end == original_end and ". " in chunk:
- last_period = chunk.rfind(". ")
- if last_period > chunk_size * 0.3 and not is_inside_code_block(chunk, last_period):
- end = start + last_period + 1
+ if end == original_end and ". " in chunk:
+ last_period = chunk.rfind(". ")
+ if last_period > chunk_size * 0.3 and not is_inside_code_block(start + last_period):
+ end = start + last_period + 1If performance on very large texts becomes an issue, you could later optimize by precomputing code‑fence positions once and using them to answer is_inside_code_block in O(log n) or O(1) rather than rescanning text each time.
🤖 Prompt for AI Agents
python/src/server/services/storage/base_storage_service.py around lines 78 to
101: the current is_inside_code_block helper inspects only the chunk and
misclassifies positions when a fenced code block starts before the chunk;
replace it with a helper that checks the full text using absolute positions
(i.e., is_inside_code_block(pos) -> evaluate text[:pos].count("```") % 2 == 1)
and update every call to pass start + local_pos (e.g., heading_pos, last_break,
last_period) so heading/paragraph/sentence break checks correctly respect
fences; keep existing break selection logic unchanged.
…ings Phase 2: Reranking Model Selection
- Add RERANKING_MODEL dropdown to RAGSettings UI
- Options: MiniLM-L6 (fast), MiniLM-L12 (better), BGE Base, BGE Large
- Dropdown only visible when reranking is enabled
- Backend already supported model selection via settings

Phase 3: Parallel Contextual Embeddings
- Replace sequential sub-batch processing with asyncio.gather
- Use Semaphore to limit concurrent API calls to MAX_WORKERS setting
- The MAX_WORKERS slider in UI now actually controls parallelization
- Expected 2-5x speedup for contextual embedding generation

🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>
- Label: "Max" → "Workers" (clearer)
- Added recommendation: 3-5 for cloud APIs, up to 10 for local Ollama
- Explains that higher values may hit rate limits

🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
python/src/server/services/storage/document_storage_service.py (1)
147-155: `credential_service` is used before it's imported - causes `NameError`.

The `credential_service` is first referenced at line 148, but it's not imported until line 347 (`from ..credential_service import credential_service`). This will cause a `NameError` at runtime when the `USE_CONTEXTUAL_EMBEDDINGS` lookup is attempted.

Move the import to the top of the function or add a defensive import here:

+    # Import credential_service for settings lookup
+    from ..credential_service import credential_service
+
     try:
         use_contextual_embeddings = await credential_service.get_credential(
             "USE_CONTEXTUAL_EMBEDDINGS", "false", decrypt=True
         )
♻️ Duplicate comments (1)
archon-ui-main/src/services/credentialsService.ts (1)
69-69: The `authToken` field is still not persisted in `setOllamaInstances`.

The `authToken` field was added to the `OllamaInstance` interface, but the `setOllamaInstances` method at lines 507-515 does not include `authToken` in the `fields` object. When instances are saved, the auth token will be lost.

Apply this diff to persist the authToken:

   const fields: Record<string, any> = {
     name: instance.name,
     baseUrl: instance.baseUrl,
     isEnabled: instance.isEnabled,
     isPrimary: instance.isPrimary,
     instanceType: instance.instanceType || 'both',
     loadBalancingWeight: instance.loadBalancingWeight || 100
   };
+
+  if (instance.authToken) {
+    fields.authToken = instance.authToken;
+  }

   // Add optional health-related fields
🧹 Nitpick comments (4)
python/src/server/services/storage/document_storage_service.py (3)
23-23: Type hint should be `int | None` to match the default value.

The parameter accepts `None` as a default but the type hint only specifies `int`. This could cause MyPy errors.

-    batch_size: int = None,  # Will load from settings
+    batch_size: int | None = None,  # Will load from settings
271-273: Remove unused variable `chunk_idx`.

The variable `chunk_idx` is initialized but never used in the subsequent loop. This appears to be leftover from a previous implementation.

     # Flatten results and track successes
     contextual_contents = []
     successful_count = 0
-    chunk_idx = 0
533-536: Consider using `debug` level for DEBUG-prefixed log messages.

These log statements are prefixed with "DEBUG" but use `search_logger.info()`. If these are for troubleshooting, consider using `search_logger.debug()` to avoid noise in production logs, or remove the DEBUG prefix if they should remain as info.

-    search_logger.info(
+    search_logger.debug(
         f"DEBUG document_storage sending final 100% | total_batches={total_batches} | "
         f"chunks_stored={total_chunks_stored} | contents_len={len(contents)}"
     )

-    search_logger.info("DEBUG document_storage final 100% sent successfully")
+    search_logger.debug("DEBUG document_storage final 100% sent successfully")

archon-ui-main/src/components/settings/RAGSettings.tsx (1)
642-693: Consider memoizing `loadAvailableModels` with `useCallback`.

The `loadAvailableModels` function is defined inline and called from multiple `useEffect` hooks but isn't memoized. While the current implementation works because the function only depends on its parameters, wrapping it in `useCallback` would be more idiomatic and prevent potential issues if the function's dependencies change in the future.

-  const loadAvailableModels = async (url: string, context: 'chat' | 'embedding') => {
+  const loadAvailableModels = useCallback(async (url: string, context: 'chat' | 'embedding') => {
     try {
       // ... implementation
     } catch (error: any) {
       console.warn(`Failed to load models from ${url}: ${error.message}`);
     }
-  };
+  }, []);
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
- archon-ui-main/src/components/settings/RAGSettings.tsx (21 hunks)
- archon-ui-main/src/services/credentialsService.ts (4 hunks)
- python/src/server/services/storage/document_storage_service.py (1 hunks)
🧰 Additional context used
📓 Path-based instructions (7)
python/src/server/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
python/src/server/**/*.py: Fail fast and loud for service startup failures, missing configuration, database connection failures, authentication/authorization failures, data corruption, critical dependency unavailability, and invalid data that would corrupt state
Never accept corrupted data - skip failed items entirely rather than storing corrupted or incomplete data
Files:
python/src/server/services/storage/document_storage_service.py
python/src/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
python/src/**/*.py: Complete batch processing, background tasks, and external API calls but log detailed errors for each failure instead of crashing
Include context about what was being attempted, preserve full stack traces with exc_info=True in Python logging, use specific exception types, include relevant IDs/URLs, never return None to indicate failure, and report both success count and detailed failure list for batch operations
Use Ruff for linting and MyPy for type checking with 120 character line length in Python backend
Use Python 3.12 with 120 character line length in backend code
Import GITHUB_REPO_OWNER and GITHUB_REPO_NAME constants from python/src/server/config/version.py as the single source of truth for repository configuration
python/src/**/*.py: Python backend code should use 120 character line length
Use Ruff for linting Python code to check for errors, warnings, and unused imports
Use Mypy for type checking in Python to ensure type safety
Files:
python/src/server/services/storage/document_storage_service.py
python/src/server/services/**/*.py
📄 CodeRabbit inference engine (AGENTS.md)
python/src/server/services/**/*.py: For batch processing and background tasks, complete what you can and report detailed failures for each item rather than crashing
Never accept corrupted data - skip failed items entirely rather than storing corrupted data in batch operations
Files:
python/src/server/services/storage/document_storage_service.py
python/**/*.py
📄 CodeRabbit inference engine (AGENTS.md)
Error messages must include context about what was being attempted, preserve full stack traces, use specific exception types, include relevant IDs/URLs, and report both success count and detailed failure list for batch operations
Files:
python/src/server/services/storage/document_storage_service.py
archon-ui-main/src/**/*.{ts,tsx}
📄 CodeRabbit inference engine (CLAUDE.md)
archon-ui-main/src/**/*.{ts,tsx}: Use TypeScript in strict mode with no implicit any in frontend code
Do not use dynamic Tailwind class construction in frontend components
Files:
archon-ui-main/src/services/credentialsService.ts
archon-ui-main/src/components/settings/RAGSettings.tsx
archon-ui-main/src/!(features)/**/*.{ts,tsx}
📄 CodeRabbit inference engine (CLAUDE.md)
Use ESLint with standard React rules for legacy code in the frontend
Files:
archon-ui-main/src/services/credentialsService.ts
archon-ui-main/src/components/settings/RAGSettings.tsx
archon-ui-main/src/**/*.{ts,tsx,js,jsx}
📄 CodeRabbit inference engine (AGENTS.md)
Use ESLint for legacy frontend code outside /src/features directory with standard React rules
Files:
archon-ui-main/src/services/credentialsService.ts
archon-ui-main/src/components/settings/RAGSettings.tsx
🧠 Learnings (2)
📚 Learning: 2025-11-29T19:25:23.987Z
Learnt from: CR
Repo: coleam00/Archon PR: 0
File: AGENTS.md:0-0
Timestamp: 2025-11-29T19:25:23.987Z
Learning: Applies to python/src/server/services/**/*.py : For batch processing and background tasks, complete what you can and report detailed failures for each item rather than crashing
Applied to files:
python/src/server/services/storage/document_storage_service.py
📚 Learning: 2025-09-25T16:36:48.527Z
Learnt from: Chillbruhhh
Repo: coleam00/Archon PR: 736
File: archon-ui-main/src/components/settings/RAGSettings.tsx:17-19
Timestamp: 2025-09-25T16:36:48.527Z
Learning: OpenRouter does not provide embedding models - it only acts as a proxy/router for various LLM providers for chat completions, not embeddings. The EMBEDDING_CAPABLE_PROVIDERS array in RAGSettings.tsx correctly excludes 'openrouter'.
Applied to files:
archon-ui-main/src/components/settings/RAGSettings.tsx
🧬 Code graph analysis (2)
python/src/server/services/storage/document_storage_service.py (2)
python/tests/progress_tracking/integration/test_document_storage_progress.py (1)
cancellation_check (198-202)

python/src/server/services/embeddings/contextual_embedding_service.py (1)
generate_contextual_embeddings_batch(166-272)
archon-ui-main/src/components/settings/RAGSettings.tsx (2)
archon-ui-main/src/components/settings/utils/instanceConfigSync.ts (1)
syncEmbeddingFromLLM (18-28)

archon-ui-main/src/features/ui/primitives/input.tsx (1)
Input(8-29)
🔇 Additional comments (12)
python/src/server/services/storage/document_storage_service.py (2)
240-294: Parallel sub-batch processing implementation looks solid.

The semaphore-based concurrency limiting, graceful degradation on sub-batch failures, and proper result flattening are well-implemented. The use of `return_exceptions=True` ensures all sub-batches complete even when some fail.
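The pattern described here - a `Semaphore` capping concurrency plus `gather(..., return_exceptions=True)` - can be sketched as follows. This is an illustrative stand-in, not the service's actual code: `process_all` and the inner `process_sub_batch` (which fakes the embedding API call) are hypothetical names.

```python
import asyncio

async def process_all(
    sub_batches: list[list[str]], max_workers: int = 4
) -> tuple[list[str], list[str]]:
    """Run sub-batches concurrently, capped at max_workers, without one failure crashing the rest."""
    semaphore = asyncio.Semaphore(max_workers)

    async def process_sub_batch(batch: list[str]) -> list[str]:
        async with semaphore:  # at most max_workers sub-batches in flight at once
            await asyncio.sleep(0)  # stand-in for the real embedding API call
            if not batch:
                raise ValueError("empty sub-batch")
            return [f"ctx:{item}" for item in batch]

    results = await asyncio.gather(
        *(process_sub_batch(b) for b in sub_batches),
        return_exceptions=True,  # failed sub-batches come back as exception objects
    )

    successes: list[str] = []
    failures: list[str] = []
    for i, result in enumerate(results):  # gather preserves input order
        if isinstance(result, Exception):
            failures.append(f"sub-batch {i}: {result!r}")  # report, don't crash
        else:
            successes.extend(result)
    return successes, failures
```

Because `gather` preserves input order, successes can be flattened back in their original sequence while each failed sub-batch is reported individually, which matches the batch-processing guideline of completing what you can and logging the rest.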
373-389: Good use of `defaultdict(deque)` for handling duplicate text mapping.

The positions-by-text structure correctly handles the case where the same text appears multiple times, ensuring each embedding is mapped back to the correct original index. Using `popleft()` maintains order consistency.

archon-ui-main/src/services/credentialsService.ts (3)
19-27: LGTM - Interface additions are appropriate.

The new optional fields (`RERANKING_MODEL`, `CHAT_MODEL`, `OLLAMA_API_MODE`) align with the per-instance authentication and API mode configuration features being added across the PR.

203-209: LGTM - Default values are sensible.

The defaults for `CHAT_MODEL` (empty string) and `OLLAMA_API_MODE` ("native") align with the expected behavior where users must explicitly select a chat model and native Ollama API is the standard mode.

231-247: LGTM - String field mappings correctly updated.

The new fields `CHAT_MODEL` and `OLLAMA_API_MODE` are properly added to the string fields array for correct type coercion during credential mapping.

archon-ui-main/src/components/settings/RAGSettings.tsx (7)
211-227: LGTM - State initialization for authentication is well-structured.

The instance config objects properly include `useAuth` and `authToken` fields, and the refs correctly track authToken for change detection to prevent infinite update loops.

1603-1660: LGTM - API Mode selection UI is well-implemented.

The radio-button style selection for Native vs OpenAI-Compatible API modes is clear, with appropriate visual feedback and helpful descriptions explaining the endpoint differences. The default fallback to 'native' is consistent with the service defaults.

1853-1880: LGTM - Authentication status display is clear and informative.

The summary table correctly shows whether authentication is configured for each instance type, with appropriate visual indicators (lock icon for configured tokens).

2393-2416: LGTM - Authentication settings UI in modals is well-implemented.

The conditional token input that appears only when "Use Authentication" is checked provides good UX. The password input type appropriately masks the token.

788-811: LGTM - Delete handlers properly reset authentication state.

The LLM instance deletion correctly resets both local state (`useAuth`, `authToken`) and the ragSettings auth token field.

2091-2129: LGTM - Reranking model selection provides good options.

The conditional display of the model selector when reranking is enabled is clean UX. The model options (MiniLM variants and BGE rerankers) are appropriate choices for cross-encoder reranking with clear performance trade-off descriptions.

2376-2391: LGTM - Sync function properly propagates authentication settings.

The `syncEmbeddingFromLLM` function correctly copies all relevant fields including authentication settings when the user chooses to use the same host for embedding.
Actionable comments posted: 1
♻️ Duplicate comments (1)
archon-ui-main/src/components/settings/RAGSettings.tsx (1)
142-177: Add missing authentication and API mode fields to interface.

The interface is still missing `OLLAMA_API_MODE`, `OLLAMA_CHAT_AUTH_TOKEN`, and `OLLAMA_EMBEDDING_AUTH_TOKEN`. These fields are accessed via type casts (`ragSettings as any`) throughout the file (lines 232, 250, 255, 273, 803, 828, 1611, 2435, 2525), bypassing TypeScript's type safety.

As per coding guidelines, use TypeScript in strict mode with no implicit any. Add the missing fields:

 interface RAGSettingsProps {
   ragSettings: {
     MODEL_CHOICE: string;
     CHAT_MODEL?: string;
     USE_CONTEXTUAL_EMBEDDINGS: boolean;
     CONTEXTUAL_EMBEDDINGS_MAX_WORKERS: number;
     USE_HYBRID_SEARCH: boolean;
     USE_AGENTIC_RAG: boolean;
     USE_RERANKING: boolean;
     RERANKING_MODEL?: string;
     LLM_PROVIDER?: string;
     LLM_BASE_URL?: string;
     LLM_INSTANCE_NAME?: string;
+    OLLAMA_API_MODE?: string;
+    OLLAMA_CHAT_AUTH_TOKEN?: string;
+    OLLAMA_EMBEDDING_AUTH_TOKEN?: string;
     EMBEDDING_MODEL?: string;
     EMBEDDING_PROVIDER?: string;
     OLLAMA_EMBEDDING_URL?: string;
     OLLAMA_EMBEDDING_INSTANCE_NAME?: string;
     // ... rest of fields
   };
   setRagSettings: (settings: any) => void;
 }

Then remove the `(ragSettings as any)` casts and access the fields with proper typing.
🧹 Nitpick comments (2)
archon-ui-main/src/components/settings/RAGSettings.tsx (2)
2094-2133: Reranking model selection implementation is solid.

The reranking UI is well-implemented with a good selection of models covering different performance/quality tradeoffs. The default model choice (MiniLM-L6) is appropriate for most use cases.

Optional: Consider enhancing the help text to provide more concrete guidance on model selection:

 <p className="text-xs text-gray-500 dark:text-gray-400 mt-1">
-  Larger models = better results but slower
+  MiniLM-L6: ~50ms, MiniLM-L12: ~80ms, BGE-Large: ~150ms (approximate per-document latency)
 </p>
2049-2049: Consider a more descriptive label for the workers input.

The label "Workers" is functional but could be more self-explanatory for users unfamiliar with the concept.

Apply this diff to improve clarity:

 <label className="text-xs text-gray-500 dark:text-gray-400 mt-1">
-  Workers
+  API Workers
 </label>

Alternatively, consider "Concurrent Requests" or "Parallel Workers" to better convey the parallelization concept.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
archon-ui-main/src/components/settings/RAGSettings.tsx (22 hunks)
🧰 Additional context used
📓 Path-based instructions (3)
archon-ui-main/src/**/*.{ts,tsx}
📄 CodeRabbit inference engine (CLAUDE.md)
archon-ui-main/src/**/*.{ts,tsx}: Use TypeScript in strict mode with no implicit any in frontend code
Do not use dynamic Tailwind class construction in frontend components
Files:
archon-ui-main/src/components/settings/RAGSettings.tsx
archon-ui-main/src/!(features)/**/*.{ts,tsx}
📄 CodeRabbit inference engine (CLAUDE.md)
Use ESLint with standard React rules for legacy code in the frontend
Files:
archon-ui-main/src/components/settings/RAGSettings.tsx
archon-ui-main/src/**/*.{ts,tsx,js,jsx}
📄 CodeRabbit inference engine (AGENTS.md)
Use ESLint for legacy frontend code outside /src/features directory with standard React rules
Files:
archon-ui-main/src/components/settings/RAGSettings.tsx
🧠 Learnings (1)
📚 Learning: 2025-09-25T16:36:48.527Z
Learnt from: Chillbruhhh
Repo: coleam00/Archon PR: 736
File: archon-ui-main/src/components/settings/RAGSettings.tsx:17-19
Timestamp: 2025-09-25T16:36:48.527Z
Learning: OpenRouter does not provide embedding models - it only acts as a proxy/router for various LLM providers for chat completions, not embeddings. The EMBEDDING_CAPABLE_PROVIDERS array in RAGSettings.tsx correctly excludes 'openrouter'.
Applied to files:
archon-ui-main/src/components/settings/RAGSettings.tsx
🧬 Code graph analysis (1)
archon-ui-main/src/components/settings/RAGSettings.tsx (2)
archon-ui-main/src/components/settings/utils/instanceConfigSync.ts (1)
syncEmbeddingFromLLM(18-28)archon-ui-main/src/features/ui/primitives/input.tsx (1)
Input(8-29)
🔇 Additional comments (5)
archon-ui-main/src/components/settings/RAGSettings.tsx (5)
753-756: LGTM: Model loading after health check.

The addition of `loadAvailableModels` after successful health checks is well-implemented. It ensures models are only fetched when the instance is confirmed healthy, and the conditional check on `context` prevents unnecessary loads.

1603-1660: LGTM: API mode selection UI.

The Ollama API mode selection UI is well-designed with clear visual feedback and helpful descriptions. The default handling for undefined or 'native' mode is appropriate.

Note: Ensure `OLLAMA_API_MODE` is added to the `RAGSettingsProps` interface as flagged in the previous comment to maintain type safety.

1853-1880: LGTM: Authentication status display.

The authentication status display in the configuration summary is clear and informative, with appropriate visual indicators (lock icon) and context-aware rendering for both LLM and Embedding instances.

2385-2387: LGTM: Proper use of syncEmbeddingFromLLM.

The call to `syncEmbeddingFromLLM` correctly propagates the LLM instance configuration (including authentication settings) to the embedding instance when users choose to use the same host. This matches the function's intended behavior as documented in the utility module.

642-680: Consider passing authentication token in model loading requests.

The `loadAvailableModels` function fetches models from Ollama instances but doesn't include authentication tokens in the request. If an instance requires authentication, the model loading will fail silently. Verify whether the backend `/api/ollama/models` endpoint requires authentication headers and whether the frontend should pass tokens for authenticated instances.
The existing spinner was already properly designed. No UI changes needed - backend now uses the existing setting. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>
Adds comprehensive E2E tests for the RAG workflow:
- Complete flow test: clean slate, upload PDF, search, verify results
- Tests for handling empty search results
- Knowledge Base UI verification tests

Also adds:
- Playwright configuration
- npm scripts: test:e2e, test:e2e:ui, test:e2e:headed
- Playwright ignores in .gitignore

🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>
Use fileURLToPath and import.meta.url for ESM-compatible __dirname. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>
- Add headless: true to playwright config
- Replace UI-based upload tests with direct API tests
- Tests now verify RAG features without modifying settings
- Check reranking metadata, hybrid search, code examples endpoints

🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>
Actionable comments posted: 0
♻️ Duplicate comments (2)
archon-ui-main/src/components/settings/RAGSettings.tsx (2)
142-177: [Duplicate] Auth token fields still missing from interface.

This issue was previously flagged but not yet addressed. The interface is missing `OLLAMA_API_MODE`, `OLLAMA_CHAT_AUTH_TOKEN`, and `OLLAMA_EMBEDDING_AUTH_TOKEN` fields, forcing type casts at multiple locations (lines 232, 250, 255, 273, 803, 828, 2432, 2522).

Based on coding guidelines, TypeScript strict mode should be used with no implicit any. Add the missing fields to restore type safety.

2393-2416: [Duplicate] Authentication UI is duplicated between modals.

This duplication was previously flagged. The authentication UI block (checkbox + conditional password input) appears identically in both LLM and Embedding modals, differing only in accent color and state handlers.

As previously suggested, extract into a reusable `AuthenticationInput` component or use the Radix UI primitives from `src/features/ui/primitives/` per project learnings.

Based on learnings, prefer using Radix UI primitives from src/features/ui/primitives/ when creating UI components.
Also applies to: 2483-2506
🧹 Nitpick comments (4)
archon-ui-main/src/components/settings/RAGSettings.tsx (1)
682-693: Consider debouncing auto-load to avoid excessive API calls.

The effects trigger `loadAvailableModels` on every URL change, which could cause excessive API calls while users are typing URLs.

Consider adding a debounce (e.g., 500ms) or moving the auto-load to only trigger after successful connection tests.
archon-ui-main/playwright.config.ts (1)
1-63: Consider making headless mode configurable for local debugging.

The config sets `headless: true` unconditionally (line 36), which may hinder local debugging. Consider using an environment variable:

-  headless: true,
+  headless: process.env.CI ? true : false,

This allows developers to see the browser during local test development while keeping CI runs headless.
archon-ui-main/tests/e2e/rag-search-flow.spec.ts (2)
20-32: Consider more specific selector for Knowledge button.

The selector `button:has-text("Knowledge")` with `.first()` might match other buttons containing "Knowledge" text. Consider a more specific selector like:

const addButton = page.locator('button:has-text("+ Knowledge")');

This ensures you're testing the correct button without relying on DOM order.
36-36: Consider extracting API base URL to configuration.

The tests hardcode `http://localhost:8181` in multiple places. Consider extracting to a constant or reading from environment:

const API_BASE_URL = process.env.API_URL || 'http://localhost:8181';

Then use:

await request.post(`${API_BASE_URL}/api/rag/query`, {

This improves maintainability and enables testing against different environments.
Also applies to: 51-51, 73-73, 88-88, 104-104
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (5)
- archon-ui-main/.gitignore (1 hunks)
- archon-ui-main/package.json (2 hunks)
- archon-ui-main/playwright.config.ts (1 hunks)
- archon-ui-main/src/components/settings/RAGSettings.tsx (21 hunks)
- archon-ui-main/tests/e2e/rag-search-flow.spec.ts (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
- archon-ui-main/.gitignore
- archon-ui-main/package.json
🧰 Additional context used
📓 Path-based instructions (3)
archon-ui-main/src/**/*.{ts,tsx}
📄 CodeRabbit inference engine (CLAUDE.md)
archon-ui-main/src/**/*.{ts,tsx}: Use TypeScript in strict mode with no implicit any in frontend code
Do not use dynamic Tailwind class construction in frontend components
Files:
archon-ui-main/src/components/settings/RAGSettings.tsx
archon-ui-main/src/!(features)/**/*.{ts,tsx}
📄 CodeRabbit inference engine (CLAUDE.md)
Use ESLint with standard React rules for legacy code in the frontend
Files:
archon-ui-main/src/components/settings/RAGSettings.tsx
archon-ui-main/src/**/*.{ts,tsx,js,jsx}
📄 CodeRabbit inference engine (AGENTS.md)
Use ESLint for legacy frontend code outside /src/features directory with standard React rules
Files:
archon-ui-main/src/components/settings/RAGSettings.tsx
🧠 Learnings (10)
📚 Learning: 2025-11-29T19:24:36.223Z
Learnt from: CR
Repo: coleam00/Archon PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-11-29T19:24:36.223Z
Learning: Applies to archon-ui-main/src/**/*.{test,spec}.{ts,tsx} : Use Vitest with React Testing Library for frontend testing
Applied to files:
archon-ui-main/tests/e2e/rag-search-flow.spec.ts
archon-ui-main/playwright.config.ts
📚 Learning: 2025-11-29T19:25:23.987Z
Learnt from: CR
Repo: coleam00/Archon PR: 0
File: AGENTS.md:0-0
Timestamp: 2025-11-29T19:25:23.987Z
Learning: Applies to archon-ui-main/**/*.test.{ts,tsx} : Use Vitest with React Testing Library for frontend testing
Applied to files:
archon-ui-main/tests/e2e/rag-search-flow.spec.ts
archon-ui-main/playwright.config.ts
📚 Learning: 2025-11-29T19:24:36.223Z
Learnt from: CR
Repo: coleam00/Archon PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-11-29T19:24:36.223Z
Learning: Applies to archon-ui-main/src/features/**/*.{ts,tsx} : Create new UI components in the features directory with types in src/features/[feature]/types/, components in src/features/[feature]/components/, and use TanStack Query hooks
Applied to files:
archon-ui-main/tests/e2e/rag-search-flow.spec.ts
📚 Learning: 2025-11-29T19:25:23.987Z
Learnt from: CR
Repo: coleam00/Archon PR: 0
File: AGENTS.md:0-0
Timestamp: 2025-11-29T19:25:23.987Z
Learning: Applies to archon-ui-main/src/features/**/*.{ts,tsx} : TypeScript strict mode enabled with no implicit any, and use Biome with 120 character lines and double quotes in /src/features directory
Applied to files:
archon-ui-main/playwright.config.ts
📚 Learning: 2025-09-04T16:30:05.227Z
Learnt from: stevepresley
Repo: coleam00/Archon PR: 573
File: archon-ui-main/src/config/api.ts:15-25
Timestamp: 2025-09-04T16:30:05.227Z
Learning: Archon UI API config: Prefer lazy getters getApiFullUrl() and getWsUrl() over module-load constants to avoid SSR/test crashes. Avoid CommonJS exports patterns (Object.defineProperty(exports,…)) in ESM. Add typeof window guards with VITE_API_URL fallback inside getApiUrl()/getWebSocketUrl() when SSR safety is required.
Applied to files:
archon-ui-main/playwright.config.ts
📚 Learning: 2025-11-29T19:24:36.223Z
Learnt from: CR
Repo: coleam00/Archon PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-11-29T19:24:36.223Z
Learning: Applies to archon-ui-main/src/features/**/*.{ts,tsx} : Use Biome for code formatting in /src/features with 120 character lines, double quotes, and trailing commas
Applied to files:
archon-ui-main/playwright.config.ts
📚 Learning: 2025-11-29T19:24:36.223Z
Learnt from: CR
Repo: coleam00/Archon PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-11-29T19:24:36.223Z
Learning: Applies to archon-ui-main/src/!(features)/**/*.{ts,tsx} : Use ESLint with standard React rules for legacy code in the frontend
Applied to files:
archon-ui-main/playwright.config.ts
📚 Learning: 2025-11-29T19:25:23.987Z
Learnt from: CR
Repo: coleam00/Archon PR: 0
File: AGENTS.md:0-0
Timestamp: 2025-11-29T19:25:23.987Z
Learning: Applies to archon-ui-main/src/**/*.{ts,tsx,js,jsx} : Use ESLint for legacy frontend code outside /src/features directory with standard React rules
Applied to files:
archon-ui-main/playwright.config.ts
📚 Learning: 2025-11-29T19:24:36.223Z
Learnt from: CR
Repo: coleam00/Archon PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-11-29T19:24:36.223Z
Learning: Applies to archon-ui-main/src/features/**/components/**/*.{ts,tsx} : Prefer using Radix UI primitives from src/features/ui/primitives/ when creating UI components
Applied to files:
archon-ui-main/src/components/settings/RAGSettings.tsx
📚 Learning: 2025-09-25T16:36:48.527Z
Learnt from: Chillbruhhh
Repo: coleam00/Archon PR: 736
File: archon-ui-main/src/components/settings/RAGSettings.tsx:17-19
Timestamp: 2025-09-25T16:36:48.527Z
Learning: OpenRouter does not provide embedding models - it only acts as a proxy/router for various LLM providers for chat completions, not embeddings. The EMBEDDING_CAPABLE_PROVIDERS array in RAGSettings.tsx correctly excludes 'openrouter'.
Applied to files:
archon-ui-main/src/components/settings/RAGSettings.tsx
🧬 Code graph analysis (1)
archon-ui-main/src/components/settings/RAGSettings.tsx (2)
archon-ui-main/src/components/settings/utils/instanceConfigSync.ts (1)
syncEmbeddingFromLLM (18-28)
archon-ui-main/src/features/ui/primitives/input.tsx (1)
Input (8-29)
🔇 Additional comments (15)
archon-ui-main/src/components/settings/RAGSettings.tsx (13)
210-222: LGTM! Auth state initialization is correct. The authentication fields (`useAuth` and `authToken`) are properly initialized with safe defaults (disabled auth, empty token).
226-273: Auth token sync effects are correctly implemented. The effects properly:
- Track auth tokens via refs to prevent infinite loops
- Derive `useAuth` from token presence
- Include auth tokens in dependency arrays

The type safety issues with `(ragSettings as any)` casts have been flagged separately.
573-585: LGTM! Enhanced metrics state to track available models. The addition of the `llmAvailableModels` and `embeddingAvailableModels` arrays properly extends the metrics tracking to store actual model lists alongside counts.
642-680: Well-designed model loading function. The `loadAvailableModels` function correctly:
- Uses the context parameter to differentiate chat vs. embedding models
- Leverages the backend proxy to avoid CORS
- Handles errors gracefully without blocking the UI
753-756: LGTM! Model loading after health check is a good enhancement. Loading available models after a successful connection test ensures the UI has fresh model data when the instance is confirmed online.
788-836: Deletion handlers correctly reset authentication state. Both handlers properly:
- Reset `useAuth` and `authToken` to safe defaults
- Clear auth tokens from ragSettings
- Maintain consistency between local state and persisted settings
892-910: LGTM! Metrics update preserves existing state. The update correctly uses the spread pattern to preserve `llmAvailableModels` and `embeddingAvailableModels`, which are managed separately by `loadAvailableModels`.
2091-2130: LGTM! Reranking model selection properly implemented. The UI correctly:
- Shows the model dropdown only when reranking is enabled
- Provides four model options matching the PR objectives
- Uses a sensible default (MiniLM-L6, fast)
- Includes helpful descriptions for users
1603-1660: LGTM! API mode selection UI is clear and well-designed. The interface clearly differentiates between Native Ollama API and OpenAI-Compatible modes with helpful endpoint information. The radio-button-style UI is intuitive.
1853-1880: LGTM! Authentication status display is clear and informative. The table row effectively communicates authentication status with:
- A lock icon for secured instances
- Clear "Token configured" vs. "No authentication" messages
- Proper conditional rendering based on token presence
2382-2383: LGTM! Sync utility correctly used for single-host setup. The `syncEmbeddingFromLLM` call properly synchronizes all LLM instance configuration (URL, name, auth) to the embedding instance when users enable the convenience checkbox.
2428-2456: Save handlers correctly persist authentication state. Both handlers properly:
- Conditionally save auth tokens based on the `useAuth` flag
- Clear tokens with an empty string when auth is disabled
- Auto-test the connection after saving (good UX)
- Refresh metrics after a successful save

Also applies to: 2518-2546
1897-1906: LGTM! Model display in summary table is clear. The selected model row properly displays the current model with a graceful "Not selected" fallback for unconfigured instances.
archon-ui-main/playwright.config.ts (1)
49-54: LGTM! Playwright configuration is well-structured. The configuration provides sensible defaults:
- Appropriate timeouts (60s tests, 10s assertions)
- CI-specific optimizations (sequential execution, retries)
- Efficient artifact collection (traces/videos on retries only)
archon-ui-main/tests/e2e/rag-search-flow.spec.ts (1)
34-112: LGTM! API tests are well-structured and focused. The tests correctly:
- Test API endpoints directly without UI interaction
- Verify response structure without brittle content assertions
- Use conditional checks for reranking metadata (line 66)
- Maintain independence from each other
- Replace rstrip with removesuffix for URL normalization (embedding_service.py)
- Fix auth token persistence: persist and restore authToken in setOllamaInstances/getOllamaInstances (credentialsService.ts)
- Fix auth token propagation: pass auth_token through _enrich_model_capabilities and _get_model_details (model_discovery_service.py)
- Add return_exceptions=True for batch resilience in asyncio.gather; report partial failures gracefully (embedding_service.py)
- Fix code-block detection to check full text from position 0 so cross-chunk fences are honored (base_storage_service.py)
- Fix temp file leak in PDF extraction: ensure cleanup runs even if write fails (document_processing.py)
- Remove hardcoded absolute machine-specific paths from QUICKSTART.md
- Add missing OLLAMA_CHAT_AUTH_TOKEN and OLLAMA_EMBEDDING_AUTH_TOKEN to RAGSettingsProps interface and remove unsafe type casts (RAGSettings.tsx)
- Extract duplicated authentication UI into reusable AuthenticationInput component (RAGSettings.tsx)
- Replace arbitrary waitForTimeout calls with deterministic locator-based waits in E2E tests (ollama-api-mode.spec.ts)
- Add missing OCR dependencies (pymupdf4llm, pytesseract, pdf2image) to pyproject.toml `all` group

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Actionable comments posted: 6
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
archon-ui-main/src/components/settings/RAGSettings.tsx (1)
2465-2486: ⚠️ Potential issue | 🟠 Major
These post-edit connection checks still run with the old persisted token.
The modal “Save Changes” handlers only update local React state, then immediately call the backend health/model routes. Those routes look up auth from stored RAG credentials, so a newly entered Ollama token is not used until the top-level “Save Settings” persists it. Protected instances will therefore look offline or empty right after editing. Either persist first or pass the unsaved token through the test request.
Also applies to: 2540-2561
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@archon-ui-main/src/components/settings/RAGSettings.tsx` around lines 2465 - 2486, The post-edit health/model checks use persisted credentials, so when editing a protected Ollama instance the new token isn't used; update the handlers in the LLM edit flow (where ragSettings is updated via setRagSettings and setShowEditLLMModal is closed) to either persist the new token before calling manualTestConnection/fetchOllamaMetrics or explicitly pass the unsaved token from llmInstanceConfig into those calls; specifically modify the manualTestConnection invocation and any fetchOllamaMetrics call to accept and use llmInstanceConfig.useAuth ? llmInstanceConfig.authToken : '' (or call the same persistence routine used by the top-level Save Settings) so setOllamaManualConfirmed and setOllamaServerStatus reflect the updated credentials immediately.
🧹 Nitpick comments (1)
python/src/server/services/storage/base_storage_service.py (1)
81-83: Avoid rescanning the full document prefix for each breakpoint check.
``text[:abs_pos].count("```")`` copies and scans from byte 0 on every probe, so `smart_chunk_text` becomes quadratic as documents grow. Since this runs on the ingestion path, large PDFs can spend a lot of CPU here. Precompute fence offsets/parity once, or carry the fence state forward between chunks, instead of recounting from the start each time.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@python/src/server/services/storage/base_storage_service.py` around lines 81 - 83, The current is_inside_code_block(text_segment: str, local_pos: int) implementation repeatedly scans text[:abs_pos].count("```"), making smart_chunk_text quadratic; instead, precompute fence positions or a parity map once and consult it per probe: e.g., compute all fence_offsets = [indices of "```"] or build a prefix boolean/bitset indicating whether each index is inside a fence, then change is_inside_code_block to compute abs_pos = start + local_pos and determine parity by binary-searching fence_offsets or looking up the precomputed parity rather than rescanning the prefix; alternatively, if smart_chunk_text iterates sequentially, carry a fence_state boolean forward and update it when crossing a fence so is_inside_code_block becomes O(1) per check.
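The precompute alternative the comment suggests can be sketched as follows. This is a minimal illustration, not the repo's code: `find_fence_offsets` and the bisect-based parity lookup are hypothetical names, assuming fences are literal triple-backtick runs.

```python
import bisect

def find_fence_offsets(text: str) -> list[int]:
    """Collect the start index of every triple-backtick fence in one O(n) pass."""
    offsets = []
    pos = text.find("```")
    while pos != -1:
        offsets.append(pos)
        pos = text.find("```", pos + 3)
    return offsets

def is_inside_code_block(fence_offsets: list[int], abs_pos: int) -> bool:
    """A position is inside a fence when an odd number of fences precede it.

    Binary search makes each probe O(log n) instead of rescanning the prefix.
    """
    return bisect.bisect_right(fence_offsets, abs_pos) % 2 == 1
```

Each breakpoint probe then costs a binary search instead of a full prefix scan, turning the quadratic ingestion cost into O(n log n) overall.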
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@archon-ui-main/src/components/settings/RAGSettings.tsx`:
- Around line 285-293: The code paths that update lastLLMConfigRef and call
setLLMInstanceConfig are incorrectly preserving the previous auth token by using
`authToken: newAuthToken || prev.authToken`; change these updates (in the block
around lastLLMConfigRef.current and the similar block at the other occurrence)
to assign `authToken` directly from `newAuthToken` (i.e., `authToken:
newAuthToken`) so clearing the token (`""`) actually clears the UI state and
prevents accidentally re-persisting the old secret; update the corresponding
`useAuth` calculation to still be `!!newAuthToken`.
In `@archon-ui-main/src/services/credentialsService.ts`:
- Around line 19-27: getRagSettings() is only rehydrating keys that exist in the
settings default object and the string-field whitelist, so newly added RAG keys
(RERANKING_MODEL, OLLAMA_CHAT_AUTH_TOKEN, OLLAMA_EMBEDDING_AUTH_TOKEN) get
persisted by the UI but dropped on reload; update the settings default object
and the string-field whitelist used by getRagSettings() (and the equivalent
whitelists referenced around getRagSettings() usage at the other similar blocks)
to include RERANKING_MODEL, OLLAMA_CHAT_AUTH_TOKEN, and
OLLAMA_EMBEDDING_AUTH_TOKEN so those keys are recognized during rehydration and
survive reloads.
In `@python/pyproject.toml`:
- Around line 51-53: The new Python deps ("pdf2image", "pytesseract") require
system binaries that are not installed by python/Dockerfile.server; update
python/Dockerfile.server to install the required OS packages (e.g.,
poppler-utils and tesseract-ocr, and optionally libtesseract-dev) via apt-get
(RUN apt-get update && apt-get install -y poppler-utils tesseract-ocr) so
pdf2image and pytesseract work inside the container, then rebuild the server
image.
In `@python/src/server/services/embeddings/embedding_service.py`:
- Around line 130-160: The current loop in create_embeddings_batch (using
results, embeddings, errors) collapses out failed items and returns a shorter
embeddings list which misaligns with input batch indices; instead preserve
positional mapping by returning a list whose length equals len(results) where
each index contains the embedding vector or None (or an explicit per-index
success/failure marker), keep collecting errors in errors as you already do, and
keep the existing "all failed" EmbeddingAPIError behavior when embeddings are
all missing; update the return to deliver this per-index list (and/or a (list,
errors) tuple) so callers can map embeddings back to the original batch
positions without corrupting data.
In `@python/src/server/services/ollama/model_discovery_service.py`:
- Around line 122-129: The discovery and health caches are currently keyed only
by instance_url and therefore can return anonymous/truncated results for
authenticated requests; update discover_models (and the health-check methods
referenced) to include the auth context in the cache key (e.g., (instance_url,
auth_token or normalized auth_context)) or skip caching when auth_token is
present. Locate the cache lookup/insert usage in discover_models and the health
cache code paths and change the key construction to incorporate the token or
other auth context (or add a short-circuit that bypasses cache when auth_token
is non-null) so authenticated and anonymous responses are never conflated.
In `@python/src/server/utils/document_processing.py`:
- Around line 325-342: The OCR branch masks runtime failures because
extract_text_with_ocr() returns None for both execution errors and true no-text
results; update extract_text_with_ocr to raise a specific exception (e.g.,
OCRRuntimeError) on OCR execution failures and return an empty string "" when
OCR ran successfully but found no text, then change the code in this block
(which calls is_ocr_available(), extract_text_with_ocr, logger, and raises
ValueError) to catch OCRRuntimeError and log/raise an OCR-specific error
including the exception details (logger.error(..., exc_info=True) or raise
RuntimeError("OCR failed: ...") ), while retaining the existing ValueError path
for the empty-string/no-text case so infrastructure failures and content issues
are distinguishable.
---
Outside diff comments:
In `@archon-ui-main/src/components/settings/RAGSettings.tsx`:
- Around line 2465-2486: The post-edit health/model checks use persisted
credentials, so when editing a protected Ollama instance the new token isn't
used; update the handlers in the LLM edit flow (where ragSettings is updated via
setRagSettings and setShowEditLLMModal is closed) to either persist the new
token before calling manualTestConnection/fetchOllamaMetrics or explicitly pass
the unsaved token from llmInstanceConfig into those calls; specifically modify
the manualTestConnection invocation and any fetchOllamaMetrics call to accept
and use llmInstanceConfig.useAuth ? llmInstanceConfig.authToken : '' (or call
the same persistence routine used by the top-level Save Settings) so
setOllamaManualConfirmed and setOllamaServerStatus reflect the updated
credentials immediately.
---
Nitpick comments:
In `@python/src/server/services/storage/base_storage_service.py`:
- Around line 81-83: The current is_inside_code_block(text_segment: str,
local_pos: int) implementation repeatedly scans text[:abs_pos].count("```"),
making smart_chunk_text quadratic; instead, precompute fence positions or a
parity map once and consult it per probe: e.g., compute all fence_offsets =
[indices of "```"] or build a prefix boolean/bitset indicating whether each
index is inside a fence, then change is_inside_code_block to compute abs_pos =
start + local_pos and determine parity by binary-searching fence_offsets or
looking up the precomputed parity rather than rescanning the prefix;
alternatively, if smart_chunk_text iterates sequentially, carry a fence_state
boolean forward and update it when crossing a fence so is_inside_code_block
becomes O(1) per check.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 140a7298-9787-435a-aff3-2910f8815ccf
📒 Files selected for processing (9)
- QUICKSTART.md
- archon-ui-main/src/components/settings/RAGSettings.tsx
- archon-ui-main/src/services/credentialsService.ts
- archon-ui-main/tests/e2e/ollama-api-mode.spec.ts
- python/pyproject.toml
- python/src/server/services/embeddings/embedding_service.py
- python/src/server/services/ollama/model_discovery_service.py
- python/src/server/services/storage/base_storage_service.py
- python/src/server/utils/document_processing.py
🚧 Files skipped from review as they are similar to previous changes (1)
- archon-ui-main/tests/e2e/ollama-api-mode.spec.ts
```diff
 if (newLLMUrl !== lastLLMConfigRef.current.url || newLLMName !== lastLLMConfigRef.current.name || newAuthToken !== lastLLMConfigRef.current.authToken) {
   lastLLMConfigRef.current = { url: newLLMUrl, name: newLLMName, authToken: newAuthToken };
   setLLMInstanceConfig(prev => {
     const newConfig = {
       url: newLLMUrl || prev.url,
-      name: newLLMName || prev.name
+      name: newLLMName || prev.name,
+      useAuth: !!newAuthToken,
+      authToken: newAuthToken || prev.authToken
     };
```
Clearing the saved token does not actually clear local state.
`authToken: newAuthToken || prev.authToken` keeps the previous secret whenever the persisted value becomes `""`. After disabling auth, the UI state still holds the old token and a later save can write it back. This should assign `newAuthToken` directly.
Suggested fix

```diff
 setLLMInstanceConfig(prev => {
   const newConfig = {
     url: newLLMUrl || prev.url,
     name: newLLMName || prev.name,
     useAuth: !!newAuthToken,
-    authToken: newAuthToken || prev.authToken
+    authToken: newAuthToken
   };

 setEmbeddingInstanceConfig(prev => {
   const newConfig = {
     url: newEmbeddingUrl || prev.url,
     name: newEmbeddingName || prev.name,
     useAuth: !!newAuthToken,
-    authToken: newAuthToken || prev.authToken
+    authToken: newAuthToken
   };
```

Also applies to: 308-316
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@archon-ui-main/src/components/settings/RAGSettings.tsx` around lines 285 -
293, The code paths that update lastLLMConfigRef and call setLLMInstanceConfig
are incorrectly preserving the previous auth token by using `authToken:
newAuthToken || prev.authToken`; change these updates (in the block around
lastLLMConfigRef.current and the similar block at the other occurrence) to
assign `authToken` directly from `newAuthToken` (i.e., `authToken:
newAuthToken`) so clearing the token (`""`) actually clears the UI state and
prevents accidentally re-persisting the old secret; update the corresponding
`useAuth` calculation to still be `!!newAuthToken`.
```ts
RERANKING_MODEL?: string;
MODEL_CHOICE: string;
CHAT_MODEL?: string;
LLM_PROVIDER?: string;
LLM_BASE_URL?: string;
LLM_INSTANCE_NAME?: string;
OLLAMA_EMBEDDING_URL?: string;
OLLAMA_EMBEDDING_INSTANCE_NAME?: string;
OLLAMA_API_MODE?: string;
```
New RAG settings still won't survive a reload.
getRagSettings() only rehydrates keys that already exist in the settings default object and are in the string-field whitelist. Right now RERANKING_MODEL, OLLAMA_CHAT_AUTH_TOKEN, and OLLAMA_EMBEDDING_AUTH_TOKEN are missing from that path, so those values can be saved by the UI but will be dropped the next time settings are loaded.
Also applies to: 196-209, 234-247
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@archon-ui-main/src/services/credentialsService.ts` around lines 19 - 27,
getRagSettings() is only rehydrating keys that exist in the settings default
object and the string-field whitelist, so newly added RAG keys (RERANKING_MODEL,
OLLAMA_CHAT_AUTH_TOKEN, OLLAMA_EMBEDDING_AUTH_TOKEN) get persisted by the UI but
dropped on reload; update the settings default object and the string-field
whitelist used by getRagSettings() (and the equivalent whitelists referenced
around getRagSettings() usage at the other similar blocks) to include
RERANKING_MODEL, OLLAMA_CHAT_AUTH_TOKEN, and OLLAMA_EMBEDDING_AUTH_TOKEN so
those keys are recognized during rehydration and survive reloads.
| "pymupdf4llm>=0.2.5", # Better PDF-to-Markdown extraction | ||
| "pytesseract>=0.3.10", # OCR for scanned/image-based PDFs | ||
| "pdf2image>=1.17.0", # PDF to image conversion for OCR |
OCR won't work in the server image with these deps alone.
pdf2image and pytesseract still need the system binaries from poppler-utils and tesseract-ocr. python/Dockerfile.server:1-80 currently installs neither, so this dependency change still leaves the OCR fallback broken at runtime inside the container.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@python/pyproject.toml` around lines 51 - 53, The new Python deps
("pdf2image", "pytesseract") require system binaries that are not installed by
python/Dockerfile.server; update python/Dockerfile.server to install the
required OS packages (e.g., poppler-utils and tesseract-ocr, and optionally
libtesseract-dev) via apt-get (RUN apt-get update && apt-get install -y
poppler-utils tesseract-ocr) so pdf2image and pytesseract work inside the
container, then rebuild the server image.
```python
# Separate successes from failures for partial-failure resilience
embeddings: list[list[float]] = []
errors: list[tuple[int, BaseException]] = []
for i, result in enumerate(results):
    if isinstance(result, BaseException):
        errors.append((i, result))
        search_logger.error(
            f"Embedding failed for text[{i}]: {result}", exc_info=False
        )
    else:
        embeddings.append(result)

if errors and not embeddings:
    # All requests failed – raise using the first error
    first_error = errors[0][1]
    if isinstance(first_error, httpx.HTTPStatusError):
        error_content = first_error.response.text
        raise EmbeddingAPIError(
            f"Ollama native API error: {first_error.response.status_code} - {error_content}",
            original_error=first_error,
        ) from first_error
    raise EmbeddingAPIError(
        f"Ollama native API error: {str(first_error)}", original_error=first_error
    ) from first_error

if errors:
    search_logger.warning(
        f"{len(errors)}/{len(results)} embedding requests failed; returning {len(embeddings)} successful embeddings"
    )

return embeddings
```
Don't collapse failed items out of this return value.
create_embeddings_batch() still assumes the adapter returns one embedding per input text and zips that list back onto batch positionally. If text[0] fails and text[1] succeeds, returning only the successful vectors shifts text[1]'s embedding onto text[0], which corrupts stored data. Return per-index success/failure information instead of a shortened embedding list.
As per coding guidelines, "Never accept corrupted data - skip failed items entirely rather than storing corrupted data in batch operations."
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@python/src/server/services/embeddings/embedding_service.py` around lines 130
- 160, The current loop in create_embeddings_batch (using results, embeddings,
errors) collapses out failed items and returns a shorter embeddings list which
misaligns with input batch indices; instead preserve positional mapping by
returning a list whose length equals len(results) where each index contains the
embedding vector or None (or an explicit per-index success/failure marker), keep
collecting errors in errors as you already do, and keep the existing "all
failed" EmbeddingAPIError behavior when embeddings are all missing; update the
return to deliver this per-index list (and/or a (list, errors) tuple) so callers
can map embeddings back to the original batch positions without corrupting data.
```python
async def discover_models(self, instance_url: str, fetch_details: bool = False, auth_token: str | None = None) -> list[OllamaModel]:
    """
    Discover all available models from an Ollama instance.

    Args:
        instance_url: Base URL of the Ollama instance
        fetch_details: If True, fetch comprehensive model details via /api/show
        auth_token: Optional authentication token for protected instances
```
Cache keys need to include auth context now.
These methods are now auth-aware, but the discovery and health caches are still keyed only by instance_url. An anonymous 401 or truncated public model list can therefore be cached and then reused for an authenticated call to the same host for the next 30s/5m. Cache by (instance_url, auth context) or bypass cache when a token is involved.
Also applies to: 974-980, 1036-1068
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@python/src/server/services/ollama/model_discovery_service.py` around lines
122 - 129, The discovery and health caches are currently keyed only by
instance_url and therefore can return anonymous/truncated results for
authenticated requests; update discover_models (and the health-check methods
referenced) to include the auth context in the cache key (e.g., (instance_url,
auth_token or normalized auth_context)) or skip caching when auth_token is
present. Locate the cache lookup/insert usage in discover_models and the health
cache code paths and change the key construction to incorporate the token or
other auth context (or add a short-circuit that bypasses cache when auth_token
is non-null) so authenticated and anonymous responses are never conflated.
```python
# Final fallback: OCR for image-based/scanned PDFs
if is_ocr_available():
    logger.info("No text extracted - attempting OCR for image-based PDF")
    ocr_text = extract_text_with_ocr(file_content)
    if ocr_text and len(ocr_text.strip()) > 50:
        logger.info(f"PDF extracted with OCR (Tesseract): {len(ocr_text)} chars")
        return ocr_text
    else:
        raise ValueError(
            "No text extracted from PDF: OCR found no readable text. "
            "File may be empty or contain only images without text."
        )
else:
    raise ValueError(
        "No text extracted from PDF: file appears to be images-only or scanned. "
        "Install OCR dependencies for scanned PDF support: "
        "pip install pytesseract pdf2image (and install tesseract + poppler)"
    )
```
This masks OCR runtime failures as “no readable text.”
extract_text_with_ocr() returns None for most OCR execution failures as well as true no-text cases. In this branch that means missing tesseract/poppler binaries or other OCR errors get reported as “file may be empty,” which makes infrastructure problems indistinguishable from bad input. Please surface an OCR-specific failure here instead of treating every None as a content issue.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@python/src/server/utils/document_processing.py` around lines 325 - 342, The
OCR branch masks runtime failures because extract_text_with_ocr() returns None
for both execution errors and true no-text results; update extract_text_with_ocr
to raise a specific exception (e.g., OCRRuntimeError) on OCR execution failures
and return an empty string "" when OCR ran successfully but found no text, then
change the code in this block (which calls is_ocr_available(),
extract_text_with_ocr, logger, and raises ValueError) to catch OCRRuntimeError
and log/raise an OCR-specific error including the exception details
(logger.error(..., exc_info=True) or raise RuntimeError("OCR failed: ...") ),
while retaining the existing ValueError path for the empty-string/no-text case
so infrastructure failures and content issues are distinguishable.
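One way to separate the two cases the comment describes, sketched under stated assumptions: `extract_text_with_ocr_strict` and `OCRRuntimeError` are hypothetical names, and the conversion/OCR steps are injected callables standing in for the pdf2image and pytesseract calls rather than the repo's real helpers.

```python
class OCRRuntimeError(Exception):
    """OCR itself failed to execute (missing tesseract/poppler, a crash),
    as opposed to OCR running fine but finding no readable text."""

def extract_text_with_ocr_strict(file_content: bytes, convert, ocr) -> str:
    """Run OCR over a PDF's pages, distinguishing failure modes.

    Raises OCRRuntimeError on execution failure; returns "" when OCR ran
    successfully but the document contains no readable text.
    """
    try:
        images = convert(file_content)          # e.g. pdf2image conversion
        pages = [ocr(image) for image in images]  # e.g. pytesseract per page
    except Exception as exc:  # missing binary, decode failure, etc.
        raise OCRRuntimeError(f"OCR execution failed: {exc}") from exc
    return "\n".join(pages).strip()
```

The calling branch can then catch `OCRRuntimeError` and log it with `exc_info=True`, while keeping the existing `ValueError` path for the genuine no-text case.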
Summary
RAG Performance Optimization with three improvements:
Problem
Changes
Phase 1: RRF for Hybrid Search
- `ROW_NUMBER()` + RRF formula: `1/(rank + 60)`
- `embedding_model_filter` parameter

Phase 2: Reranking Model Selection
- `RERANKING_MODEL` dropdown in RAGSettings

Phase 3: Parallel Contextual Embeddings
- `asyncio.gather()` with `Semaphore(max_workers)` to limit concurrent API calls

Test plan
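The Phase 1 fusion can be illustrated in plain Python. The PR implements it in SQL via `ROW_NUMBER()`; this sketch only mirrors the `1/(rank + 60)` scoring, and `rrf_fuse` is an illustrative name, not a function from the codebase.

```python
def rrf_fuse(vector_ids: list[str], keyword_ids: list[str], k: int = 60) -> list[str]:
    """Fuse two ranked result lists with Reciprocal Rank Fusion.

    Each document earns 1 / (rank + k) per list it appears in
    (1-based rank, k=60 as in the PR's formula), summed across lists,
    then documents are ordered by total score.
    """
    scores: dict[str, float] = {}
    for ranking in (vector_ids, keyword_ids):
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (rank + k)
    return sorted(scores, key=lambda doc_id: scores[doc_id], reverse=True)
```

Documents appearing in both lists accumulate two reciprocal-rank terms, which is why a hit ranked second in both searches can outrank one that tops only a single list.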
🤖 Generated with Claude Code
Summary by CodeRabbit
New Features
Improvements
Testing
Documentation
Chores