
feat(search): Implement Reciprocal Rank Fusion (RRF) for hybrid search#896

Open
Milofax wants to merge 26 commits into coleam00:main from Milofax:feature/rag-performance

Conversation

Milofax commented Dec 3, 2025

Summary

RAG Performance Optimization with three improvements:

  1. Hybrid Search RRF - Fix broken score comparison by implementing Reciprocal Rank Fusion
  2. Reranking Model Selection - UI dropdown to choose between different reranking models
  3. Parallel Contextual Embeddings - MAX_WORKERS slider now actually parallelizes API calls

Problem

| Issue | Before | After |
|---|---|---|
| Hybrid Search | Compared cosine similarity (0–1) with ts_rank (a different scale) | RRF uses ranks; scale-independent |
| Reranking Model | Hardcoded, no choice | Dropdown: MiniLM-L6/L12, BGE Base/Large |
| Contextual Embeddings | Sequential processing; MAX_WORKERS did nothing | Parallel with asyncio.gather + Semaphore |

Changes

Phase 1: RRF for Hybrid Search

  • SQL functions use ROW_NUMBER() + RRF formula: 1/(rank + 60)
  • Added embedding_model_filter parameter
  • Fetch 3x candidates for better fusion quality
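The fusion step above can be sketched in a few lines of Python (illustrative only; the actual implementation lives in SQL using ROW_NUMBER() over each result set):

```python
def rrf_fuse(vector_ranked: list[str], text_ranked: list[str], k: int = 60) -> list[str]:
    """Combine two ranked id lists with Reciprocal Rank Fusion: score = 1/(rank + k)."""
    scores: dict[str, float] = {}
    for ranked in (vector_ranked, text_ranked):
        for rank, doc_id in enumerate(ranked, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (rank + k)
    # Documents appearing high in both lists accumulate the largest combined scores
    return sorted(scores, key=scores.__getitem__, reverse=True)
```

Because only ranks matter, the incompatible cosine and ts_rank scales never need to be compared directly: a document ranked in both lists naturally outranks one appearing in a single list.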

Phase 2: Reranking Model Selection

  • Add RERANKING_MODEL dropdown in RAGSettings
  • Options: MiniLM-L6 (fast), MiniLM-L12, BGE Base, BGE Large (best)
  • Backend already supported this via settings

Phase 3: Parallel Contextual Embeddings

  • Replace sequential for-loop with asyncio.gather()
  • Use Semaphore(max_workers) to limit concurrent API calls
  • Expected 2-5x speedup depending on MAX_WORKERS setting
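The parallelization pattern described above is roughly the following sketch (the real embedding API call and batch shapes are simplified here to a stand-in):

```python
import asyncio

async def embed_batches(batches: list[list[str]], max_workers: int = 4) -> list[list[int]]:
    """Fan out embedding calls with asyncio.gather, bounded by a semaphore."""
    sem = asyncio.Semaphore(max_workers)

    async def embed_one(batch: list[str]) -> list[int]:
        async with sem:  # at most max_workers calls in flight at once
            await asyncio.sleep(0)  # stand-in for the real async embedding API call
            return [len(text) for text in batch]

    # gather preserves input order regardless of completion order
    return await asyncio.gather(*(embed_one(b) for b in batches))
```

The semaphore is what makes the MAX_WORKERS slider meaningful: raising it admits more concurrent API calls, while `gather` still returns results in submission order.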

Test plan

  • All 733 Python tests pass
  • Migration runs successfully on local Supabase
  • Hybrid search functions created with correct signatures
  • Manual test: Search with hybrid mode enabled
  • Manual test: Try different reranking models
  • Manual test: Upload documents with contextual embeddings + different MAX_WORKERS values

🤖 Generated with Claude Code

Summary by CodeRabbit

  • New Features

    • Per-instance Ollama auth tokens and selectable API mode; Reciprocal Rank Fusion hybrid search with multi-dimensional embedding support; OCR fallback for image PDFs.
  • Improvements

    • Automatic restart policy for local setup; parallel embedding processing; improved PDF/code-block handling; per-instance model/health visibility.
  • Testing

    • New unit and end-to-end tests for Ollama auth, API mode, and RAG flows.
  • Documentation

    • Added INFRASTRUCTURE, QUICKSTART, and PLAN guides.
  • Chores

    • Updated .gitignore, test tooling and scripts.

Milofax and others added 19 commits November 21, 2025 19:50
- Implement Ollama integration for LLM and embeddings
- Add OllamaConfigurationPanel for UI configuration
- Add model discovery service for automatic model detection
- Update RAG settings to support Ollama embeddings
- Add documentation (INFRASTRUCTURE.md, PLAN.md, QUICKSTART.md)
- Add E2E tests for Ollama functionality

🤖 Generated with Claude Code

Co-Authored-By: Claude <noreply@anthropic.com>
…if block

🤖 Generated with Claude Code

Co-Authored-By: Claude <noreply@anthropic.com>
- Add test for Ollama with auth token
- Add test for Ollama without auth token (local setup)
- Update fallback test to expect 'required-but-ignored' placeholder
- Add RAG settings mock to multi-provider sequence test

🤖 Generated with Claude Code

Co-Authored-By: Claude <noreply@anthropic.com>
Security fixes:
- Remove hardcoded JWT token from INFRASTRUCTURE.md (replace with placeholder)
- Remove real Ollama auth tokens from PLAN.md (replace with placeholders)
- Fix secrets leak: Change RAG settings logging from info to debug level, log only keys

Functional fixes:
- Fix URL normalization for auth-token mapping (both incoming and configured URLs now use same normalization)

Best practices:
- Add test-results/ to .gitignore
- Remove test-results from version control

All 22 Python tests pass, server healthy.
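The URL-normalization fix can be illustrated with a small helper (the name is hypothetical; it mirrors the strip-`/v1`-and-trailing-slash logic applied to both incoming and configured URLs):

```python
def normalize_ollama_url(url: str) -> str:
    """Normalize an Ollama URL so configured and incoming forms compare equal:
    drop an OpenAI-compatible /v1 suffix and any trailing slash."""
    return url.replace("/v1", "").rstrip("/")
```

With both sides normalized the same way, the auth-token map lookup no longer misses because one URL carries `/v1` or a trailing slash and the other does not.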

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
PR coleam00#875: Ollama Support with Local Model Discovery
- Security fixes: JWT tokens, auth tokens, RAG settings logging
- Bug fixes: URL normalization, None-handling
- Best practices: test-results gitignore

All 7 items marked as done ("ERLEDIGT", 2025-11-21)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
- Add auth_token parameter to validate_provider_instance() function
- Load auth tokens from RAG settings in /validate endpoint
- Match incoming URLs against configured LLM_BASE_URL and OLLAMA_EMBEDDING_URL
- Apply correct auth token (chat or embedding) based on URL matching
- Add tests for auth token flow through all layers

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
When "Use same host for embedding instance" is checked, now also copies
useAuth and authToken from LLM config to embedding config, not just the
URL and name.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
- Extract syncEmbeddingFromLLM() into testable utility function
- Add 6 tests covering: full sync, default name, custom default,
  auth settings copy, empty token handling, immutability
- Refactor RAGSettings to use the utility function

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
Merges OpenRouter embedding provider feature from upstream.
Resolves conflict in llm_provider_service.py by keeping auth_token parameter.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
Addresses CodeRabbit review: CHAT_MODEL was undefined because
MODEL_CHOICE is the actual key used. Now uses helper functions
that correctly resolve the displayed model name.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
- Add useAuth/authToken to instance config resets (RAGSettings.tsx)
- Clear auth tokens from ragSettings when deleting instances
- Fix type hint: dict[str, any] -> dict[str, Any] (llm_provider_service.py)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
- ArchonChatPanel: Remove BookOpen, Search
- BugReportModal: Remove ExternalLink
- ErrorBoundaryWithBugReport: Remove React (use named imports)
- CodeViewerModal: Remove FileText
- DisconnectScreenOverlay: Remove Wifi, WifiOff
- APIKeysSection: Remove Key, Input

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
- Add pymupdf4llm>=0.2.5 as primary PDF extraction library
- Produces clean Markdown with proper word separation
- Falls back to pdfplumber, then PyPDF2 if needed
- Validated: pymupdf4llm extracts "Since 1996, Bloomberg" vs
  pdfplumber's "Since1996,Bloomberg" (no spaces)

Phase0 validated, Phase1 complete. Phase2 (OCR) pending.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
…de block protection

- Add heading boundary (\n#) as highest priority breakpoint
- Protect code blocks from being split mid-block
- Add is_inside_code_block() helper to check if position is within ```...```
- Only break at paragraphs/sentences when outside code blocks

This improves RAG quality for Markdown documents (especially from pymupdf4llm)
by preserving semantic structure during chunking.
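A minimal version of the fence check might look like this (a sketch; the real helper may track fence positions more carefully):

```python
def is_inside_code_block(text: str, pos: int) -> bool:
    """True if `pos` falls between an opening and a closing ``` fence:
    an odd number of fences before the position means we are inside one."""
    return text[:pos].count("```") % 2 == 1
```

The chunker can then reject any candidate breakpoint for which this returns True, falling back to the next paragraph or sentence boundary outside the fence.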

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
…dels

BREAKING: Search now supports all embedding dimensions (384, 768, 1024, 1536, 3072)

Previously search was hardcoded to use 1536-dim functions, which meant:
- snowflake-arctic-embed2 (1024-dim) → NO RESULTS
- nomic-embed-text (768-dim) → NO RESULTS
- Only OpenAI text-embedding-3-small (1536-dim) worked

Fix:
- Detect dimension from query_embedding: len(query_embedding)
- Use _multi RPC functions with embedding_dimension parameter
- base_search_strategy.py: match_archon_*_multi
- hybrid_search_strategy.py: hybrid_search_*_multi
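The dimension-detection step reduces to inspecting the query vector's length and routing to the matching multi-dimension RPC. A sketch (the RPC names here are illustrative, not the exact function signatures):

```python
def pick_search_rpc(query_embedding: list[float], hybrid: bool) -> tuple[str, int]:
    """Detect the embedding dimension from the query vector itself and pick
    the matching *_multi RPC (function names are illustrative)."""
    dim = len(query_embedding)  # 384, 768, 1024, 1536, or 3072
    rpc = (
        "hybrid_search_archon_crawled_pages_multi"
        if hybrid
        else "match_archon_crawled_pages_multi"
    )
    return rpc, dim
```

Passing the detected dimension explicitly is what lets 768-dim and 1024-dim models return results instead of silently matching nothing against a hardcoded 1536-dim index.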

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
Adds automatic OCR extraction for scanned/image-based PDFs that don't
contain extractable text layers.

New files:
- ocr_processing.py: Tesseract OCR engine wrapper
  - extract_text_with_ocr(): Convert PDF pages to images, run OCR
  - is_ocr_available(): Check for pytesseract/pdf2image
  - get_supported_languages(): List installed Tesseract languages

Changes:
- document_processing.py: OCR fallback when text extraction fails
- pyproject.toml: Add pytesseract>=0.3.10, pdf2image>=1.17.0

Flow: pymupdf4llm → pdfplumber → PyPDF2 → Tesseract OCR

Requires system dependencies:
- macOS: brew install tesseract poppler
- Linux: apt install tesseract-ocr poppler-utils
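The four-stage flow above reduces to a generic fallback chain. This sketch uses stand-in extractor callables rather than the real pymupdf4llm/pdfplumber/PyPDF2/Tesseract calls:

```python
from collections.abc import Callable

def extract_pdf_text(
    pdf_path: str,
    extractors: list[tuple[str, Callable[[str], str]]],
    min_chars: int = 100,
) -> tuple[str, str]:
    """Try extractors in order (e.g. pymupdf4llm, pdfplumber, PyPDF2, OCR);
    skip any that raise or return near-empty text."""
    for name, extract in extractors:
        try:
            text = extract(pdf_path)
        except Exception:
            continue  # broken extractor: fall through to the next one
        if text and len(text.strip()) > min_chars:
            return name, text
    raise ValueError(f"No extractor produced usable text for {pdf_path}")
```

The `min_chars` guard matters for scanned PDFs: a text-layer extractor may "succeed" with a handful of stray characters, and only the length check lets the chain fall through to OCR.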

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
Replace direct score comparison with RRF scoring for proper fusion of
vector and text search results. Previous implementation incorrectly
compared cosine similarity (0-1) with ts_rank_cd scores (different scale).

Changes:
- SQL: Use ROW_NUMBER() + RRF formula (1/(rank+60)) instead of raw scores
- SQL: Add embedding_model_filter parameter for future model switching
- SQL: Fetch 3x candidates for better fusion quality before limiting
- Test: Update RPC name assertion to _multi version

RRF is scale-independent and works across domains without tuning (k=60).

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>

coderabbitai bot commented Dec 3, 2025

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

📝 Walkthrough

Adds per-instance Ollama authentication (frontend + backend), RRF-based multi-dimensional hybrid search and migrations, OCR-backed PDF extraction, parallelized embedding sub-batch processing, Playwright E2E tests and infra docs, plus various UI/type/test/compose/.gitignore cleanups.

Changes

| Cohort / File(s) | Summary |
|---|---|
| **Gitignore & Compose**<br>`.gitignore`, `archon-ui-main/.gitignore`, `docker-compose.yml` | Ignore Supabase temp/volumes and test/playwright artifacts; add `restart: unless-stopped` to core services. |
| **Docs & Guides**<br>`INFRASTRUCTURE.md`, `QUICKSTART.md`, `PLAN.md` | Add infrastructure docs, bilingual quickstart, and rollout/implementation plan for per-instance Ollama auth. |
| **Frontend: RAG & Ollama UI/Types**<br>`archon-ui-main/src/components/settings/RAGSettings.tsx`, `.../OllamaConfigurationPanel.tsx`, `.../types/OllamaTypes.ts`, `archon-ui-main/src/services/credentialsService.ts` | Add per-instance useAuth/authToken and Ollama API mode; persist chat/embedding tokens and API mode; extend types and credentials mappings; update UI/state and save/load flows. |
| **Frontend: Utils & Unit Tests**<br>`archon-ui-main/src/components/settings/utils/instanceConfigSync.ts`, `.../__tests__/instanceConfigSync.test.ts` | New InstanceConfig interface and syncEmbeddingFromLLM util with unit tests to mirror LLM → embedding settings including auth fields. |
| **Frontend: E2E & Playwright**<br>`archon-ui-main/package.json`, `archon-ui-main/playwright.config.ts`, `archon-ui-main/tests/e2e/*` | Add Playwright devDependency/config and new E2E suites for Ollama API mode and RAG search flow. |
| **Frontend: Cleanup**<br>multiple files: `archon-ui-main/src/components/...` | Remove several unused imports and adjust React import usage (cosmetic). |
| **Backend: Ollama API & Discovery**<br>`python/src/server/api_routes/ollama_api.py`, `python/src/server/services/ollama/model_discovery_service.py`, `python/src/server/services/llm_provider_service.py` | Propagate per-URL auth tokens through health checks, model discovery, and provider validation; validate_provider_instance accepts auth_token. |
| **Backend: Embeddings**<br>`python/src/server/services/embeddings/embedding_service.py` | Add NativeOllamaEmbeddingAdapter, select adapter by base_url/api_mode/auth_token, and propagate Ollama embedding settings into batch creation. |
| **Backend: Hybrid Search / Migrations**<br>`migration/0.1.0/013_hybrid_search_rrf.sql`, `migration/complete_setup.sql`, `python/src/server/services/search/*` | Introduce RRF-based *_multi hybrid search functions, support embedding_dimension and embedding_model_filter, and call multi-RPC variants from the server. |
| **Backend: Storage & Chunking**<br>`python/src/server/services/storage/base_storage_service.py`, `.../document_storage_service.py` | Prefer heading-aware chunk breaks avoiding code fences; parallelize embedding sub-batches with semaphore/tasks and improve error mapping/logging. |
| **Backend: PDF / OCR**<br>`python/src/server/utils/document_processing.py`, `python/src/server/utils/ocr_processing.py`, `python/pyproject.toml` | Use pymupdf4llm as primary Markdown extraction, add pdfplumber/PyPDF2 fallbacks and a Tesseract OCR fallback; add OCR deps to pyproject. |
| **Backend: Misc & Tests**<br>`python/src/agent_work_orders/utils/state_reconciliation.py`, `python/tests/*`, `python/src/server/services/storage/*` | Fix indentation, add/adjust tests for Ollama auth propagation and LLM client behavior, update RPC test expectations to the multi variant, and add other test coverage. |

Sequence Diagram(s)

```mermaid
sequenceDiagram
    actor User
    participant Frontend as RAGSettings/UI
    participant API as Backend API
    participant Creds as Credentials Service
    participant Ollama as Ollama Instance

    User->>Frontend: configure instance URL + enable auth + enter token
    Frontend->>API: save rag_settings (includes per-instance token)
    API->>Creds: read rag_strategy / per-URL tokens
    Creds-->>API: return token map

    rect rgba(230,244,234,0.5)
      API->>Ollama: GET /api/tags or /health (Authorization: Bearer {token})
      Ollama-->>API: models / health
      API-->>Frontend: report health & models
    end

    rect rgba(240,245,255,0.5)
      Frontend->>API: inference request (chat or embedding)
      API->>Creds: resolve operation-specific token
      Creds-->>API: return token
      API->>Ollama: POST /api/chat or /api/embeddings (Authorization: Bearer {token})
      Ollama-->>API: inference response
      API-->>Frontend: inference result
    end
```

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~75 minutes

Possibly related PRs

  • coleam00/Archon#643 — overlaps Ollama multi-instance integration and per-instance auth wiring across frontend and backend.
  • coleam00/Archon#681 — related migration changes introducing multi-dimensional hybrid search and RRF scoring.
  • coleam00/Archon#539 — related DB-backed hybrid search adjustments and server-side hybrid_search_strategy updates.

Suggested labels

enhancement

Suggested reviewers

  • coleam00
  • leex279
  • Wirasm

Poem

🐰 I nibbled tokens, hid them neat,

per-host secrets snug and sweet.
RRF ranks hops in tidy line,
OCR reads pages, lost then fine—
Ollama listens, answers fleet.

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 inconclusive)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Description check | ❓ Inconclusive | The description is comprehensive and well-structured, covering the summary, problem statement, three phases of changes, and test plan. However, it does not follow the provided template structure with explicit sections like 'Type of Change', 'Affected Services', or 'Checklist'. | Reformat the description to follow the repository template: add explicit 'Type of Change' checkboxes, an 'Affected Services' section, 'Testing' details with test commands, and complete the provided checklist items. |
✅ Passed checks (2 passed)
| Check name | Status | Explanation |
|---|---|---|
| Title check | ✅ Passed | The title directly addresses the primary focus of the PR: implementing Reciprocal Rank Fusion for hybrid search. It is specific, concise, and accurately reflects the main change. |
| Docstring Coverage | ✅ Passed | Docstring coverage is 88.00%, which is sufficient. The required threshold is 80.00%. |

✏️ Tip: You can configure your own custom pre-merge checks in the settings.



@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 9

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (7)
archon-ui-main/src/components/settings/APIKeysSection.tsx (3)

96-96: Implicit any type violates TypeScript strict mode.

Line 96 declares value: any, which violates the "no implicit any" requirement per coding guidelines. Replace with a specific union type matching CustomCredential field values.

- const updateCredential = (index: number, field: keyof CustomCredential, value: any) => {
+ const updateCredential = (index: number, field: keyof CustomCredential, value: string | boolean) => {

Note: If other field types are needed (e.g., undefined), adjust the union accordingly.


267-271: Avoid dynamic Tailwind class construction in frontend components.

Lines 267–271 construct Tailwind classes conditionally using ternary operators. Per coding guidelines, use a classnames utility or extract static classes to avoid tooling issues.

- className={`w-full px-3 py-2 pr-20 rounded-md border text-sm ${
-   cred.isFromBackend && cred.is_encrypted && cred.value === '[ENCRYPTED]'
-     ? 'bg-gray-100 dark:bg-gray-800 border-gray-200 dark:border-gray-600 text-gray-500 dark:text-gray-400'
-     : 'bg-white dark:bg-gray-900 border-gray-300 dark:border-gray-700'
- }`}
+ className={cred.isFromBackend && cred.is_encrypted && cred.value === '[ENCRYPTED]'
+   ? 'w-full px-3 py-2 pr-20 rounded-md border text-sm bg-gray-100 dark:bg-gray-800 border-gray-200 dark:border-gray-600 text-gray-500 dark:text-gray-400'
+   : 'w-full px-3 py-2 pr-20 rounded-md border text-sm bg-white dark:bg-gray-900 border-gray-300 dark:border-gray-700'
+ }

Alternatively, extract the condition into a helper variable for clarity.


305-312: Avoid dynamic Tailwind class construction in encryption toggle button.

Lines 305–312 construct Tailwind classes dynamically via ternary operators. Extract static classes or use a classnames utility to comply with the coding guideline against dynamic Tailwind construction.

- className={`
-   absolute right-2 top-1/2 -translate-y-1/2 p-1.5 rounded transition-colors
-   ${cred.isFromBackend && cred.is_encrypted && cred.value === '[ENCRYPTED]'
-     ? 'cursor-not-allowed opacity-50 text-pink-400'
-     : cred.is_encrypted 
-       ? 'text-pink-600 dark:text-pink-400 hover:bg-pink-100 dark:hover:bg-pink-900/20' 
-       : 'text-gray-400 hover:bg-gray-200 dark:hover:bg-gray-700'
-   }
- `}
+ className={
+   cred.isFromBackend && cred.is_encrypted && cred.value === '[ENCRYPTED]'
+     ? 'absolute right-2 top-1/2 -translate-y-1/2 p-1.5 rounded transition-colors cursor-not-allowed opacity-50 text-pink-400'
+     : cred.is_encrypted
+       ? 'absolute right-2 top-1/2 -translate-y-1/2 p-1.5 rounded transition-colors text-pink-600 dark:text-pink-400 hover:bg-pink-100 dark:hover:bg-pink-900/20'
+       : 'absolute right-2 top-1/2 -translate-y-1/2 p-1.5 rounded transition-colors text-gray-400 hover:bg-gray-200 dark:hover:bg-gray-700'
+ }

Or extract the condition into a helper constant for better maintainability.

python/src/server/api_routes/ollama_api.py (1)

247-256: Bug: Incorrect timestamp value - uses module name instead of actual timestamp.

Line 255 sets timestamp to model_discovery_service.check_instance_health.__module__, which returns the module name string (e.g., "src.server.services.ollama.model_discovery_service"), not a timestamp.

Apply this diff to fix the timestamp:

         return {
             "summary": {
                 "total_instances": len(instance_urls),
                 "healthy_instances": healthy_count,
                 "unhealthy_instances": len(instance_urls) - healthy_count,
                 "average_response_time_ms": avg_response_time
             },
             "instance_status": health_results,
-            "timestamp": model_discovery_service.check_instance_health.__module__  # Use current timestamp
+            "timestamp": datetime.now().isoformat()
         }

You'll also need to import datetime at the top of the file if not already imported for this endpoint.

python/src/server/services/ollama/model_discovery_service.py (1)

975-983: Cache key doesn't account for auth token - may return stale results for different auth contexts.

The health check cache uses only the instance_url as the cache key (line 976). If the same URL is accessed with different auth tokens (or with/without tokens), the cached result from one context may incorrectly be returned for another.

Consider including a hash or presence indicator of the auth token in the cache key:

-        cache_key = f"health_{instance_url}"
+        # Include token presence in cache key to avoid cross-context cache hits
+        token_indicator = "auth" if auth_token else "noauth"
+        cache_key = f"health_{instance_url}_{token_indicator}"
archon-ui-main/src/components/settings/RAGSettings.tsx (1)

142-174: Extend interface to include auth token fields.

The RAGSettingsProps interface defines CHAT_MODEL but omits the auth token fields (OLLAMA_CHAT_AUTH_TOKEN, OLLAMA_EMBEDDING_AUTH_TOKEN, OLLAMA_API_MODE) that are accessed elsewhere via as any casts. Adding them would provide type safety. As per coding guidelines, TypeScript strict mode with no implicit any is expected.

     OLLAMA_EMBEDDING_INSTANCE_NAME?: string;
+    OLLAMA_CHAT_AUTH_TOKEN?: string;
+    OLLAMA_EMBEDDING_AUTH_TOKEN?: string;
+    OLLAMA_API_MODE?: 'native' | 'openai-compatible';
     // Crawling Performance Settings
python/src/server/utils/document_processing.py (1)

297-319: Inconsistent fallback threshold between pdfplumber and PyPDF2.

The pdfplumber fallback (line 289) requires >100 characters to consider extraction successful, but PyPDF2 (line 312) accepts any non-empty result. This means PyPDF2 could return just a few characters instead of falling through to OCR.

Consider applying the same threshold for consistency:

             if text_content:
                 combined_text = "\n\n".join(text_content)
+                if len(combined_text.strip()) <= 100:
+                    logfire.warning("PyPDF2 returned insufficient text, trying OCR")
+                else:
-                logger.info(f"PDF extracted with PyPDF2: {len(combined_text)} chars")
-                return _preserve_code_blocks_across_pages(combined_text)
+                    logger.info(f"PDF extracted with PyPDF2: {len(combined_text)} chars")
+                    return _preserve_code_blocks_across_pages(combined_text)
             # If no text, fall through to OCR
🧹 Nitpick comments (27)
python/src/agent_work_orders/utils/state_reconciliation.py (2)

184-200: Prefer logger.exception() in except blocks

Both exception handlers use logger.error(..., exc_info=True). To align with the logging guidelines and make intent clearer, consider switching to logger.exception(...) and dropping the explicit exc_info argument, e.g.:

-            except Exception as e:
-                actions.append(f"Failed to delete {orphan_path}: {e}")
-                logger.error("orphaned_worktree_delete_failed", path=orphan_path, error=str(e), exc_info=True)
+            except Exception as e:
+                actions.append(f"Failed to delete {orphan_path}: {e}")
+                logger.exception("orphaned_worktree_delete_failed", path=orphan_path, error=str(e))
...
-            except Exception as e:
-                actions.append(f"Failed to update {work_order_id}: {e}")
-                logger.error("dangling_state_update_failed", work_order_id=work_order_id, error=str(e), exc_info=True)
+            except Exception as e:
+                actions.append(f"Failed to update {work_order_id}: {e}")
+                logger.exception("dangling_state_update_failed", work_order_id=work_order_id, error=str(e))

As per coding guidelines, logger.exception() is the preferred pattern in except blocks.


106-125: Tighten reconcile_state return type instead of using Any

reconcile_state currently returns dict[str, Any], which conflicts with the “no Any without explicit justification” rule for agent_work_orders. Consider defining a TypedDict or dataclass for the report shape (e.g., fields for orphaned_worktrees, dangling_state, fix_applied, actions_taken) and using that as the return type so mypy can fully validate callers. This can be a follow-up to this formatting fix.

As per coding guidelines, strict typing without Any is expected in this package.

QUICKSTART.md (1)

23-28: Minor: Consider wrapping URLs in angle brackets for markdown lint compliance.

Static analysis flagged bare URLs (MD034). While functional, wrapping URLs improves markdown portability.

 | Service | URL | Beschreibung |
 |---------|-----|-------------|
-| **Archon UI** | http://localhost:3737 | Hauptanwendung |
-| **Supabase Studio** | http://localhost:54323 | Datenbank-UI |
-| **API Server** | http://localhost:8181 | Backend API |
-| **MCP Server** | http://localhost:8051/mcp | Model Context Protocol |
+| **Archon UI** | <http://localhost:3737> | Hauptanwendung |
+| **Supabase Studio** | <http://localhost:54323> | Datenbank-UI |
+| **API Server** | <http://localhost:8181> | Backend API |
+| **MCP Server** | <http://localhost:8051/mcp> | Model Context Protocol |
INFRASTRUCTURE.md (2)

78-88: Replace absolute paths with generic examples.

The documentation contains absolute paths specific to a developer's local machine (e.g., /Volumes/DATEN/Coding/archon). These should be replaced with generic placeholders or relative paths to make the documentation useful for all contributors.

Consider using placeholders like:

cd <YOUR_PROJECT_ROOT>/supabase
supabase start

Or establish a convention like $ARCHON_ROOT as an environment variable throughout the documentation.


1-544: Consider adding an English version or translation.

This comprehensive infrastructure documentation is written entirely in German, which may limit accessibility for international contributors. While maintaining the German version for your team, consider adding an English version or at least a brief English summary at the top linking to translated versions.

This would improve the project's accessibility without removing the valuable German documentation.

archon-ui-main/src/components/settings/utils/__tests__/instanceConfigSync.test.ts (1)

1-90: Comprehensive test coverage for sync utility.

The test suite provides excellent coverage of the syncEmbeddingFromLLM function, including:

  • Full property copying
  • Default name fallback behavior
  • Custom default names
  • Auth settings edge cases
  • Immutability verification

Consider adding one additional test case for defensive programming:

it('should handle null or undefined llmConfig gracefully', () => {
  // If the function should handle invalid inputs
  expect(() => syncEmbeddingFromLLM(null as any)).toThrow();
});

However, this is optional if the function is only called with validated inputs.

archon-ui-main/src/components/settings/OllamaConfigurationPanel.tsx (1)

734-754: Clean authentication UI implementation.

The Authentication Settings section is well-structured:

  • Checkbox for toggling authentication
  • Password input conditionally rendered only when authentication is enabled
  • Accessible label with proper cursor pointer
  • Good use of conditional rendering

Minor enhancement suggestion: Consider adding a visual indicator or help text explaining what the auth token is for, especially for users new to Ollama authentication.

archon-ui-main/tests/e2e/ollama-api-mode.spec.ts (2)

19-20: Consider using data-testid attributes for more stable selectors.

The tests rely heavily on text-based selectors like button:has-text("Ollama") and text=RAG Settings. While functional, these are fragile if UI text changes (e.g., for i18n or copy updates).

Consider adding data-testid attributes to key UI elements:

// In the component:
<button data-testid="ollama-card">Ollama</button>

// In the test:
await page.locator('[data-testid="ollama-card"]').click();

This provides a stable contract between tests and components that won't break due to text changes.

Also applies to: 24-25, 49-50, 54-55, 78-79, 83-84, 123-124, 128-129, 156-157, 174-175, 178-179


5-5: Externalize the base URL to test configuration.

The URL http://localhost:3737 is hardcoded, which makes it difficult to run tests in different environments.

Use Playwright's base URL configuration:

// In playwright.config.ts:
export default {
  use: {
    baseURL: process.env.BASE_URL || 'http://localhost:3737',
  },
};

// In the test:
await page.goto('/settings');
python/tests/test_ollama_auth_token.py (1)

58-64: Consider simplifying the auth_token assertion.

The assertion on lines 63-64 checks for the auth_token in both kwargs and positional args, which seems overly complex.

Since the function signature uses auth_token as a keyword argument (from the relevant code snippet), you can simplify this to:

# Verify auth_token was passed to check_instance_health
mock_model_discovery_service.check_instance_health.assert_called_once_with(
    "http://localhost:11434",
    auth_token="test-auth-token"
)

This is more readable and explicitly validates the expected call signature.

archon-ui-main/src/components/settings/utils/instanceConfigSync.ts (1)

18-28: Clean and well-implemented utility function.

The syncEmbeddingFromLLM function is straightforward and correctly implements the config synchronization:

  • Returns a new object (immutability)
  • Sensible default parameter for name fallback
  • Clear documentation
  • All properties properly copied

The comprehensive test coverage in __tests__/instanceConfigSync.test.ts validates the behavior.

Optional: Consider adding JSDoc @throws if you decide to add validation in the future, or document that the function assumes valid inputs.

python/src/server/api_routes/ollama_api.py (3)

98-131: Consider extracting duplicated auth token resolution logic into a helper function.

The auth token retrieval and URL normalization pattern is repeated in three places: discover_models_endpoint (lines 99-131), health_check_endpoint (lines 181-216), and validate_instance_endpoint (lines 277-299). This duplication increases maintenance burden and risk of inconsistencies.

Consider extracting this logic:

async def _get_ollama_auth_tokens_map(instance_urls: list[str]) -> dict[str, str]:
    """Get auth token mapping for Ollama instance URLs."""
    from ..services.credential_service import credential_service
    rag_settings = await credential_service.get_credentials_by_category("rag_strategy")
    
    llm_base_url = (rag_settings.get("LLM_BASE_URL") or "").replace("/v1", "").rstrip("/")
    embedding_base_url = (rag_settings.get("OLLAMA_EMBEDDING_URL") or "").replace("/v1", "").rstrip("/")
    chat_auth_token = rag_settings.get("OLLAMA_CHAT_AUTH_TOKEN") or ""
    embedding_auth_token = rag_settings.get("OLLAMA_EMBEDDING_AUTH_TOKEN") or ""
    
    auth_tokens_map = {}
    for url in instance_urls:
        normalized_url = url.replace("/v1", "").rstrip("/")
        if normalized_url == llm_base_url and chat_auth_token:
            auth_tokens_map[normalized_url] = chat_auth_token
        elif normalized_url == embedding_base_url and embedding_auth_token:
            auth_tokens_map[normalized_url] = embedding_auth_token
    return auth_tokens_map

549-549: Pydantic v2 deprecates .dict().

Line 549 uses model.dict(), which is deprecated in Pydantic v2; use model.model_dump() instead.

-        models_data = {
-            "models": [model.dict() for model in stored_models],
+        models_data = {
+            "models": [model.model_dump() for model in stored_models],

1169-1169: Pydantic v2 deprecates .dict().

Line 1169 uses stored_model.dict(); use stored_model.model_dump() instead.

-                            model_dict = stored_model.dict()
+                            model_dict = stored_model.model_dump()
python/src/server/services/llm_provider_service.py (1)

415-440: Consider extracting duplicated Ollama auth token resolution logic.

The auth token retrieval logic from rag_strategy settings is duplicated between the OpenAI fallback path (lines 415-440) and the direct Ollama path (lines 455-479). Both blocks:

  1. Check/populate the rag_settings cache
  2. Select token based on use_embedding_provider or instance_type
  3. Default to "required-but-ignored" if no token

Consider extracting to a helper function:

async def _get_ollama_auth_token(use_embedding: bool = False) -> str:
    """Get Ollama auth token from RAG settings, defaulting to placeholder."""
    cache_key = "rag_strategy_settings"
    rag_settings = _get_cached_settings(cache_key)
    if rag_settings is None:
        rag_settings = await credential_service.get_credentials_by_category("rag_strategy")
        _set_cached_settings(cache_key, rag_settings)
    
    token_key = "OLLAMA_EMBEDDING_AUTH_TOKEN" if use_embedding else "OLLAMA_CHAT_AUTH_TOKEN"
    token = rag_settings.get(token_key, "")
    
    if not token:
        logger.warning(f"No Ollama auth token found for {token_key}, using placeholder")
        return "required-but-ignored"
    return token

Also applies to: 455-479

PLAN.md (2)

1-3: Consider whether this planning document should be committed to the repository.

This appears to be an internal implementation plan with deployment-specific details (local paths, placeholder tokens). If this is meant for end-users or contributors, consider:

  1. Moving deployment-specific paths and tokens to environment examples
  2. Translating to English for broader accessibility

If this is a temporary development artifact, consider adding it to .gitignore or removing it before merge.


124-124: Convert bare URL to a proper Markdown link.

Static analysis flagged this as a bare URL. For better documentation formatting:

-1. **UI öffnen**: http://localhost:3737
+1. **UI öffnen**: [http://localhost:3737](http://localhost:3737)
archon-ui-main/src/components/settings/RAGSettings.tsx (3)

228-249: Replace as any type casts with properly typed interface.

The (ragSettings as any).OLLAMA_CHAT_AUTH_TOKEN pattern bypasses TypeScript's type checking. This is used in multiple locations (lines 231, 249, 254, 272) and should be addressed by extending the RAGSettingsProps interface as suggested above.

Additionally, the comparison on line 243 checks newConfig.authToken !== prev.authToken but doesn't compare useAuth state, which could lead to missed updates if the checkbox changes without the token changing.

-        if (newConfig.url !== prev.url || newConfig.name !== prev.name || newConfig.authToken !== prev.authToken) {
+        if (newConfig.url !== prev.url || newConfig.name !== prev.name || newConfig.useAuth !== prev.useAuth || newConfig.authToken !== prev.authToken) {

641-679: Consider typing the model response instead of using any.

The model mapping on lines 660 and 667 uses (m: any) => m.name. Consider defining a type for the model response to improve type safety:

interface OllamaModel {
  name: string;
  instance_url?: string;
  // ... other fields
}

Then use it: (m: OllamaModel) => m.name

The error handling is good - silently failing and leaving the model list empty is appropriate for non-critical functionality.


681-692: Add loadAvailableModels to useEffect dependencies or wrap with useCallback.

The loadAvailableModels function is called in these effects but not included in the dependency arrays. This triggers the react-hooks/exhaustive-deps ESLint rule (which appears suppressed elsewhere). While currently safe because loadAvailableModels doesn't capture any changing values, this pattern is fragile.

Additionally, the comparison against the hardcoded default URL is duplicated. Consider extracting to a constant:

+const isDefaultOllamaUrl = (url: string) => url === DEFAULT_OLLAMA_URL;
+
 useEffect(() => {
-  if (llmInstanceConfig.url && llmInstanceConfig.url !== 'http://host.docker.internal:11434/v1') {
+  if (llmInstanceConfig.url && !isDefaultOllamaUrl(llmInstanceConfig.url)) {
     loadAvailableModels(llmInstanceConfig.url, 'chat');
   }
-}, [llmInstanceConfig.url]);
+}, [llmInstanceConfig.url, loadAvailableModels]);
python/src/server/utils/ocr_processing.py (3)

14-15: Remove unused imports.

tempfile and os are imported but never used in this module. Ruff would flag these as unused imports.

 import io
-import tempfile
-import os
 from typing import Optional

122-124: Error logging should preserve the full stack trace.

Per coding guidelines, error logs should include exc_info=True to preserve full stack traces. The current f-string only captures the exception message, losing the traceback.

     except Exception as e:
-        logfire.error(f"OCR extraction failed: {e}")
+        logfire.error(f"OCR extraction failed: {e}", exc_info=True)
         return None

79-89: Consider adding safeguards for large PDFs.

convert_from_bytes loads all pages as images into memory simultaneously. At 300 DPI, each page could be 10-50MB. For very large scanned documents, this could cause memory exhaustion.

Consider adding a page limit or processing pages in batches:

MAX_OCR_PAGES = 50  # Configurable limit

images = convert_from_bytes(file_content, dpi=dpi)
if len(images) > MAX_OCR_PAGES:
    logger.warning(f"PDF has {len(images)} pages, processing first {MAX_OCR_PAGES} for OCR")
    images = images[:MAX_OCR_PAGES]

Alternatively, use convert_from_bytes with first_page and last_page parameters to process in chunks.
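
A minimal sketch of the batched alternative, assuming pdf2image's first_page/last_page parameters (1-based, inclusive); page_ranges and the batch size are illustrative, not part of the module:

```python
def page_ranges(total_pages: int, batch_size: int):
    """Yield 1-based (first_page, last_page) pairs covering total_pages in batches."""
    for first in range(1, total_pages + 1, batch_size):
        yield first, min(first + batch_size - 1, total_pages)

# Sketch of how the OCR loop could consume it (pdf2image calls shown as comments):
# from pdf2image import convert_from_bytes, pdfinfo_from_bytes
# total = pdfinfo_from_bytes(file_content)["Pages"]
# for first, last in page_ranges(total, batch_size=10):
#     images = convert_from_bytes(file_content, dpi=dpi, first_page=first, last_page=last)
#     ...  # OCR this batch, then drop the images before loading the next one
```

This caps peak memory at one batch of page images regardless of document length.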

python/tests/test_rag_simple.py (1)

94-98: Strengthen RPC assertion to also validate payload

The name assertion is good, but you can cheaply guard the new multi-endpoint wiring by also checking the payload, e.g. embedding_dimension:

-        # Verify RPC was called correctly (uses _multi version for dimension support)
-        mock_supabase.rpc.assert_called_once()
-        call_args = mock_supabase.rpc.call_args[0]
-        assert call_args[0] == "match_archon_crawled_pages_multi"
+        # Verify RPC was called correctly (uses _multi version for dimension support)
+        mock_supabase.rpc.assert_called_once()
+        fn_name, params = mock_supabase.rpc.call_args[0]
+        assert fn_name == "match_archon_crawled_pages_multi"
+        assert params["embedding_dimension"] == len(query_embedding)
python/src/server/services/search/base_search_strategy.py (1)

50-76: Multi-RPC wiring looks correct; consider pre-validation and richer error logging

The shift to computing embedding_dimension = len(query_embedding) and routing to f"{table_rpc}_multi" with embedding_dimension in rpc_params is consistent with the new match_archon_*_multi SQL signatures and should work across dimensions.

Two small follow-ups you might consider:

  • Guard obvious bad inputs before going to the DB (e.g., empty query_embedding or clearly unsupported dimensions) and log a clear reason instead of relying on the SQL RAISE EXCEPTION path.
  • In the except block below, switch to logger.exception("Vector search failed", extra={...}) or logger.error(..., exc_info=True) so you preserve stack traces, in line with the backend logging guidelines.
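
A sketch of that pre-validation; the supported-dimension set here is an assumption that would need to mirror the SQL CASE mapping:

```python
# Assumption: mirrors the embedding_384 ... embedding_3072 columns in the migration
SUPPORTED_DIMENSIONS = {384, 768, 1024, 1536, 3072}

def validate_query_embedding(query_embedding: list[float]) -> int:
    """Fail fast with a clear reason instead of relying on the SQL RAISE EXCEPTION path."""
    if not query_embedding:
        raise ValueError("Vector search aborted: query_embedding is empty")
    dimension = len(query_embedding)
    if dimension not in SUPPORTED_DIMENSIONS:
        raise ValueError(f"Vector search aborted: unsupported embedding dimension {dimension}")
    return dimension
```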
python/src/server/services/search/hybrid_search_strategy.py (1)

57-62: Hybrid _multi RPC integration is consistent with the SQL contracts

Deriving embedding_dimension = len(query_embedding) and routing to hybrid_search_archon_crawled_pages_multi / hybrid_search_archon_code_examples_multi with query_embedding, embedding_dimension, query_text, match_count, filter, and source_filter matches the new function signatures and RRF expectations. The added debug logs with dimensionality are also helpful.

If you want to reduce duplication, you could extract a tiny helper to compute embedding_dimension and log the “X D embedding” message shared between document and code hybrid searches, but that’s purely cosmetic.

Also applies to: 149-159

migration/0.1.0/013_hybrid_search_rrf.sql (1)

17-129: RRF-based hybrid functions look correct and parameter-safe

The two new hybrid_search_archon_*_multi functions are well-structured:

  • embedding_dimension → embedding_column mapping via CASE matches the defined embedding_384…embedding_3072 columns and raises cleanly on unsupported dimensions.
  • Dynamic SQL uses %I for column identifiers and positional parameters for all data ($1..$8), avoiding SQL injection while correctly wiring query_embedding, candidate_count, match_count, filter, source_filter, query_text, embedding_model_filter, and rrf_k.
  • RRF scoring 1.0/(rank + rrf_k) summed over vector/text ranks with rrf_k := 60 is implemented correctly and matches the comments; match_type labels (hybrid/vector/keyword) will be useful to the application layer.
  • Limiting to candidate_count := LEAST(match_count * 3, 100) is a reasonable trade-off between fusion quality and performance.

Main suggestion is process-level rather than code-level: ensure this definition stays in sync with the corresponding section in complete_setup.sql so new installs and incremental migrations don’t drift over time.

Also applies to: 132-242
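
The RRF scoring these functions implement can be sketched in plain Python to show why rank-based fusion sidesteps the cosine-vs-ts_rank scale mismatch (function and variable names here are illustrative):

```python
def rrf_fuse(vector_ranked: list[str], text_ranked: list[str], k: int = 60) -> list[tuple[str, float]]:
    """Reciprocal Rank Fusion: each list contributes 1/(rank + k) per document, ranks starting at 1."""
    scores: dict[str, float] = {}
    for ranked_ids in (vector_ranked, text_ranked):
        for rank, doc_id in enumerate(ranked_ids, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (rank + k)
    # Highest fused score first; only ranks matter, never the raw retriever scores
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)
```

A document ranked first by both retrievers scores 2/61, beating one that appears first in only a single list (1/61) — the same distinction the match_type labels (hybrid vs vector/keyword) expose to the application layer.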

baseUrl: string;
isEnabled: boolean;
isPrimary: boolean;
authToken?: string;

⚠️ Potential issue | 🟠 Major

The authToken field is added to OllamaInstance but not persisted or loaded.

The authToken field is added to the interface (line 68) but the getOllamaInstances and setOllamaInstances methods don't handle this field. When instances are saved, the auth token won't be persisted, and when loaded, it won't be restored.

Add authToken handling in both methods:

In setOllamaInstances (around line 507):

          const fields: Record<string, any> = {
            name: instance.name,
            baseUrl: instance.baseUrl,
            isEnabled: instance.isEnabled,
            isPrimary: instance.isPrimary,
            instanceType: instance.instanceType || 'both',
            loadBalancingWeight: instance.loadBalancingWeight || 100
          };
+          
+          if (instance.authToken) {
+            fields.authToken = instance.authToken;
+          }

In getOllamaInstances, ensure the field parsing handles authToken (it may already work since strings pass through, but verify the field name mapping).

Also applies to: 506-528

🤖 Prompt for AI Agents
In archon-ui-main/src/services/credentialsService.ts around line 68 and lines
~506-528, the new authToken field on OllamaInstance is not persisted or
restored: update setOllamaInstances to include authToken when serializing
instances to storage (ensure undefined/null are handled and avoid dropping the
field), and update getOllamaInstances to parse/assign authToken back onto each
OllamaInstance when reading from storage (respect typing and provide a default
undefined if missing); ensure the stored shape and runtime mapping use the exact
field name "authToken" so tokens are saved and reloaded correctly.

Comment on lines 78 to 101
# Helper: check if position is inside a code block
def is_inside_code_block(text_segment: str, pos: int) -> bool:
return text_segment[:pos].count("```") % 2 == 1

# If no code block, try paragraph break
elif "\n\n" in chunk:
# Store original end for checking if we found a break point
original_end = end

# First, try to break at a heading boundary (# ## ### etc.)
heading_pos = chunk.rfind("\n#")
if heading_pos != -1 and heading_pos > chunk_size * 0.3:
if not is_inside_code_block(chunk, heading_pos):
end = start + heading_pos

# If no heading, try paragraph break (but not inside code blocks)
if end == original_end and "\n\n" in chunk:
last_break = chunk.rfind("\n\n")
if last_break > chunk_size * 0.3:
if last_break > chunk_size * 0.3 and not is_inside_code_block(chunk, last_break):
end = start + last_break

# If no paragraph break, try sentence break
elif ". " in chunk:
# If no paragraph break, try sentence break (but not inside code blocks)
if end == original_end and ". " in chunk:
last_period = chunk.rfind(". ")
if last_period > chunk_size * 0.3:
if last_period > chunk_size * 0.3 and not is_inside_code_block(chunk, last_period):
end = start + last_period + 1

⚠️ Potential issue | 🟠 Major

Code‑block detection can be wrong across chunk boundaries

is_inside_code_block only inspects the current chunk, so if a fenced code block starts before start and continues into this chunk, positions are misclassified as “outside code,” and the heading/paragraph/sentence logic can still split inside a code block (e.g., on \n# in a Python comment within a large fenced block). This violates the intent of preserving code blocks when possible.

I’d strongly recommend basing the check on the full text and the absolute position (start + local_pos) so fences that began in earlier chunks are honored.

One way to fix this in place:

-            # Helper: check if position is inside a code block
-            def is_inside_code_block(text_segment: str, pos: int) -> bool:
-                return text_segment[:pos].count("```") % 2 == 1
+            # Helper: check if position is inside a code block in the full text
+            def is_inside_code_block(pos: int) -> bool:
+                # Use absolute position so we respect code fences that started before this chunk.
+                return text[:pos].count("```") % 2 == 1
@@
-            heading_pos = chunk.rfind("\n#")
-            if heading_pos != -1 and heading_pos > chunk_size * 0.3:
-                if not is_inside_code_block(chunk, heading_pos):
-                    end = start + heading_pos
+            heading_pos = chunk.rfind("\n#")
+            if heading_pos != -1 and heading_pos > chunk_size * 0.3:
+                if not is_inside_code_block(start + heading_pos):
+                    end = start + heading_pos
@@
-            if end == original_end and "\n\n" in chunk:
-                last_break = chunk.rfind("\n\n")
-                if last_break > chunk_size * 0.3 and not is_inside_code_block(chunk, last_break):
-                    end = start + last_break
+            if end == original_end and "\n\n" in chunk:
+                last_break = chunk.rfind("\n\n")
+                if last_break > chunk_size * 0.3 and not is_inside_code_block(start + last_break):
+                    end = start + last_break
@@
-            if end == original_end and ". " in chunk:
-                last_period = chunk.rfind(". ")
-                if last_period > chunk_size * 0.3 and not is_inside_code_block(chunk, last_period):
-                    end = start + last_period + 1
+            if end == original_end and ". " in chunk:
+                last_period = chunk.rfind(". ")
+                if last_period > chunk_size * 0.3 and not is_inside_code_block(start + last_period):
+                    end = start + last_period + 1

If performance on very large texts becomes an issue, you could later optimize by precomputing code‑fence positions once and using them to answer is_inside_code_block in O(log n) or O(1) rather than rescanning text each time.
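
That precomputation could look like the following sketch: collect fence offsets once per document, then answer each query with a binary search (equivalent to the text[:pos].count("```") % 2 == 1 check):

```python
import bisect

def fence_start_positions(text: str) -> list[int]:
    """Start offsets of every ``` fence in the full text, computed once."""
    positions, idx = [], text.find("```")
    while idx != -1:
        positions.append(idx)
        idx = text.find("```", idx + 3)
    return positions

def is_inside_code_block(fences: list[int], pos: int) -> bool:
    """An odd number of fences fully before pos means pos sits inside a fenced block."""
    # A fence at offset p lies fully inside text[:pos] iff p <= pos - 3
    return bisect.bisect_right(fences, pos - 3) % 2 == 1
```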

🤖 Prompt for AI Agents
python/src/server/services/storage/base_storage_service.py around lines 78 to
101: the current is_inside_code_block helper inspects only the chunk and
misclassifies positions when a fenced code block starts before the chunk;
replace it with a helper that checks the full text using absolute positions
(i.e., is_inside_code_block(pos) -> evaluate text[:pos].count("```") % 2 == 1)
and update every call to pass start + local_pos (e.g., heading_pos, last_break,
last_period) so heading/paragraph/sentence break checks correctly respect
fences; keep existing break selection logic unchanged.

Milofax and others added 2 commits December 3, 2025 15:10
…ings

Phase 2: Reranking Model Selection
- Add RERANKING_MODEL dropdown to RAGSettings UI
- Options: MiniLM-L6 (fast), MiniLM-L12 (better), BGE Base, BGE Large
- Dropdown only visible when reranking is enabled
- Backend already supported model selection via settings

Phase 3: Parallel Contextual Embeddings
- Replace sequential sub-batch processing with asyncio.gather
- Use Semaphore to limit concurrent API calls to MAX_WORKERS setting
- The MAX_WORKERS slider in UI now actually controls parallelization
- Expected 2-5x speedup for contextual embedding generation

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
- Label: "Max" → "Workers" (clearer)
- Added recommendation: 3-5 for cloud APIs, up to 10 for local Ollama
- Explains that higher values may hit rate limits

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
python/src/server/services/storage/document_storage_service.py (1)

147-155: credential_service is used before it's imported - causes NameError.

The credential_service is first referenced at line 148, but it's not imported until line 347 (from ..credential_service import credential_service). This will cause a NameError at runtime when USE_CONTEXTUAL_EMBEDDINGS lookup is attempted.

Move the import to the top of the function or add a defensive import here:

+        # Import credential_service for settings lookup
+        from ..credential_service import credential_service
+
         try:
             use_contextual_embeddings = await credential_service.get_credential(
                 "USE_CONTEXTUAL_EMBEDDINGS", "false", decrypt=True
             )
♻️ Duplicate comments (1)
archon-ui-main/src/services/credentialsService.ts (1)

69-69: The authToken field is still not persisted in setOllamaInstances.

The authToken field was added to the OllamaInstance interface, but the setOllamaInstances method at lines 507-515 does not include authToken in the fields object. When instances are saved, the auth token will be lost.

Apply this diff to persist the authToken:

        const fields: Record<string, any> = {
          name: instance.name,
          baseUrl: instance.baseUrl,
          isEnabled: instance.isEnabled,
          isPrimary: instance.isPrimary,
          instanceType: instance.instanceType || 'both',
          loadBalancingWeight: instance.loadBalancingWeight || 100
        };
+        
+        if (instance.authToken) {
+          fields.authToken = instance.authToken;
+        }
        
        // Add optional health-related fields
🧹 Nitpick comments (4)
python/src/server/services/storage/document_storage_service.py (3)

23-23: Type hint should be int | None to match the default value.

The parameter accepts None as a default but the type hint only specifies int. This could cause MyPy errors.

-    batch_size: int = None,  # Will load from settings
+    batch_size: int | None = None,  # Will load from settings

271-273: Remove unused variable chunk_idx.

The variable chunk_idx is initialized but never used in the subsequent loop. This appears to be leftover from a previous implementation.

                     # Flatten results and track successes
                     contextual_contents = []
                     successful_count = 0
-                    chunk_idx = 0

533-536: Consider using debug level for DEBUG-prefixed log messages.

These log statements are prefixed with "DEBUG" but use search_logger.info(). If these are for troubleshooting, consider using search_logger.debug() to avoid noise in production logs, or remove the DEBUG prefix if they should remain as info.

-                search_logger.info(
+                search_logger.debug(
                     f"DEBUG document_storage sending final 100% | total_batches={total_batches} | "
                     f"chunks_stored={total_chunks_stored} | contents_len={len(contents)}"
                 )
-                search_logger.info("DEBUG document_storage final 100% sent successfully")
+                search_logger.debug("DEBUG document_storage final 100% sent successfully")
archon-ui-main/src/components/settings/RAGSettings.tsx (1)

642-693: Consider memoizing loadAvailableModels with useCallback.

The loadAvailableModels function is defined inline and called from multiple useEffect hooks but isn't memoized. While the current implementation works because the function only depends on its parameters, wrapping it in useCallback would be more idiomatic and prevent potential issues if the function's dependencies change in the future.

-  const loadAvailableModels = async (url: string, context: 'chat' | 'embedding') => {
+  const loadAvailableModels = useCallback(async (url: string, context: 'chat' | 'embedding') => {
     try {
       // ... implementation
     } catch (error: any) {
       console.warn(`Failed to load models from ${url}: ${error.message}`);
     }
-  };
+  }, []);
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 1b6e532 and 7d44a6f.

📒 Files selected for processing (3)
  • archon-ui-main/src/components/settings/RAGSettings.tsx (21 hunks)
  • archon-ui-main/src/services/credentialsService.ts (4 hunks)
  • python/src/server/services/storage/document_storage_service.py (1 hunks)
🧰 Additional context used
📓 Path-based instructions (7)
python/src/server/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

python/src/server/**/*.py: Fail fast and loud for service startup failures, missing configuration, database connection failures, authentication/authorization failures, data corruption, critical dependency unavailability, and invalid data that would corrupt state
Never accept corrupted data - skip failed items entirely rather than storing corrupted or incomplete data

Files:

  • python/src/server/services/storage/document_storage_service.py
python/src/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

python/src/**/*.py: Complete batch processing, background tasks, and external API calls but log detailed errors for each failure instead of crashing
Include context about what was being attempted, preserve full stack traces with exc_info=True in Python logging, use specific exception types, include relevant IDs/URLs, never return None to indicate failure, and report both success count and detailed failure list for batch operations
Use Ruff for linting and MyPy for type checking with 120 character line length in Python backend
Use Python 3.12 with 120 character line length in backend code
Import GITHUB_REPO_OWNER and GITHUB_REPO_NAME constants from python/src/server/config/version.py as the single source of truth for repository configuration

python/src/**/*.py: Python backend code should use 120 character line length
Use Ruff for linting Python code to check for errors, warnings, and unused imports
Use Mypy for type checking in Python to ensure type safety

Files:

  • python/src/server/services/storage/document_storage_service.py
python/src/server/services/**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

python/src/server/services/**/*.py: For batch processing and background tasks, complete what you can and report detailed failures for each item rather than crashing
Never accept corrupted data - skip failed items entirely rather than storing corrupted data in batch operations

Files:

  • python/src/server/services/storage/document_storage_service.py
python/**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

Error messages must include context about what was being attempted, preserve full stack traces, use specific exception types, include relevant IDs/URLs, and report both success count and detailed failure list for batch operations

Files:

  • python/src/server/services/storage/document_storage_service.py
archon-ui-main/src/**/*.{ts,tsx}

📄 CodeRabbit inference engine (CLAUDE.md)

archon-ui-main/src/**/*.{ts,tsx}: Use TypeScript in strict mode with no implicit any in frontend code
Do not use dynamic Tailwind class construction in frontend components

Files:

  • archon-ui-main/src/services/credentialsService.ts
  • archon-ui-main/src/components/settings/RAGSettings.tsx
archon-ui-main/src/!(features)/**/*.{ts,tsx}

📄 CodeRabbit inference engine (CLAUDE.md)

Use ESLint with standard React rules for legacy code in the frontend

Files:

  • archon-ui-main/src/services/credentialsService.ts
  • archon-ui-main/src/components/settings/RAGSettings.tsx
archon-ui-main/src/**/*.{ts,tsx,js,jsx}

📄 CodeRabbit inference engine (AGENTS.md)

Use ESLint for legacy frontend code outside /src/features directory with standard React rules

Files:

  • archon-ui-main/src/services/credentialsService.ts
  • archon-ui-main/src/components/settings/RAGSettings.tsx
🧠 Learnings (2)
📚 Learning: 2025-11-29T19:25:23.987Z
Learnt from: CR
Repo: coleam00/Archon PR: 0
File: AGENTS.md:0-0
Timestamp: 2025-11-29T19:25:23.987Z
Learning: Applies to python/src/server/services/**/*.py : For batch processing and background tasks, complete what you can and report detailed failures for each item rather than crashing

Applied to files:

  • python/src/server/services/storage/document_storage_service.py
📚 Learning: 2025-09-25T16:36:48.527Z
Learnt from: Chillbruhhh
Repo: coleam00/Archon PR: 736
File: archon-ui-main/src/components/settings/RAGSettings.tsx:17-19
Timestamp: 2025-09-25T16:36:48.527Z
Learning: OpenRouter does not provide embedding models - it only acts as a proxy/router for various LLM providers for chat completions, not embeddings. The EMBEDDING_CAPABLE_PROVIDERS array in RAGSettings.tsx correctly excludes 'openrouter'.

Applied to files:

  • archon-ui-main/src/components/settings/RAGSettings.tsx
🧬 Code graph analysis (2)
python/src/server/services/storage/document_storage_service.py (2)
python/tests/progress_tracking/integration/test_document_storage_progress.py (1)
  • cancellation_check (198-202)
python/src/server/services/embeddings/contextual_embedding_service.py (1)
  • generate_contextual_embeddings_batch (166-272)
archon-ui-main/src/components/settings/RAGSettings.tsx (2)
archon-ui-main/src/components/settings/utils/instanceConfigSync.ts (1)
  • syncEmbeddingFromLLM (18-28)
archon-ui-main/src/features/ui/primitives/input.tsx (1)
  • Input (8-29)
🔇 Additional comments (12)
python/src/server/services/storage/document_storage_service.py (2)

240-294: Parallel sub-batch processing implementation looks solid.

The semaphore-based concurrency limiting, graceful degradation on sub-batch failures, and proper result flattening are well-implemented. The use of return_exceptions=True ensures all sub-batches complete even when some fail.
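
The pattern described amounts to roughly this sketch (names are illustrative, not the actual service code):

```python
import asyncio

async def process_sub_batches(sub_batches, worker, max_workers: int):
    """Run worker over sub-batches concurrently, capped at max_workers in flight."""
    semaphore = asyncio.Semaphore(max_workers)

    async def run_one(batch):
        async with semaphore:  # at most max_workers concurrent API calls
            return await worker(batch)

    # return_exceptions=True: failed sub-batches surface as exception objects
    # instead of cancelling the rest, enabling graceful degradation
    return await asyncio.gather(*(run_one(b) for b in sub_batches), return_exceptions=True)
```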


373-389: Good use of defaultdict(deque) for handling duplicate text mapping.

The positions-by-text structure correctly handles the case where the same text appears multiple times, ensuring each embedding is mapped back to the correct original index. Using popleft() maintains order consistency.
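
The mechanism is easy to see in miniature (illustrative data, not the service's actual variables):

```python
from collections import defaultdict, deque

texts = ["alpha", "beta", "alpha"]  # note the duplicate

positions_by_text: dict[str, deque] = defaultdict(deque)
for idx, text in enumerate(texts):
    positions_by_text[text].append(idx)

# Embeddings come back keyed by text; popleft() restores each one to the
# correct original index even when the same text occurs more than once
embedding_order = [positions_by_text[t].popleft() for t in ["alpha", "alpha", "beta"]]
```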

archon-ui-main/src/services/credentialsService.ts (3)

19-27: LGTM - Interface additions are appropriate.

The new optional fields (RERANKING_MODEL, CHAT_MODEL, OLLAMA_API_MODE) align with the per-instance authentication and API mode configuration features being added across the PR.


203-209: LGTM - Default values are sensible.

The defaults for CHAT_MODEL (empty string) and OLLAMA_API_MODE ("native") align with the expected behavior where users must explicitly select a chat model and native Ollama API is the standard mode.


231-247: LGTM - String field mappings correctly updated.

The new fields CHAT_MODEL and OLLAMA_API_MODE are properly added to the string fields array for correct type coercion during credential mapping.

archon-ui-main/src/components/settings/RAGSettings.tsx (7)

211-227: LGTM - State initialization for authentication is well-structured.

The instance config objects properly include useAuth and authToken fields, and the refs correctly track authToken for change detection to prevent infinite update loops.


1603-1660: LGTM - API Mode selection UI is well-implemented.

The radio-button style selection for Native vs OpenAI-Compatible API modes is clear, with appropriate visual feedback and helpful descriptions explaining the endpoint differences. The default fallback to 'native' is consistent with the service defaults.


1853-1880: LGTM - Authentication status display is clear and informative.

The summary table correctly shows whether authentication is configured for each instance type, with appropriate visual indicators (lock icon for configured tokens).


2393-2416: LGTM - Authentication settings UI in modals is well-implemented.

The conditional token input that appears only when "Use Authentication" is checked provides good UX. The password input type appropriately masks the token.


788-811: LGTM - Delete handlers properly reset authentication state.

The LLM instance deletion correctly resets both local state (useAuth, authToken) and the ragSettings auth token field.


2091-2129: LGTM - Reranking model selection provides good options.

The conditional display of the model selector when reranking is enabled is clean UX. The model options (MiniLM variants and BGE rerankers) are appropriate choices for cross-encoder reranking with clear performance trade-off descriptions.


2376-2391: LGTM - Sync function properly propagates authentication settings.

The syncEmbeddingFromLLM function correctly copies all relevant fields including authentication settings when the user chooses to use the same host for embedding.


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

♻️ Duplicate comments (1)
archon-ui-main/src/components/settings/RAGSettings.tsx (1)

142-177: Add missing authentication and API mode fields to interface.

The interface is still missing OLLAMA_API_MODE, OLLAMA_CHAT_AUTH_TOKEN, and OLLAMA_EMBEDDING_AUTH_TOKEN. These fields are accessed via type casts (ragSettings as any) throughout the file (lines 232, 250, 255, 273, 803, 828, 1611, 2435, 2525), bypassing TypeScript's type safety.

As per coding guidelines, use TypeScript in strict mode with no implicit any. Add the missing fields:

 interface RAGSettingsProps {
   ragSettings: {
     MODEL_CHOICE: string;
     CHAT_MODEL?: string;
     USE_CONTEXTUAL_EMBEDDINGS: boolean;
     CONTEXTUAL_EMBEDDINGS_MAX_WORKERS: number;
     USE_HYBRID_SEARCH: boolean;
     USE_AGENTIC_RAG: boolean;
     USE_RERANKING: boolean;
     RERANKING_MODEL?: string;
     LLM_PROVIDER?: string;
     LLM_BASE_URL?: string;
     LLM_INSTANCE_NAME?: string;
+    OLLAMA_API_MODE?: string;
+    OLLAMA_CHAT_AUTH_TOKEN?: string;
+    OLLAMA_EMBEDDING_AUTH_TOKEN?: string;
     EMBEDDING_MODEL?: string;
     EMBEDDING_PROVIDER?: string;
     OLLAMA_EMBEDDING_URL?: string;
     OLLAMA_EMBEDDING_INSTANCE_NAME?: string;
     // ... rest of fields
   };
   setRagSettings: (settings: any) => void;
 }

Then remove the (ragSettings as any) casts and access the fields with proper typing.

🧹 Nitpick comments (2)
archon-ui-main/src/components/settings/RAGSettings.tsx (2)

2094-2133: Reranking model selection implementation is solid.

The reranking UI is well-implemented with a good selection of models covering different performance/quality tradeoffs. The default model choice (MiniLM-L6) is appropriate for most use cases.

Optional: Consider enhancing the help text to provide more concrete guidance on model selection:

                 <p className="text-xs text-gray-500 dark:text-gray-400 mt-1">
-                  Larger models = better results but slower
+                  MiniLM-L6: ~50ms, MiniLM-L12: ~80ms, BGE-Large: ~150ms (approximate per-document latency)
                 </p>

2049-2049: Consider a more descriptive label for the workers input.

The label "Workers" is functional but could be more self-explanatory for users unfamiliar with the concept.

Apply this diff to improve clarity:

                   <label className="text-xs text-gray-500 dark:text-gray-400 mt-1">
-                    Workers
+                    API Workers
                   </label>

Alternatively, consider "Concurrent Requests" or "Parallel Workers" to better convey the parallelization concept.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 7d44a6f and 43036aa.

📒 Files selected for processing (1)
  • archon-ui-main/src/components/settings/RAGSettings.tsx (22 hunks)
🧰 Additional context used
📓 Path-based instructions (3)
archon-ui-main/src/**/*.{ts,tsx}

📄 CodeRabbit inference engine (CLAUDE.md)

archon-ui-main/src/**/*.{ts,tsx}: Use TypeScript in strict mode with no implicit any in frontend code
Do not use dynamic Tailwind class construction in frontend components

Files:

  • archon-ui-main/src/components/settings/RAGSettings.tsx
archon-ui-main/src/!(features)/**/*.{ts,tsx}

📄 CodeRabbit inference engine (CLAUDE.md)

Use ESLint with standard React rules for legacy code in the frontend

Files:

  • archon-ui-main/src/components/settings/RAGSettings.tsx
archon-ui-main/src/**/*.{ts,tsx,js,jsx}

📄 CodeRabbit inference engine (AGENTS.md)

Use ESLint for legacy frontend code outside /src/features directory with standard React rules

Files:

  • archon-ui-main/src/components/settings/RAGSettings.tsx
🧠 Learnings (1)
📚 Learning: 2025-09-25T16:36:48.527Z
Learnt from: Chillbruhhh
Repo: coleam00/Archon PR: 736
File: archon-ui-main/src/components/settings/RAGSettings.tsx:17-19
Timestamp: 2025-09-25T16:36:48.527Z
Learning: OpenRouter does not provide embedding models - it only acts as a proxy/router for various LLM providers for chat completions, not embeddings. The EMBEDDING_CAPABLE_PROVIDERS array in RAGSettings.tsx correctly excludes 'openrouter'.

Applied to files:

  • archon-ui-main/src/components/settings/RAGSettings.tsx
🧬 Code graph analysis (1)
archon-ui-main/src/components/settings/RAGSettings.tsx (2)
archon-ui-main/src/components/settings/utils/instanceConfigSync.ts (1)
  • syncEmbeddingFromLLM (18-28)
archon-ui-main/src/features/ui/primitives/input.tsx (1)
  • Input (8-29)
🔇 Additional comments (5)
archon-ui-main/src/components/settings/RAGSettings.tsx (5)

753-756: LGTM: Model loading after health check.

The addition of loadAvailableModels after successful health checks is well-implemented. It ensures models are only fetched when the instance is confirmed healthy, and the conditional check on context prevents unnecessary loads.


1603-1660: LGTM: API mode selection UI.

The Ollama API mode selection UI is well-designed with clear visual feedback and helpful descriptions. The default handling for undefined or 'native' mode is appropriate.

Note: Ensure OLLAMA_API_MODE is added to the RAGSettingsProps interface as flagged in the previous comment to maintain type safety.


1853-1880: LGTM: Authentication status display.

The authentication status display in the configuration summary is clear and informative, with appropriate visual indicators (lock icon) and context-aware rendering for both LLM and Embedding instances.


2385-2387: LGTM: Proper use of syncEmbeddingFromLLM.

The call to syncEmbeddingFromLLM correctly propagates the LLM instance configuration (including authentication settings) to the embedding instance when users choose to use the same host. This matches the function's intended behavior as documented in the utility module.


642-680: Consider passing authentication token in model loading requests.

The loadAvailableModels function fetches models from Ollama instances but doesn't include authentication tokens in the request. If an instance requires authentication, the model loading will fail silently. Verify whether the backend /api/ollama/models endpoint requires authentication headers and whether the frontend should pass tokens for authenticated instances.

Milofax and others added 4 commits December 3, 2025 15:20
The existing spinner was already properly designed.
No UI changes needed - backend now uses the existing setting.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
Adds comprehensive E2E tests for the RAG workflow:
- Complete flow test: clean slate, upload PDF, search, verify results
- Tests for handling empty search results
- Knowledge Base UI verification tests

Also adds:
- Playwright configuration
- npm scripts: test:e2e, test:e2e:ui, test:e2e:headed
- Playwright ignores in .gitignore

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
Use fileURLToPath and import.meta.url for ESM-compatible __dirname.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
- Add headless: true to playwright config
- Replace UI-based upload tests with direct API tests
- Tests now verify RAG features without modifying settings
- Check reranking metadata, hybrid search, code examples endpoints

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

♻️ Duplicate comments (2)
archon-ui-main/src/components/settings/RAGSettings.tsx (2)

142-177: [Duplicate] Auth token fields still missing from interface.

This issue was previously flagged but not yet addressed. The interface is missing OLLAMA_API_MODE, OLLAMA_CHAT_AUTH_TOKEN, and OLLAMA_EMBEDDING_AUTH_TOKEN fields, forcing type casts at multiple locations (lines 232, 250, 255, 273, 803, 828, 2432, 2522).

Based on coding guidelines, TypeScript strict mode should be used with no implicit any. Add the missing fields to restore type safety.


2393-2416: [Duplicate] Authentication UI is duplicated between modals.

This duplication was previously flagged. The authentication UI block (checkbox + conditional password input) appears identically in both LLM and Embedding modals, differing only in accent color and state handlers.

As previously suggested, extract into a reusable AuthenticationInput component or use the Radix UI primitives from src/features/ui/primitives/ per project learnings.

Based on learnings, prefer using Radix UI primitives from src/features/ui/primitives/ when creating UI components.

Also applies to: 2483-2506

🧹 Nitpick comments (4)
archon-ui-main/src/components/settings/RAGSettings.tsx (1)

682-693: Consider debouncing auto-load to avoid excessive API calls.

The effects trigger loadAvailableModels on every URL change, which could cause excessive API calls while users are typing URLs.

Consider adding a debounce (e.g., 500ms) or moving the auto-load to only trigger after successful connection tests.

archon-ui-main/playwright.config.ts (1)

1-63: Consider making headless mode configurable for local debugging.

The config sets headless: true unconditionally (line 36), which may hinder local debugging. Consider using an environment variable:

-    headless: true,
+    headless: process.env.CI ? true : false,

This allows developers to see the browser during local test development while keeping CI runs headless.

archon-ui-main/tests/e2e/rag-search-flow.spec.ts (2)

20-32: Consider more specific selector for Knowledge button.

The selector button:has-text("Knowledge") with .first() might match other buttons containing "Knowledge" text. Consider a more specific selector like:

const addButton = page.locator('button:has-text("+ Knowledge")');

This ensures you're testing the correct button without relying on DOM order.


36-36: Consider extracting API base URL to configuration.

The tests hardcode http://localhost:8181 in multiple places. Consider extracting to a constant or reading from environment:

const API_BASE_URL = process.env.API_URL || 'http://localhost:8181';

Then use:

await request.post(`${API_BASE_URL}/api/rag/query`, {

This improves maintainability and enables testing against different environments.

Also applies to: 51-51, 73-73, 88-88, 104-104

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 43036aa and d82c9ba.

📒 Files selected for processing (5)
  • archon-ui-main/.gitignore (1 hunks)
  • archon-ui-main/package.json (2 hunks)
  • archon-ui-main/playwright.config.ts (1 hunks)
  • archon-ui-main/src/components/settings/RAGSettings.tsx (21 hunks)
  • archon-ui-main/tests/e2e/rag-search-flow.spec.ts (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
  • archon-ui-main/.gitignore
  • archon-ui-main/package.json
🧰 Additional context used
📓 Path-based instructions (3)
archon-ui-main/src/**/*.{ts,tsx}

📄 CodeRabbit inference engine (CLAUDE.md)

archon-ui-main/src/**/*.{ts,tsx}: Use TypeScript in strict mode with no implicit any in frontend code
Do not use dynamic Tailwind class construction in frontend components

Files:

  • archon-ui-main/src/components/settings/RAGSettings.tsx
archon-ui-main/src/!(features)/**/*.{ts,tsx}

📄 CodeRabbit inference engine (CLAUDE.md)

Use ESLint with standard React rules for legacy code in the frontend

Files:

  • archon-ui-main/src/components/settings/RAGSettings.tsx
archon-ui-main/src/**/*.{ts,tsx,js,jsx}

📄 CodeRabbit inference engine (AGENTS.md)

Use ESLint for legacy frontend code outside /src/features directory with standard React rules

Files:

  • archon-ui-main/src/components/settings/RAGSettings.tsx
🧠 Learnings (10)
📚 Learning: 2025-11-29T19:24:36.223Z
Learnt from: CR
Repo: coleam00/Archon PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-11-29T19:24:36.223Z
Learning: Applies to archon-ui-main/src/**/*.{test,spec}.{ts,tsx} : Use Vitest with React Testing Library for frontend testing

Applied to files:

  • archon-ui-main/tests/e2e/rag-search-flow.spec.ts
  • archon-ui-main/playwright.config.ts
📚 Learning: 2025-11-29T19:25:23.987Z
Learnt from: CR
Repo: coleam00/Archon PR: 0
File: AGENTS.md:0-0
Timestamp: 2025-11-29T19:25:23.987Z
Learning: Applies to archon-ui-main/**/*.test.{ts,tsx} : Use Vitest with React Testing Library for frontend testing

Applied to files:

  • archon-ui-main/tests/e2e/rag-search-flow.spec.ts
  • archon-ui-main/playwright.config.ts
📚 Learning: 2025-11-29T19:24:36.223Z
Learnt from: CR
Repo: coleam00/Archon PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-11-29T19:24:36.223Z
Learning: Applies to archon-ui-main/src/features/**/*.{ts,tsx} : Create new UI components in the features directory with types in src/features/[feature]/types/, components in src/features/[feature]/components/, and use TanStack Query hooks

Applied to files:

  • archon-ui-main/tests/e2e/rag-search-flow.spec.ts
📚 Learning: 2025-11-29T19:25:23.987Z
Learnt from: CR
Repo: coleam00/Archon PR: 0
File: AGENTS.md:0-0
Timestamp: 2025-11-29T19:25:23.987Z
Learning: Applies to archon-ui-main/src/features/**/*.{ts,tsx} : TypeScript strict mode enabled with no implicit any, and use Biome with 120 character lines and double quotes in /src/features directory

Applied to files:

  • archon-ui-main/playwright.config.ts
📚 Learning: 2025-09-04T16:30:05.227Z
Learnt from: stevepresley
Repo: coleam00/Archon PR: 573
File: archon-ui-main/src/config/api.ts:15-25
Timestamp: 2025-09-04T16:30:05.227Z
Learning: Archon UI API config: Prefer lazy getters getApiFullUrl() and getWsUrl() over module-load constants to avoid SSR/test crashes. Avoid CommonJS exports patterns (Object.defineProperty(exports,…)) in ESM. Add typeof window guards with VITE_API_URL fallback inside getApiUrl()/getWebSocketUrl() when SSR safety is required.

Applied to files:

  • archon-ui-main/playwright.config.ts
📚 Learning: 2025-11-29T19:24:36.223Z
Learnt from: CR
Repo: coleam00/Archon PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-11-29T19:24:36.223Z
Learning: Applies to archon-ui-main/src/features/**/*.{ts,tsx} : Use Biome for code formatting in /src/features with 120 character lines, double quotes, and trailing commas

Applied to files:

  • archon-ui-main/playwright.config.ts
📚 Learning: 2025-11-29T19:24:36.223Z
Learnt from: CR
Repo: coleam00/Archon PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-11-29T19:24:36.223Z
Learning: Applies to archon-ui-main/src/!(features)/**/*.{ts,tsx} : Use ESLint with standard React rules for legacy code in the frontend

Applied to files:

  • archon-ui-main/playwright.config.ts
📚 Learning: 2025-11-29T19:25:23.987Z
Learnt from: CR
Repo: coleam00/Archon PR: 0
File: AGENTS.md:0-0
Timestamp: 2025-11-29T19:25:23.987Z
Learning: Applies to archon-ui-main/src/**/*.{ts,tsx,js,jsx} : Use ESLint for legacy frontend code outside /src/features directory with standard React rules

Applied to files:

  • archon-ui-main/playwright.config.ts
📚 Learning: 2025-11-29T19:24:36.223Z
Learnt from: CR
Repo: coleam00/Archon PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-11-29T19:24:36.223Z
Learning: Applies to archon-ui-main/src/features/**/components/**/*.{ts,tsx} : Prefer using Radix UI primitives from src/features/ui/primitives/ when creating UI components

Applied to files:

  • archon-ui-main/src/components/settings/RAGSettings.tsx
📚 Learning: 2025-09-25T16:36:48.527Z
Learnt from: Chillbruhhh
Repo: coleam00/Archon PR: 736
File: archon-ui-main/src/components/settings/RAGSettings.tsx:17-19
Timestamp: 2025-09-25T16:36:48.527Z
Learning: OpenRouter does not provide embedding models - it only acts as a proxy/router for various LLM providers for chat completions, not embeddings. The EMBEDDING_CAPABLE_PROVIDERS array in RAGSettings.tsx correctly excludes 'openrouter'.

Applied to files:

  • archon-ui-main/src/components/settings/RAGSettings.tsx
🧬 Code graph analysis (1)
archon-ui-main/src/components/settings/RAGSettings.tsx (2)
archon-ui-main/src/components/settings/utils/instanceConfigSync.ts (1)
  • syncEmbeddingFromLLM (18-28)
archon-ui-main/src/features/ui/primitives/input.tsx (1)
  • Input (8-29)
🔇 Additional comments (15)
archon-ui-main/src/components/settings/RAGSettings.tsx (13)

210-222: LGTM! Auth state initialization is correct.

The authentication fields (useAuth and authToken) are properly initialized with safe defaults (disabled auth, empty token).


226-273: Auth token sync effects are correctly implemented.

The effects properly:

  • Track auth tokens via refs to prevent infinite loops
  • Derive useAuth from token presence
  • Include auth tokens in dependency arrays

The type safety issues with (ragSettings as any) casts have been flagged separately.


573-585: LGTM! Enhanced metrics state to track available models.

The addition of llmAvailableModels and embeddingAvailableModels arrays properly extends the metrics tracking to store actual model lists alongside counts.


642-680: Well-designed model loading function.

The loadAvailableModels function correctly:

  • Uses context parameter to differentiate chat vs embedding models
  • Leverages backend proxy to avoid CORS
  • Handles errors gracefully without blocking UI

753-756: LGTM! Model loading after health check is a good enhancement.

Loading available models after a successful connection test ensures the UI has fresh model data when the instance is confirmed online.


788-836: Deletion handlers correctly reset authentication state.

Both handlers properly:

  • Reset useAuth and authToken to safe defaults
  • Clear auth tokens from ragSettings
  • Maintain consistency between local state and persisted settings

892-910: LGTM! Metrics update preserves existing state.

The update correctly uses the spread pattern to preserve llmAvailableModels and embeddingAvailableModels, which are managed separately by loadAvailableModels.


2091-2130: LGTM! Reranking model selection properly implemented.

The UI correctly:

  • Shows model dropdown only when reranking is enabled
  • Provides four model options matching PR objectives
  • Uses sensible default (MiniLM-L6 - fast)
  • Includes helpful descriptions for users

1603-1660: LGTM! API mode selection UI is clear and well-designed.

The interface clearly differentiates between Native Ollama API and OpenAI-Compatible modes with helpful endpoint information. The radio-button-style UI is intuitive.


1853-1880: LGTM! Authentication status display is clear and informative.

The table row effectively communicates authentication status with:

  • Lock icon for secured instances
  • Clear "Token configured" vs "No authentication" messages
  • Proper conditional rendering based on token presence

2382-2383: LGTM! Sync utility correctly used for single-host setup.

The syncEmbeddingFromLLM call properly synchronizes all LLM instance configuration (URL, name, auth) to the embedding instance when users enable the convenience checkbox.


2428-2456: Save handlers correctly persist authentication state.

Both handlers properly:

  • Conditionally save auth tokens based on useAuth flag
  • Clear tokens with empty string when auth is disabled
  • Auto-test connection after saving (good UX)
  • Refresh metrics after successful save

Also applies to: 2518-2546


1897-1906: LGTM! Model display in summary table is clear.

The selected model row properly displays the current model with a graceful "Not selected" fallback for unconfigured instances.

archon-ui-main/playwright.config.ts (1)

49-54: LGTM! Playwright configuration is well-structured.

The configuration provides sensible defaults:

  • Appropriate timeouts (60s tests, 10s assertions)
  • CI-specific optimizations (sequential execution, retries)
  • Efficient artifact collection (traces/videos on retries only)
archon-ui-main/tests/e2e/rag-search-flow.spec.ts (1)

34-112: LGTM! API tests are well-structured and focused.

The tests correctly:

  • Test API endpoints directly without UI interaction
  • Verify response structure without brittle content assertions
  • Use conditional checks for reranking metadata (line 66)
  • Maintain independence from each other

- Replace rstrip with removesuffix for URL normalization (embedding_service.py)
- Fix auth token persistence: persist and restore authToken in setOllamaInstances/getOllamaInstances (credentialsService.ts)
- Fix auth token propagation: pass auth_token through _enrich_model_capabilities and _get_model_details (model_discovery_service.py)
- Add return_exceptions=True for batch resilience in asyncio.gather; report partial failures gracefully (embedding_service.py)
- Fix code-block detection to check full text from position 0 so cross-chunk fences are honored (base_storage_service.py)
- Fix temp file leak in PDF extraction: ensure cleanup runs even if write fails (document_processing.py)
- Remove hardcoded absolute machine-specific paths from QUICKSTART.md
- Add missing OLLAMA_CHAT_AUTH_TOKEN and OLLAMA_EMBEDDING_AUTH_TOKEN to RAGSettingsProps interface and remove unsafe type casts (RAGSettings.tsx)
- Extract duplicated authentication UI into reusable AuthenticationInput component (RAGSettings.tsx)
- Replace arbitrary waitForTimeout calls with deterministic locator-based waits in E2E tests (ollama-api-mode.spec.ts)
- Add missing OCR dependencies (pymupdf4llm, pytesseract, pdf2image) to pyproject.toml `all` group

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 6

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
archon-ui-main/src/components/settings/RAGSettings.tsx (1)

2465-2486: ⚠️ Potential issue | 🟠 Major

These post-edit connection checks still run with the old persisted token.

The modal “Save Changes” handlers only update local React state, then immediately call the backend health/model routes. Those routes look up auth from stored RAG credentials, so a newly entered Ollama token is not used until the top-level “Save Settings” persists it. Protected instances will therefore look offline or empty right after editing. Either persist first or pass the unsaved token through the test request.

Also applies to: 2540-2561

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@archon-ui-main/src/components/settings/RAGSettings.tsx` around lines 2465 -
2486, The post-edit health/model checks use persisted credentials, so when
editing a protected Ollama instance the new token isn't used; update the
handlers in the LLM edit flow (where ragSettings is updated via setRagSettings
and setShowEditLLMModal is closed) to either persist the new token before
calling manualTestConnection/fetchOllamaMetrics or explicitly pass the unsaved
token from llmInstanceConfig into those calls; specifically modify the
manualTestConnection invocation and any fetchOllamaMetrics call to accept and
use llmInstanceConfig.useAuth ? llmInstanceConfig.authToken : '' (or call the
same persistence routine used by the top-level Save Settings) so
setOllamaManualConfirmed and setOllamaServerStatus reflect the updated
credentials immediately.
🧹 Nitpick comments (1)
python/src/server/services/storage/base_storage_service.py (1)

81-83: Avoid rescanning the full document prefix for each breakpoint check.

text[:abs_pos].count("```") copies/scans from byte 0 on every probe, so smart_chunk_text becomes quadratic as documents grow. Since this runs on the ingestion path, large PDFs can spend a lot of CPU here. Precompute fence offsets/parity once, or carry the fence state forward between chunks, instead of recounting from the start each time.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@python/src/server/services/storage/base_storage_service.py` around lines 81 -
83, The current is_inside_code_block(text_segment: str, local_pos: int)
implementation repeatedly scans text[:abs_pos].count("```"), making
smart_chunk_text quadratic; instead, precompute fence positions or a parity map
once and consult it per probe: e.g., compute all fence_offsets = [indices of
"```"] or build a prefix boolean/bitset indicating whether each index is inside
a fence, then change is_inside_code_block to compute abs_pos = start + local_pos
and determine parity by binary-searching fence_offsets or looking up the
precomputed parity rather than rescanning the prefix; alternatively, if
smart_chunk_text iterates sequentially, carry a fence_state boolean forward and
update it when crossing a fence so is_inside_code_block becomes O(1) per check.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@archon-ui-main/src/components/settings/RAGSettings.tsx`:
- Around line 285-293: The code paths that update lastLLMConfigRef and call
setLLMInstanceConfig are incorrectly preserving the previous auth token by using
`authToken: newAuthToken || prev.authToken`; change these updates (in the block
around lastLLMConfigRef.current and the similar block at the other occurrence)
to assign `authToken` directly from `newAuthToken` (i.e., `authToken:
newAuthToken`) so clearing the token (`""`) actually clears the UI state and
prevents accidentally re-persisting the old secret; update the corresponding
`useAuth` calculation to still be `!!newAuthToken`.

In `@archon-ui-main/src/services/credentialsService.ts`:
- Around line 19-27: getRagSettings() is only rehydrating keys that exist in the
settings default object and the string-field whitelist, so newly added RAG keys
(RERANKING_MODEL, OLLAMA_CHAT_AUTH_TOKEN, OLLAMA_EMBEDDING_AUTH_TOKEN) get
persisted by the UI but dropped on reload; update the settings default object
and the string-field whitelist used by getRagSettings() (and the equivalent
whitelists referenced around getRagSettings() usage at the other similar blocks)
to include RERANKING_MODEL, OLLAMA_CHAT_AUTH_TOKEN, and
OLLAMA_EMBEDDING_AUTH_TOKEN so those keys are recognized during rehydration and
survive reloads.

In `@python/pyproject.toml`:
- Around line 51-53: The new Python deps ("pdf2image", "pytesseract") require
system binaries that are not installed by python/Dockerfile.server; update
python/Dockerfile.server to install the required OS packages (e.g.,
poppler-utils and tesseract-ocr, and optionally libtesseract-dev) via apt-get
(RUN apt-get update && apt-get install -y poppler-utils tesseract-ocr) so
pdf2image and pytesseract work inside the container, then rebuild the server
image.

In `@python/src/server/services/embeddings/embedding_service.py`:
- Around line 130-160: The current loop in create_embeddings_batch (using
results, embeddings, errors) collapses out failed items and returns a shorter
embeddings list which misaligns with input batch indices; instead preserve
positional mapping by returning a list whose length equals len(results) where
each index contains the embedding vector or None (or an explicit per-index
success/failure marker), keep collecting errors in errors as you already do, and
keep the existing "all failed" EmbeddingAPIError behavior when embeddings are
all missing; update the return to deliver this per-index list (and/or a (list,
errors) tuple) so callers can map embeddings back to the original batch
positions without corrupting data.

In `@python/src/server/services/ollama/model_discovery_service.py`:
- Around line 122-129: The discovery and health caches are currently keyed only
by instance_url and therefore can return anonymous/truncated results for
authenticated requests; update discover_models (and the health-check methods
referenced) to include the auth context in the cache key (e.g., (instance_url,
auth_token or normalized auth_context)) or skip caching when auth_token is
present. Locate the cache lookup/insert usage in discover_models and the health
cache code paths and change the key construction to incorporate the token or
other auth context (or add a short-circuit that bypasses cache when auth_token
is non-null) so authenticated and anonymous responses are never conflated.

In `@python/src/server/utils/document_processing.py`:
- Around line 325-342: The OCR branch masks runtime failures because
extract_text_with_ocr() returns None for both execution errors and true no-text
results; update extract_text_with_ocr to raise a specific exception (e.g.,
OCRRuntimeError) on OCR execution failures and return an empty string "" when
OCR ran successfully but found no text, then change the code in this block
(which calls is_ocr_available(), extract_text_with_ocr, logger, and raises
ValueError) to catch OCRRuntimeError and log/raise an OCR-specific error
including the exception details (logger.error(..., exc_info=True) or raise
RuntimeError("OCR failed: ...") ), while retaining the existing ValueError path
for the empty-string/no-text case so infrastructure failures and content issues
are distinguishable.

---

Outside diff comments:
In `@archon-ui-main/src/components/settings/RAGSettings.tsx`:
- Around line 2465-2486: The post-edit health/model checks use persisted
credentials, so when editing a protected Ollama instance the new token isn't
used; update the handlers in the LLM edit flow (where ragSettings is updated via
setRagSettings and setShowEditLLMModal is closed) to either persist the new
token before calling manualTestConnection/fetchOllamaMetrics or explicitly pass
the unsaved token from llmInstanceConfig into those calls; specifically modify
the manualTestConnection invocation and any fetchOllamaMetrics call to accept
and use llmInstanceConfig.useAuth ? llmInstanceConfig.authToken : '' (or call
the same persistence routine used by the top-level Save Settings) so
setOllamaManualConfirmed and setOllamaServerStatus reflect the updated
credentials immediately.

---

Nitpick comments:
In `@python/src/server/services/storage/base_storage_service.py`:
- Around line 81-83: The current is_inside_code_block(text_segment: str,
local_pos: int) implementation repeatedly scans text[:abs_pos].count("```"),
making smart_chunk_text quadratic; instead, precompute fence positions or a
parity map once and consult it per probe: e.g., compute all fence_offsets =
[indices of "```"] or build a prefix boolean/bitset indicating whether each
index is inside a fence, then change is_inside_code_block to compute abs_pos =
start + local_pos and determine parity by binary-searching fence_offsets or
looking up the precomputed parity rather than rescanning the prefix;
alternatively, if smart_chunk_text iterates sequentially, carry a fence_state
boolean forward and update it when crossing a fence so is_inside_code_block
becomes O(1) per check.
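Several of the prompts above target the batch handling in `embedding_service.py`. A minimal sketch of the suggested shape — `asyncio.gather(return_exceptions=True)` under a `Semaphore`, with per-index `None` holes so failures never shift later embeddings out of alignment. `embed_fn` and the `(index, message)` error tuples are illustrative assumptions, not the service's actual API:

```python
import asyncio

async def create_embeddings_batch(texts, embed_fn, max_workers=4):
    """Run embedding calls in parallel while keeping results aligned
    with input positions: a failed item yields None at its index
    instead of silently shortening the output list."""
    semaphore = asyncio.Semaphore(max_workers)

    async def embed_one(text):
        async with semaphore:  # cap concurrent API calls at max_workers
            return await embed_fn(text)

    results = await asyncio.gather(
        *(embed_one(t) for t in texts), return_exceptions=True
    )
    embeddings, errors = [], []
    for i, result in enumerate(results):
        if isinstance(result, Exception):
            embeddings.append(None)            # hole keeps indices aligned
            errors.append((i, str(result)))
        else:
            embeddings.append(result)
    if errors and len(errors) == len(texts):
        raise RuntimeError(f"All {len(texts)} embedding calls failed")
    return embeddings, errors

# Demo with a fake embedder that fails on one input
async def fake_embed(text):
    if text == "bad":
        raise ValueError("boom")
    return [float(len(text))]

embeddings, errors = asyncio.run(
    create_embeddings_batch(["ok", "bad", "fine"], fake_embed)
)
print(embeddings)  # [[2.0], None, [4.0]]
print(errors)      # [(1, 'boom')]
```

Callers can then map each embedding back to its source chunk by position, and the "all failed" case still raises rather than returning an empty-looking success.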

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 140a7298-9787-435a-aff3-2910f8815ccf

📥 Commits

Reviewing files that changed from the base of the PR and between d82c9ba and 0e9b7ca.

📒 Files selected for processing (9)
  • QUICKSTART.md
  • archon-ui-main/src/components/settings/RAGSettings.tsx
  • archon-ui-main/src/services/credentialsService.ts
  • archon-ui-main/tests/e2e/ollama-api-mode.spec.ts
  • python/pyproject.toml
  • python/src/server/services/embeddings/embedding_service.py
  • python/src/server/services/ollama/model_discovery_service.py
  • python/src/server/services/storage/base_storage_service.py
  • python/src/server/utils/document_processing.py
🚧 Files skipped from review as they are similar to previous changes (1)
  • archon-ui-main/tests/e2e/ollama-api-mode.spec.ts

Comment on lines +285 to 293
if (newLLMUrl !== lastLLMConfigRef.current.url || newLLMName !== lastLLMConfigRef.current.name || newAuthToken !== lastLLMConfigRef.current.authToken) {
lastLLMConfigRef.current = { url: newLLMUrl, name: newLLMName, authToken: newAuthToken };
setLLMInstanceConfig(prev => {
const newConfig = {
url: newLLMUrl || prev.url,
-  name: newLLMName || prev.name
+  name: newLLMName || prev.name,
+  useAuth: !!newAuthToken,
+  authToken: newAuthToken || prev.authToken
};

⚠️ Potential issue | 🟠 Major

Clearing the saved token does not actually clear local state.

authToken: newAuthToken || prev.authToken keeps the previous secret whenever the persisted value becomes "". After disabling auth, the UI state still holds the old token and a later save can write it back. This should assign newAuthToken directly.

Suggested fix
       setLLMInstanceConfig(prev => {
         const newConfig = {
           url: newLLMUrl || prev.url,
           name: newLLMName || prev.name,
           useAuth: !!newAuthToken,
-          authToken: newAuthToken || prev.authToken
+          authToken: newAuthToken
         };
       setEmbeddingInstanceConfig(prev => {
         const newConfig = {
           url: newEmbeddingUrl || prev.url,
           name: newEmbeddingName || prev.name,
           useAuth: !!newAuthToken,
-          authToken: newAuthToken || prev.authToken
+          authToken: newAuthToken
         };

Also applies to: 308-316

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@archon-ui-main/src/components/settings/RAGSettings.tsx` around lines 285 -
293, The code paths that update lastLLMConfigRef and call setLLMInstanceConfig
are incorrectly preserving the previous auth token by using `authToken:
newAuthToken || prev.authToken`; change these updates (in the block around
lastLLMConfigRef.current and the similar block at the other occurrence) to
assign `authToken` directly from `newAuthToken` (i.e., `authToken:
newAuthToken`) so clearing the token (`""`) actually clears the UI state and
prevents accidentally re-persisting the old secret; update the corresponding
`useAuth` calculation to still be `!!newAuthToken`.

Comment on lines +19 to +27
RERANKING_MODEL?: string;
MODEL_CHOICE: string;
CHAT_MODEL?: string;
LLM_PROVIDER?: string;
LLM_BASE_URL?: string;
LLM_INSTANCE_NAME?: string;
OLLAMA_EMBEDDING_URL?: string;
OLLAMA_EMBEDDING_INSTANCE_NAME?: string;
OLLAMA_API_MODE?: string;

⚠️ Potential issue | 🟠 Major

New RAG settings still won't survive a reload.

getRagSettings() only rehydrates keys that already exist in the settings default object and are in the string-field whitelist. Right now RERANKING_MODEL, OLLAMA_CHAT_AUTH_TOKEN, and OLLAMA_EMBEDDING_AUTH_TOKEN are missing from that path, so those values can be saved by the UI but will be dropped the next time settings are loaded.

Also applies to: 196-209, 234-247

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@archon-ui-main/src/services/credentialsService.ts` around lines 19 - 27,
getRagSettings() is only rehydrating keys that exist in the settings default
object and the string-field whitelist, so newly added RAG keys (RERANKING_MODEL,
OLLAMA_CHAT_AUTH_TOKEN, OLLAMA_EMBEDDING_AUTH_TOKEN) get persisted by the UI but
dropped on reload; update the settings default object and the string-field
whitelist used by getRagSettings() (and the equivalent whitelists referenced
around getRagSettings() usage at the other similar blocks) to include
RERANKING_MODEL, OLLAMA_CHAT_AUTH_TOKEN, and OLLAMA_EMBEDDING_AUTH_TOKEN so
those keys are recognized during rehydration and survive reloads.

Comment on lines +51 to +53
"pymupdf4llm>=0.2.5", # Better PDF-to-Markdown extraction
"pytesseract>=0.3.10", # OCR for scanned/image-based PDFs
"pdf2image>=1.17.0", # PDF to image conversion for OCR

⚠️ Potential issue | 🟠 Major

OCR won't work in the server image with these deps alone.

pdf2image and pytesseract still need the system binaries from poppler-utils and tesseract-ocr. python/Dockerfile.server:1-80 currently installs neither, so this dependency change still leaves the OCR fallback broken at runtime inside the container.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@python/pyproject.toml` around lines 51 - 53, The new Python deps
("pdf2image", "pytesseract") require system binaries that are not installed by
python/Dockerfile.server; update python/Dockerfile.server to install the
required OS packages (e.g., poppler-utils and tesseract-ocr, and optionally
libtesseract-dev) via apt-get (RUN apt-get update && apt-get install -y
poppler-utils tesseract-ocr) so pdf2image and pytesseract work inside the
container, then rebuild the server image.

Comment on lines +130 to +160
# Separate successes from failures for partial-failure resilience
embeddings: list[list[float]] = []
errors: list[tuple[int, BaseException]] = []
for i, result in enumerate(results):
    if isinstance(result, BaseException):
        errors.append((i, result))
        search_logger.error(
            f"Embedding failed for text[{i}]: {result}", exc_info=False
        )
    else:
        embeddings.append(result)

if errors and not embeddings:
    # All requests failed – raise using the first error
    first_error = errors[0][1]
    if isinstance(first_error, httpx.HTTPStatusError):
        error_content = first_error.response.text
        raise EmbeddingAPIError(
            f"Ollama native API error: {first_error.response.status_code} - {error_content}",
            original_error=first_error,
        ) from first_error
    raise EmbeddingAPIError(
        f"Ollama native API error: {str(first_error)}", original_error=first_error
    ) from first_error

if errors:
    search_logger.warning(
        f"{len(errors)}/{len(results)} embedding requests failed; returning {len(embeddings)} successful embeddings"
    )

return embeddings

⚠️ Potential issue | 🔴 Critical

Don't collapse failed items out of this return value.

create_embeddings_batch() still assumes the adapter returns one embedding per input text and zips that list back onto batch positionally. If text[0] fails and text[1] succeeds, returning only the successful vectors shifts text[1]'s embedding onto text[0], which corrupts stored data. Return per-index success/failure information instead of a shortened embedding list.

As per coding guidelines, "Never accept corrupted data - skip failed items entirely rather than storing corrupted data in batch operations."

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@python/src/server/services/embeddings/embedding_service.py` around lines 130
- 160, The current loop in create_embeddings_batch (using results, embeddings,
errors) collapses out failed items and returns a shorter embeddings list which
misaligns with input batch indices; instead preserve positional mapping by
returning a list whose length equals len(results) where each index contains the
embedding vector or None (or an explicit per-index success/failure marker), keep
collecting errors in errors as you already do, and keep the existing "all
failed" EmbeddingAPIError behavior when embeddings are all missing; update the
return to deliver this per-index list (and/or a (list, errors) tuple) so callers
can map embeddings back to the original batch positions without corrupting data.
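A minimal sketch of the index-preserving alignment the comment asks for. `align_batch_results` is an illustrative name; `results` mirrors the output of `asyncio.gather(..., return_exceptions=True)` already used in this function.

```python
def align_batch_results(
    results: list,
) -> tuple[list, list]:
    """Map each input index to its embedding or None, collecting errors.

    The returned embeddings list has the same length as `results`, so
    callers can zip it back onto the input batch without misalignment.
    """
    embeddings: list = []
    errors: list = []
    for i, result in enumerate(results):
        if isinstance(result, BaseException):
            embeddings.append(None)  # keep the slot so indices stay aligned
            errors.append((i, result))
        else:
            embeddings.append(result)
    return embeddings, errors
```

Callers then skip `None` entries entirely rather than storing shifted (corrupted) vectors.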

Comment on lines +122 to +129
    async def discover_models(self, instance_url: str, fetch_details: bool = False, auth_token: str | None = None) -> list[OllamaModel]:
        """
        Discover all available models from an Ollama instance.

        Args:
            instance_url: Base URL of the Ollama instance
            fetch_details: If True, fetch comprehensive model details via /api/show
            auth_token: Optional authentication token for protected instances

⚠️ Potential issue | 🟠 Major

Cache keys need to include auth context now.

These methods are now auth-aware, but the discovery and health caches are still keyed only by instance_url. An anonymous 401 or truncated public model list can therefore be cached and then reused for an authenticated call to the same host for the next 30s/5m. Cache by (instance_url, auth context) or bypass cache when a token is involved.

Also applies to: 974-980, 1036-1068

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@python/src/server/services/ollama/model_discovery_service.py` around lines
122 - 129, The discovery and health caches are currently keyed only by
instance_url and therefore can return anonymous/truncated results for
authenticated requests; update discover_models (and the health-check methods
referenced) to include the auth context in the cache key (e.g., (instance_url,
auth_token or normalized auth_context)) or skip caching when auth_token is
present. Locate the cache lookup/insert usage in discover_models and the health
cache code paths and change the key construction to incorporate the token or
other auth context (or add a short-circuit that bypasses cache when auth_token
is non-null) so authenticated and anonymous responses are never conflated.

Comment on lines +325 to +342
    # Final fallback: OCR for image-based/scanned PDFs
    if is_ocr_available():
        logger.info("No text extracted - attempting OCR for image-based PDF")
        ocr_text = extract_text_with_ocr(file_content)
        if ocr_text and len(ocr_text.strip()) > 50:
            logger.info(f"PDF extracted with OCR (Tesseract): {len(ocr_text)} chars")
            return ocr_text
        else:
            raise ValueError(
                "No text extracted from PDF: OCR found no readable text. "
                "File may be empty or contain only images without text."
            )
    else:
        raise ValueError(
            "No text extracted from PDF: file appears to be images-only or scanned. "
            "Install OCR dependencies for scanned PDF support: "
            "pip install pytesseract pdf2image (and install tesseract + poppler)"
        )

⚠️ Potential issue | 🟠 Major

This masks OCR runtime failures as “no readable text.”

extract_text_with_ocr() returns None for most OCR execution failures as well as true no-text cases. In this branch that means missing tesseract/poppler binaries or other OCR errors get reported as “file may be empty,” which makes infrastructure problems indistinguishable from bad input. Please surface an OCR-specific failure here instead of treating every None as a content issue.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@python/src/server/utils/document_processing.py` around lines 325 - 342, The
OCR branch masks runtime failures because extract_text_with_ocr() returns None
for both execution errors and true no-text results; update extract_text_with_ocr
to raise a specific exception (e.g., OCRRuntimeError) on OCR execution failures
and return an empty string "" when OCR ran successfully but found no text, then
change the code in this block (which calls is_ocr_available(),
extract_text_with_ocr, logger, and raises ValueError) to catch OCRRuntimeError
and log/raise an OCR-specific error including the exception details
(logger.error(..., exc_info=True) or raise RuntimeError("OCR failed: ...") ),
while retaining the existing ValueError path for the empty-string/no-text case
so infrastructure failures and content issues are distinguishable.
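A sketch of the suggested separation: raise on OCR infrastructure failures, return `""` only when OCR genuinely found no text. `OCRRuntimeError` and the exact pdf2image/pytesseract call shape are illustrative, not the repo's current code.

```python
class OCRRuntimeError(Exception):
    """OCR could not run (missing tesseract/poppler, conversion failure)."""


def extract_text_with_ocr(file_content: bytes) -> str:
    try:
        import pdf2image
        import pytesseract

        pages = pdf2image.convert_from_bytes(file_content)
        text = "\n".join(pytesseract.image_to_string(p) for p in pages)
    except Exception as exc:
        # Missing binaries/libs or a failed conversion is an infra problem,
        # not a statement about the document's content.
        raise OCRRuntimeError(f"OCR execution failed: {exc}") from exc
    return text  # "" means OCR ran but found no readable text
```

The caller in `extract_text_from_pdf` can then keep the existing `ValueError` for the empty-text case and let `OCRRuntimeError` propagate (or log it with `exc_info=True`) so the two failure modes stay distinguishable.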
