Skip to content

Bump to 2603.15.ext.rc11 and fix route registration problem.#146

Merged
AjayThorve merged 7 commits intoNVIDIA-AI-Blueprints:developfrom
drobison00:devin_2603_13_ext_bump
Mar 16, 2026
Merged

Bump to 2603.15.ext.rc11 and fix route registration problem.#146
AjayThorve merged 7 commits intoNVIDIA-AI-Blueprints:developfrom
drobison00:devin_2603_13_ext_bump

Conversation

@drobison00
Copy link
Collaborator

Update versions and fix async issue.

@greptile-apps
Copy link
Contributor

greptile-apps bot commented Mar 13, 2026

Greptile Summary

This PR bumps the deployment version to 2603.15.ext.rc11 and fixes the route registration bug where add_collection_routes and add_document_routes were previously async def but called without await, silently dropping all collection and document API routes. The fix converts both functions to plain def and moves their invocation to the synchronous build_app method — correctly resolving the previously-reported issue. As a side effect, the pgbouncer sidecar configuration (which had plaintext credentials and AUTH_TYPE: trust) has been removed, which is a security improvement.

Key changes and issues found:

  • Async fix confirmed correct: Route-registration helpers are now synchronous def functions called in build_app, ensuring all FastAPI route decorators execute at startup.
  • create_collection error code inconsistency: frontends/aiq_api raises HTTP 500 while src/aiq_agent/fastapi_extensions raises HTTP 400 for the same collection creation failure — clients targeting both APIs will observe different behaviour.
  • delete_files divergence in src/aiq_agent: The empty file_ids guard present in the frontends sibling was not carried over to src/aiq_agent/fastapi_extensions/routes/documents.py, and the partial-success message enrichment is also missing, leading to silent or confusing responses on edge cases.
  • Autoscaling stanzas added for both backend and frontend apps in values.yaml; the deployment.yaml template already handles autoscaling.enabled correctly by omitting the static replicas field.
  • Chart name corrected from aiq2-web to aiq-web.

Confidence Score: 3/5

  • The core async fix is correct, but two parallel route implementations have diverged on error codes and edge-case handling, which should be resolved before merging.
  • The primary bug fix (unawaited coroutines) is properly resolved. However, two new inconsistencies were introduced between the frontends/aiq_api and src/aiq_agent/fastapi_extensions route implementations: a differing HTTP status code for create_collection errors (500 vs 400) and a missing empty-file-IDs guard plus partial-success messaging in delete_files. These divergences will cause observable differences in API behaviour depending on which route set a client hits.
  • frontends/aiq_api/src/aiq_api/routes/collections.py and src/aiq_agent/fastapi_extensions/routes/documents.py need alignment with their sibling implementations.

Important Files Changed

Filename Overview
frontends/aiq_api/src/aiq_api/plugin.py Routes moved from async add_routes to sync build_app; add_collection_routes and add_document_routes are now plain def functions called correctly — the previously-reported unawaited-coroutine bug is resolved.
frontends/aiq_api/src/aiq_api/routes/collections.py Converted from async def to def (correct fix); diverges from the src/aiq_agent sibling by returning HTTP 500 instead of 400 for collection creation errors.
src/aiq_agent/fastapi_extensions/routes/collections.py Converted from async def to def; now returns HTTP 400 for create_collection errors (vs. HTTP 500 in the frontends sibling) — introduces a cross-implementation inconsistency.
src/aiq_agent/fastapi_extensions/routes/documents.py Converted from async def to def; uses aiofiles for temp file I/O; missing empty file_ids guard and partial-success message enrichment present in the frontends sibling.
deploy/helm/deployment-k8s/values.yaml Version tags bumped to 2603.15.ext.rc11; autoscaling stanzas added for backend and frontend; pgbouncer section with plaintext credentials and AUTH_TYPE: trust removed (security improvement).

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[AIQAPIWorker.build_app - sync] --> B[Create APIRouter]
    B --> C[add_collection_routes - def, sync]
    B --> D[add_document_routes - def, sync]
    C --> E[Register POST /v1/collections]
    C --> F[Register GET /v1/collections]
    C --> G[Register GET /v1/collections/name]
    C --> H[Register DELETE /v1/collections/name]
    C --> I[Register GET /v1/knowledge/health]
    D --> J[Register POST /v1/collections/collection_name/documents]
    D --> K[Register GET /v1/collections/collection_name/documents]
    D --> L[Register DELETE /v1/collections/collection_name/documents]
    D --> M[Register GET /v1/documents/job_id/status]
    E & F & G & H & I & J & K & L & M --> N[app.include_router]
    N --> O[AIQAPIWorker.add_routes - async]
    O --> P[register_job_routes - await]
    P --> Q[Async Job & SSE routes registered]

    style A fill:#4CAF50,color:#fff
    style O fill:#2196F3,color:#fff
    style N fill:#9C27B0,color:#fff
Loading

Comments Outside Diff (2)

  1. frontends/aiq_api/src/aiq_api/routes/collections.py, line 62-64 (link)

    create_collection error code divergence between implementations

    This handler raises HTTP 500 for any collection creation failure, but the parallel implementation in src/aiq_agent/fastapi_extensions/routes/collections.py (line 64–66) raises HTTP 400 for the same scenario. API clients consuming both endpoints will receive different status codes for the same error condition (e.g., a duplicate collection name), making consistent error handling impossible.

    HTTP 400 (Bad Request) is more semantically correct for client-caused errors like duplicate collection names. Either align this to 400 or document the intended contract difference between the two implementations.

  2. src/aiq_agent/fastapi_extensions/routes/documents.py, line 179-185 (link)

    Missing empty file_ids guard and partial-success messaging

    The sibling frontends/aiq_api/src/aiq_api/routes/documents.py has two behaviors this handler lost in this PR:

    1. Empty file-IDs guard (lines 163–169 in the frontends version): before calling ingestor.delete_files, it checks if not request.file_ids: and returns early with a structured 200 response. Without this guard, an empty list is passed directly to the ingestor, which may raise an unhandled exception or silently succeed with confusing output.

    2. Partial-success messaging: the frontends version enriches the response dict with a human-readable message field based on how many files were deleted or failed. This version just returns the raw result, giving callers no explanation when total_deleted == 0 or some files fail.

    Consider aligning this implementation with the frontends version to avoid silent misbehavior and API inconsistency.

Last reviewed commit: c8dd140

@drobison00 drobison00 changed the title Bump to 2603.13.ext Bump to 2603.15.ext.rc11 and fix route registration problem. Mar 15, 2026
@cdgamarose-nv
Copy link
Collaborator

This isn't working for me when i run:
./scripts/start_e2e.sh --config_file configs/config_web_frag.yml or
./scripts/start_e2e.sh
I'm getting the following error.
image

Copy link
Collaborator

@AjayThorve AjayThorve left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@AjayThorve AjayThorve merged commit 42b49ed into NVIDIA-AI-Blueprints:develop Mar 16, 2026
4 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants