Describe the feature you'd like to request
Summary
Add Server-Sent Events (SSE) streaming support to the Chat Bot to display LLM responses in real-time as tokens are generated, rather than waiting for the complete response. This would significantly improve user experience, especially for longer responses.
Current Limitations
User Experience Issues
- Long wait times with no feedback: Users must wait for the entire response to be generated before seeing any output
- Polling overhead: The frontend polls every 2 seconds, adding an average of one second of latency to response delivery
- Timeout risks: Long responses can trigger timeouts before completion (as noted in Use streaming endpoint for text responses #41)
- Unnecessary server load: Constant polling creates repeated HTTP requests while tasks are running
Current Architecture
The Chat Bot currently uses a polling-based approach:
User sends message → TaskProcessing schedules task → Frontend polls every 2s →
Task completes → Result saved to DB → Next poll returns complete message
File references:
- Polling implementation: src/components/ChattyLLM/ChattyLLMInputForm.vue:716-760
- Task checking: lib/Controller/ChattyLLMController.php:680-724
- 2-second polling interval defined at src/components/ChattyLLM/ChattyLLMInputForm.vue:753
Why Streaming is Now Feasible
Addressing Previous Concerns
In #41, streaming was marked as "technically not realistic" due to concerns about model compatibility and Nextcloud platform limitations. However, I believe these concerns can be addressed:
1. Model Compatibility
Concern: "We won't be able to use many models anymore"
Reality: Virtually all modern LLM providers support streaming:
- ✅ OpenAI API (GPT-3.5, GPT-4, GPT-4o) - stream: true parameter
- ✅ Anthropic Claude - Server-Sent Events streaming
- ✅ Ollama (local models) - Streaming endpoint
- ✅ LocalAI - OpenAI-compatible streaming API
- ✅ Azure OpenAI - Streaming support
- ✅ Google Gemini - Streaming responses
- ✅ Hugging Face Inference API - Streaming generators
- ✅ Together AI, Groq, Perplexity - All support streaming
Providers that don't support streaming can gracefully fall back to the current polling approach. For illustration, the sketch below shows the request-level difference for an OpenAI-compatible API.
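A minimal, hedged example; the model name and payload shape are assumptions and not tied to any particular provider app:

// Illustrative only: an OpenAI-compatible chat completion request switches
// from a single JSON response to an SSE token stream with one flag.
$payload = json_encode([
    'model' => 'gpt-4o', // example model name
    'messages' => [
        ['role' => 'user', 'content' => 'Hello'],
    ],
    'stream' => true, // ask the provider to stream incremental deltas
]);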
2. Nextcloud Platform Limitations
Current constraint: The TaskProcessing framework is indeed incompatible with streaming:
- ISynchronousProvider::process() is a blocking method that returns complete output
- Task status is binary: "running" (HTTP 417) or "complete" (HTTP 200)
- No support for partial/chunked output delivery
- The $reportProgress callback exists but is never utilized by any provider
Solution: Bypass TaskProcessing for streaming-capable providers by creating a dedicated streaming endpoint that communicates directly with LLM APIs.
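As a rough sketch of the SSE emission side, ignoring the AppFramework response classes for the moment (how best to wrap this in an OCP response is an open question), plain PHP could look like the following; the header set and the chunk source are assumptions for illustration:

// Hypothetical sketch: emit Server-Sent Events from PHP as chunks arrive.
header('Content-Type: text/event-stream');
header('Cache-Control: no-cache');
header('X-Accel-Buffering: no'); // hint to disable proxy buffering (nginx)

// $chunks stands in for tokens arriving from the LLM provider.
// Real chunks would need newline-safe encoding (e.g. json_encode) to keep SSE framing intact.
$chunks = ['Hello', ' world', '!'];
foreach ($chunks as $chunk) {
    echo 'data: ' . $chunk . "\n\n"; // one SSE event per chunk
    @ob_flush();
    flush();
}
echo "data: [DONE]\n\n"; // sentinel the frontend can watch for
flush();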
Proposed Implementation
Architecture Overview
Create a parallel streaming path that coexists with the current TaskProcessing approach:
┌──────────────────────────────────────────────────────────────┐
│ Chat Message Request │
└───────────────────────┬──────────────────────────────────────┘
│
▼
┌─────────────────────┐
│ Provider Capability │
│ Detection │
└─────────────────────┘
│
┌─────────────┴─────────────┐
▼ ▼
┌──────────────────┐ ┌──────────────────┐
│ Streaming Path │ │ Legacy Path │
│ (New) │ │ (Current) │
├──────────────────┤ ├──────────────────┤
│ Direct API call │ │ TaskProcessing │
│ Server-Sent │ │ Poll every 2s │
│ Events (SSE) │ │ Complete output │
│ Real-time chunks │ │ │
└──────────────────┘ └──────────────────┘
│ │
└─────────────┬─────────────┘
▼
┌──────────────────┐
│ Message saved │
│ to database │
└──────────────────┘
Implementation Steps
1. Backend: New Streaming Endpoint
┌─────────────┐ ┌──────────────────┐ ┌─────────────────┐
│ Frontend │ │ Streaming │ │ LLM Provider │
│ (EventSrc) │ │ Controller │ │ (Direct API) │
└─────────────┘ └──────────────────┘ └─────────────────┘
│ │ │
│ 1. POST /chat/stream │ │
├────────────────────────>│ │
│ │ Save user message to DB │
│ │ │
│ 2. EventSource connect │ │
│ GET /chat/stream/{id} │ │
├────────────────────────>│ │
│ │ 3. Call LLM API (streaming)│
│ ├───────────────────────────>│
│ │ │
│ │<── chunk: "Hello"──────────┤
│<── data: "Hello"────────┤ │
│ │<── chunk: " world"─────────┤
│<── data: " world"───────┤ │
│ │<── chunk: "!"──────────────┤
│<── data: "!"────────────┤ │
│ │ │
│<── data: [DONE]─────────┤ Save complete message │
│ │ │
File: lib/Controller/ChattyLLMController.php
Add a new method:
/**
* Stream LLM response in real-time using Server-Sent Events
*
* @param int $sessionId
* @param int $messageId User's message ID
* @return Http\StreamResponse
*/
#[NoAdminRequired]
public function streamGenerate(int $sessionId, int $messageId): Response {
// 1. Validate session ownership
// 2. Get conversation history
// 3. Detect provider streaming capability
// 4. If streaming supported:
// - Set headers: Content-Type: text/event-stream
// - Call provider API with streaming enabled
// - Yield chunks as Server-Sent Events
// - Save complete message to DB when done
// 5. If streaming not supported:
// - Return error, frontend falls back to polling
}

2. Provider Integration Layer
New file: lib/Service/StreamingService.php
class StreamingService {
/**
* Check if configured provider supports streaming
*/
public function providerSupportsStreaming(): bool;
/**
* Stream chat completion from provider
* Yields string chunks as they arrive
*/
public function streamChatCompletion(array $messages): \Generator;
/**
* Get provider-specific streaming configuration
*/
private function getProviderConfig(): array;
}

This service would (a rough sketch follows this list):
- Read provider settings (already configured in Assistant settings)
- Make direct HTTP requests to provider APIs with streaming enabled
- Parse streaming response format (SSE or JSON streaming)
- Yield chunks as they arrive
- Handle errors and reconnection
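A rough sketch of what streamChatCompletion() could look like for an OpenAI-compatible provider. Everything here is an assumption for illustration (endpoint path, payload shape, use of the plain PHP stream wrapper); the real service would reuse the Assistant's configured provider settings and HTTP client:

/**
 * Illustrative generator: POST an OpenAI-compatible streaming request and
 * yield content deltas as they arrive on the SSE stream.
 */
function streamChatCompletion(array $messages, string $baseUrl, string $apiKey): \Generator {
    $context = stream_context_create(['http' => [
        'method' => 'POST',
        'header' => "Content-Type: application/json\r\nAuthorization: Bearer {$apiKey}\r\n",
        'content' => json_encode([
            'model' => 'gpt-4o', // example model name
            'messages' => $messages,
            'stream' => true,
        ]),
    ]]);

    $stream = fopen($baseUrl . '/v1/chat/completions', 'r', false, $context);
    if ($stream === false) {
        throw new \RuntimeException('Could not open streaming connection to provider');
    }

    while (($line = fgets($stream)) !== false) {
        $line = trim($line);
        if (!str_starts_with($line, 'data: ')) {
            continue; // skip keep-alive comments and blank separator lines
        }
        $data = substr($line, 6);
        if ($data === '[DONE]') {
            break; // provider signals end of stream
        }
        $json = json_decode($data, true);
        // OpenAI-style delta format: choices[0].delta.content
        $chunk = $json['choices'][0]['delta']['content'] ?? '';
        if ($chunk !== '') {
            yield $chunk;
        }
    }
    fclose($stream);
}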
3. Frontend: EventSource Integration
File: src/components/ChattyLLM/ChattyLLMInputForm.vue
Replace polling with EventSource:
// New method (replaces pollGenerationTask)
async streamMessageGeneration(sessionId, messageId) {
const url = generateUrl('/apps/assistant/api/v1/chat/stream')
const params = new URLSearchParams({ sessionId, messageId })
const eventSource = new EventSource(`${url}?${params}`)
let fullMessage = ''
eventSource.onmessage = (event) => {
if (event.data === '[DONE]') {
eventSource.close()
this.loadingMessage = false
return
}
// Append chunk to display
fullMessage += event.data
this.updateStreamingMessage(fullMessage)
}
eventSource.onerror = (error) => {
eventSource.close()
// Fall back to polling if streaming fails: schedule a TaskProcessing task,
// then reuse the existing pollGenerationTask(taskId, sessionId)
}
}
// New method: Update message display in real-time
updateStreamingMessage(content) {
// Find or create placeholder message and update its content
// This provides real-time visual feedback as tokens arrive
}

4. Route Configuration
File: appinfo/routes.php
Add new route:
[
'name' => 'chattyLLM#streamGenerate',
'url' => '/api/v1/chat/stream',
'verb' => 'GET',
],

5. Graceful Fallback
The implementation should:
- ✅ Detect provider streaming capability at runtime (see the sketch after this list)
- ✅ Use streaming when available
- ✅ Automatically fallback to polling for non-streaming providers
- ✅ Maintain full backward compatibility
- ✅ No configuration required from users
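For the capability check, a minimal sketch of what runtime detection could look like; the config key and the list of capable provider apps are made up for this example, and the real implementation would need an agreed way for provider apps to advertise streaming support:

class StreamingCapability {
    /** Provider app IDs assumed (for this sketch) to expose a streaming-capable API. */
    private const STREAMING_CAPABLE = ['integration_openai' /* , other provider app IDs */];

    public function __construct(
        private \OCP\IConfig $config,
    ) {
    }

    public function providerSupportsStreaming(): bool {
        // 'chat_provider' is a hypothetical setting naming the selected text-generation provider.
        $provider = $this->config->getAppValue('assistant', 'chat_provider', '');
        return in_array($provider, self::STREAMING_CAPABLE, true);
    }
}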
Configuration
No user action required. The system should:
- Auto-detect if the configured provider supports streaming
- Use streaming endpoint if available
- Fall back to current polling if not
This could be exposed as an optional toggle in admin settings:
☑ Enable streaming responses (recommended for supported providers)
Benefits
User Experience
- Immediate feedback: Users see responses as they're generated, like ChatGPT/Claude web interfaces
- Perceived performance: Even if total time is the same, streaming feels faster
- Better for long responses: Progress is visible instead of a loading spinner
- Reduced timeouts: Streaming keeps connection alive, preventing timeout issues
Technical Benefits
- Reduced server load: One SSE connection vs. polling requests every 2 seconds
- Lower latency: No 0-2 second polling delay
- More efficient: Less HTTP overhead, fewer database queries
- Modern standard: SSE is well-supported in all modern browsers
- Progressive enhancement: Works alongside existing system
Competitive Parity
All major AI chat interfaces use streaming:
- ChatGPT web interface
- Claude web interface
- Google Gemini
- Microsoft Copilot
Users expect this behavior from AI assistants.
Backward Compatibility
✅ Fully backward compatible:
- Existing polling mechanism remains untouched
- Non-streaming providers continue to work
- Gradual rollout possible (enable per-provider)
- No database schema changes required
- No breaking API changes
Security Considerations
- Same authentication/authorization as current chat endpoints
- Session ownership validation
- Rate limiting (inherit from existing chat endpoints)
- Input sanitization (already handled)
- SSE is one-way (server→client), no additional attack surface
Open Questions for Maintainers
1. Architecture approval: Is bypassing TaskProcessing acceptable for this use case? Or would you prefer exploring a $reportProgress callback implementation in the TaskProcessing framework?
2. Provider integration: Should streaming be implemented in:
   - Option A: the Assistant app directly (proposed above)
   - Option B: individual provider apps (e.g., integration_openai)
   - Option C: both, with an interface/contract
3. Feature flag: Should this be:
   - Auto-enabled when the provider supports it
   - Opt-in via an admin setting
   - Opt-in per user
4. Scope: Should we also add streaming for:
   - Title generation?
   - Other TaskProcessing task types?
5. Testing: Which providers should be tested in CI/CD?
Alternative Considered: WebSockets
WebSockets would also enable streaming but:
- ❌ More complex (bi-directional when we only need server→client)
- ❌ Requires more infrastructure (persistent connections, state management)
- ❌ SSE is simpler and sufficient for this use case
- ✅ SSE auto-reconnects and is easier to debug
Request for Feedback
I'd love to hear thoughts from @julien-nc, @marcelklehr, and the Nextcloud community:
- Does this architectural approach make sense?
- Are there Nextcloud platform considerations I'm missing?
- Would you accept a PR implementing this? If so, any specific requirements?
- Should this be coordinated with provider apps (e.g., integration_openai)?
Related Issues
- Use streaming endpoint for text responses #41 - Original streaming request (closed as not feasible)
- Making intuitive output display with pause functionality #150 - Related discussion mentioned in Use streaming endpoint for text responses #41
References
- Current polling implementation: src/components/ChattyLLM/ChattyLLMInputForm.vue:716-760
- Task processing controller: lib/Controller/ChattyLLMController.php:680-724
- TaskProcessing listener: lib/Listener/ChattyLLMTaskListener.php:68
- OpenAI Streaming API: https://platform.openai.com/docs/api-reference/streaming
- MDN Server-Sent Events: https://developer.mozilla.org/en-US/docs/Web/API/Server-sent_events
I'm willing to implement this feature if there's interest from the maintainers. Please let me know your thoughts!
Describe the solution you'd like
see above
Describe alternatives you've considered
see above