Open
Problem Description
When a user's audio input is noise or nonsensical sound, the large language model still generates a text response, but the user cannot tell what that response refers to or why it was produced. There is currently no mechanism for detecting or handling this situation.
Current Behavior
- Users input noise or meaningless sounds
- The ASR system still attempts transcription
- The LLM receives the low-quality transcript and still generates a response
- Users cannot tell what the response refers to or whether it is accurate
Expected Behavior
The system should be able to detect and handle low-quality or meaningless input to provide a better user experience.
Possible Solutions
1. Confidence and Rationality Detection
- ASR Level: Check the speech recognition confidence score
- LLM Level: Check whether the transcribed text is plausible (rationality detection)
- Prompt users to repeat their input when confidence falls below a threshold
2. Audio Quality Improvement
- Implement audio noise reduction processing
- Improve input audio clarity
- Audio quality pre-detection
3. User Feedback Mechanism
- Alert users when low-clarity audio is detected
- Provide re-recording options
- Display confidence scores for user reference
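To make solution 1 concrete, here is a minimal sketch of a gate that could sit between ASR and the LLM. It assumes the ASR engine exposes a confidence score between 0.0 and 1.0 for each transcript, which not every engine does. The names `gate_transcript`, `looks_meaningful`, and `GateResult`, the default threshold of 0.6, and the repetition heuristic are all hypothetical and for illustration only; they are not part of this project.

```python
from dataclasses import dataclass


@dataclass
class GateResult:
    accepted: bool      # True if the transcript should be forwarded to the LLM
    reason: str         # "ok", "low_confidence", or "not_meaningful"
    confidence: float   # the ASR confidence that was evaluated


def looks_meaningful(text: str, min_chars: int = 2,
                     max_repeat_ratio: float = 0.6) -> bool:
    """Crude plausibility heuristic (an assumption, not a tuned algorithm)."""
    stripped = text.strip()
    if len(stripped) < min_chars:
        return False
    # Require at least one letter or digit; str.isalnum() also covers CJK.
    if not any(ch.isalnum() for ch in stripped):
        return False
    # Reject transcripts dominated by one repeated token, e.g. "la la la la".
    tokens = stripped.lower().split()
    if len(tokens) >= 4:
        most_common = max(tokens.count(t) for t in set(tokens))
        if most_common / len(tokens) > max_repeat_ratio:
            return False
    return True


def gate_transcript(text: str, confidence: float,
                    threshold: float = 0.6) -> GateResult:
    """Decide whether a transcript is good enough to send to the LLM."""
    if confidence < threshold:
        return GateResult(False, "low_confidence", confidence)
    if not looks_meaningful(text):
        return GateResult(False, "not_meaningful", confidence)
    return GateResult(True, "ok", confidence)
```

When `accepted` is false, the caller could skip the LLM call and instead prompt the user to re-record, using `reason` and `confidence` for the feedback described in solution 3. A real plausibility check would likely need something stronger than this heuristic.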
Technical Implementation Considerations
- ASR confidence threshold configuration
- Text rationality evaluation algorithm
- Audio noise reduction algorithm integration
- User interface feedback design
- Configuration options allowing users to adjust sensitivity
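One possible shape for the configuration and pre-detection items is a single user-facing sensitivity setting that derives the individual thresholds. The sketch below is only an illustration: `InputQualityConfig`, `rms_dbfs`, `audio_is_loud_enough`, and the numeric ranges (0.3 to 0.9 for confidence, -60 to -30 dBFS for level) are assumptions, not values from this project. It also assumes audio arrives as float samples normalized to [-1.0, 1.0].

```python
import math
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class InputQualityConfig:
    # 0.0 accepts almost anything, 1.0 is the strictest setting.
    sensitivity: float = 0.5

    def __post_init__(self) -> None:
        if not 0.0 <= self.sensitivity <= 1.0:
            raise ValueError("sensitivity must be between 0.0 and 1.0")

    @property
    def asr_confidence_threshold(self) -> float:
        # Linear mapping onto an illustrative range of 0.3 to 0.9.
        return 0.3 + 0.6 * self.sensitivity

    @property
    def min_level_dbfs(self) -> float:
        # Linear mapping onto an illustrative range of -60 to -30 dBFS.
        return -60.0 + 30.0 * self.sensitivity


def rms_dbfs(samples: Sequence[float]) -> float:
    """RMS level in dB relative to full scale; -inf for empty or silent input."""
    if not samples:
        return float("-inf")
    mean_square = sum(s * s for s in samples) / len(samples)
    if mean_square == 0.0:
        return float("-inf")
    return 10.0 * math.log10(mean_square)


def audio_is_loud_enough(samples: Sequence[float],
                         config: InputQualityConfig) -> bool:
    """Pre-check run before ASR: reject clips below the configured level."""
    return rms_dbfs(samples) >= config.min_level_dbfs
```

A level check like this only filters out near-silent clips. It cannot tell loud noise from speech, so it would complement the ASR confidence threshold rather than replace it.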
Priority
Medium - This feature can significantly improve user experience and reduce misunderstandings caused by input quality issues
Suggested Labels
enhancement, audio, ai, user-experience