[2026-03] We released the first survey paper on streaming LLMs/MLLMs, covering text, speech, and video streams.
[2026-02] Think-as-You-See is accepted by CVPR 2026.
[2026-01] We release the paper Speak While Watching.
[2026-01] StreamingThinker is accepted by ICLR 2026.
[2025-05] StreamingLLM_GPE is accepted by Findings of ACL 2025.
This repository collects the work of the EIT-NLP Lab on streaming LLMs/MLLMs.
- [ACL 2025 Findings] LLM as Effective Streaming Processor: Bridging Streaming-Batch Mismatches with Group Position Encoding.
- [ICLR 2026] StreamingThinker: Large Language Models Can Think While Reading.
- [arXiv preprint] Speak While Watching: Unleashing TRUE Real-Time Video Understanding Capability of Multimodal Large Language Models.
- [CVPR 2026] Think-as-You-See: Streaming Chain-of-Thought Reasoning for Large Vision-Language Models.
- [arXiv preprint] From Static Inference to Dynamic Interaction: A Survey of Streaming Large Language Models.
Streaming LLMs refer to large language models that support both the progressive processing of incoming information (streaming input) and the step-by-step generation of outputs (streaming output). Building on this foundation, we focus in particular on scenarios where the model performs streaming input and output simultaneously. The formal definition and taxonomy of streaming LLMs/MLLMs can be found in our survey paper.
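To make the batch-vs-streaming distinction concrete, here is a minimal, purely illustrative sketch (not the implementation from any of the papers above): a batch processor waits for the full input before producing output, while a streaming processor interleaves consuming input chunks with emitting partial outputs. The function names and the toy `summary(...)`/`partial(...)` outputs are hypothetical placeholders.

```python
from typing import Iterable, Iterator

def batch_process(chunks: Iterable[str]) -> str:
    """Batch mode: wait for the entire input, then produce one output."""
    full_input = " ".join(chunks)
    return f"summary({full_input})"

def stream_process(chunks: Iterable[str]) -> Iterator[str]:
    """Streaming mode: consume input progressively (streaming input)
    and emit outputs step by step (streaming output), simultaneously."""
    seen = []
    for chunk in chunks:
        seen.append(chunk)             # input arrives incrementally
        yield f"partial({len(seen)})"  # output is emitted before input ends

if __name__ == "__main__":
    source = ["the", "quick", "brown", "fox"]
    print(batch_process(source))       # one output after all input is seen
    for out in stream_process(source): # one output per input chunk
        print(out)
```

In the streaming case each partial output depends only on the prefix seen so far, which is the core constraint that the works listed above address for text, speech, and video streams.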
Here is an example of streaming reasoning (text-to-text streaming):

Here is an example of streaming speech recognition (speech-to-text streaming):

If you have any questions, please contact: jl-tong@sjtu.edu.cn