Open
Labels
enhancement (New feature or request), feature (New feature request)
Description
Request Description
Provide an additional KV cache compression implementation based on the article "TurboQuant: Redefining AI efficiency with extreme compression": https://research.google/blog/turboquant-redefining-ai-efficiency-with-extreme-compression/
A simple implementation is tried in the Kaggle notebook below, which reports a compression ratio of 5.82x with some loss introduced by quantization.
https://www.kaggle.com/code/azhuvath/quantizing-kv-caches-with-polar-transformation
This would let users opt in to this KV cache quantization method when they can accept the loss.
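
To make the idea concrete, here is a minimal NumPy sketch of polar-coordinate quantization of a KV tensor. It is not the TurboQuant algorithm, not the notebook's code, and not an OpenVINO API. The function names (`polar_quantize`, `polar_dequantize`, `compression_ratio`), the 4-bit angle and radius widths, the per-row float32 scale, and the FP16 baseline are all assumptions made for illustration.

```python
import numpy as np

def polar_quantize(x, angle_bits=4, radius_bits=4):
    """Quantize rows of x (n, d) by pairing dims and coding (radius, angle)."""
    n, d = x.shape
    assert d % 2 == 0, "head dimension must be even to form 2-D pairs"
    pairs = x.astype(np.float32).reshape(n, d // 2, 2)
    r = np.hypot(pairs[..., 0], pairs[..., 1])
    theta = np.arctan2(pairs[..., 1], pairs[..., 0])  # in [-pi, pi]

    # Angles: uniform grid on the circle; the modulo wraps +pi onto -pi.
    a_levels = 1 << angle_bits
    a_codes = np.round((theta + np.pi) / (2 * np.pi) * a_levels).astype(np.int64) % a_levels

    # Radii: uniform grid on [0, r_max], one float32 scale per row (assumption).
    r_max = np.maximum(r.max(axis=1, keepdims=True), 1e-12).astype(np.float32)
    r_levels = (1 << radius_bits) - 1
    r_codes = np.clip(np.round(r / r_max * r_levels), 0, r_levels)

    return a_codes.astype(np.uint8), r_codes.astype(np.uint8), r_max

def polar_dequantize(a_codes, r_codes, r_max, angle_bits=4, radius_bits=4):
    """Rebuild an approximate (n, d) float32 tensor from the codes."""
    a_levels = 1 << angle_bits
    r_levels = (1 << radius_bits) - 1
    theta = a_codes.astype(np.float32) / a_levels * 2 * np.pi - np.pi
    r = r_codes.astype(np.float32) / r_levels * r_max
    pairs = np.stack([r * np.cos(theta), r * np.sin(theta)], axis=-1)
    return pairs.reshape(pairs.shape[0], -1)

def compression_ratio(d, angle_bits=4, radius_bits=4, orig_bits=16, scale_bits=32):
    """Bits per row before quantization divided by bits per row after."""
    packed = (angle_bits + radius_bits) * (d // 2) + scale_bits
    return orig_bits * d / packed

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    kv = rng.standard_normal((8, 128)).astype(np.float32)  # toy stand-in for K or V
    codes = polar_quantize(kv)
    kv_hat = polar_dequantize(*codes)
    rel_err = np.linalg.norm(kv - kv_hat) / np.linalg.norm(kv)
    print(f"ratio={compression_ratio(128):.2f}x rel_err={rel_err:.3f}")
```

The ratio computed here counts only the code bits and the per-row scale, so it will not match the notebook's 5.82x figure, which depends on that notebook's own bit widths and layout.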
Feature Use Case
Reduce the memory required for the KV cache and improve overall inference speed.
Issue submission checklist
- The feature request or improvement must be related to OpenVINO