[Feature Request]: Quantizing KV Caches with Polar Transformation (TurboQuant) #34954

@azhuvath

Description

Request Description

Provide an additional KV cache compression implementation based on the Google Research blog post "TurboQuant: Redefining AI efficiency with extreme compression": https://research.google/blog/turboquant-redefining-ai-efficiency-with-extreme-compression/

A simple implementation is prototyped in this Kaggle notebook, which achieves a compression ratio of 5.82x with some accuracy loss introduced by quantization:
https://www.kaggle.com/code/azhuvath/quantizing-kv-caches-with-polar-transformation

This would let users opt into this KV cache quantization method, provided they can tolerate the accuracy loss.
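To illustrate the general idea, here is a minimal polar-style quantization sketch: each KV vector is split into a per-vector norm (kept in float16) and a unit direction quantized uniformly to a few bits. This is not the notebook's actual code or the TurboQuant algorithm itself; the function names, the 4-bit setting, and the tensor layout are assumptions for illustration.

```python
import numpy as np

def polar_quantize(kv: np.ndarray, bits: int = 4):
    """Split each vector into a float16 norm and a uniformly
    quantized unit direction (illustrative sketch only)."""
    norms = np.linalg.norm(kv, axis=-1, keepdims=True)          # per-vector magnitude
    dirs = kv / np.maximum(norms, 1e-12)                        # unit direction, components in [-1, 1]
    levels = 2 ** bits - 1
    codes = np.round((dirs + 1.0) / 2.0 * levels).astype(np.uint8)  # uniform integer codes
    return codes, norms.astype(np.float16)

def polar_dequantize(codes: np.ndarray, norms: np.ndarray, bits: int = 4):
    """Reconstruct approximate vectors from codes and norms."""
    levels = 2 ** bits - 1
    dirs = codes.astype(np.float32) / levels * 2.0 - 1.0
    return dirs * norms.astype(np.float32)

# Toy KV cache slice: (layers, tokens, head_dim)
kv = np.random.randn(2, 8, 64).astype(np.float32)
codes, norms = polar_quantize(kv)
rec = polar_dequantize(codes, norms)
mean_abs_err = np.abs(kv - rec).mean()
```

If the 4-bit codes were bit-packed, a 64-element fp32 vector (2048 bits) would shrink to 64 x 4 bits plus one fp16 norm (272 bits), roughly 7.5x for this toy setup; the 5.82x figure reported in the notebook presumably reflects different choices of bit width and overhead.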

Feature Use Case

Reduce the memory requirements of the KV cache and improve overall inference speed.

Issue submission checklist

  • The feature request or improvement must be related to OpenVINO

Metadata

Assignees

No one assigned

Labels

enhancement (New feature or request), feature (New feature request)
