[Feature Request]: Quantizing KV Caches with Polar Transformation (TurboQuant)

### Request Description

Add an optional KV cache compression method based on the approach described in the article "TurboQuant: Redefining AI efficiency with extreme compression": https://research.google/blog/turboquant-redefining-ai-efficiency-with-extreme-compression/

A simple implementation is tried out in the Kaggle notebook below. It reports a compression ratio of 5.82x, with some loss introduced by quantization.
https://www.kaggle.com/code/azhuvath/quantizing-kv-caches-with-polar-transformation

This would let users opt in to this KV cache quantization method, provided they can accept the loss.
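To make the idea concrete, here is a minimal NumPy sketch of polar-coordinate quantization for a KV tensor. It is an illustration only: it is not the TurboQuant algorithm from the article, not the notebook's code, and not an OpenVINO API. The function names `polar_quantize` and `polar_dequantize`, the 4-bit defaults, and the per-vector radius scale are all assumptions made for this example. Codes are kept in `uint8` without bit packing, so the sketch shows the reconstruction loss rather than the final storage layout.

```python
import numpy as np

def polar_quantize(x, angle_bits=4, radius_bits=4):
    """Quantize the last axis of x (even length) as (radius, angle) pairs.

    Hypothetical helper for illustration; bit widths must be <= 8.
    """
    a, b = x[..., 0::2], x[..., 1::2]        # pair adjacent coordinates
    radius = np.hypot(a, b)                  # r >= 0
    angle = np.arctan2(b, a)                 # theta in [-pi, pi]

    # Angle: uniform grid on the circle; the modulo makes +pi wrap to code 0.
    a_levels = 2 ** angle_bits
    a_code = np.round((angle + np.pi) / (2 * np.pi) * a_levels).astype(np.int64)
    a_code = (a_code % a_levels).astype(np.uint8)

    # Radius: uniform grid on [0, scale], with one scale per vector.
    r_levels = 2 ** radius_bits - 1
    scale = np.maximum(radius.max(axis=-1, keepdims=True), 1e-12)
    r_code = np.round(radius / scale * r_levels).astype(np.uint8)
    return r_code, a_code, scale

def polar_dequantize(r_code, a_code, scale, angle_bits=4, radius_bits=4):
    """Reconstruct an approximation of the original tensor from the codes."""
    a_levels = 2 ** angle_bits
    r_levels = 2 ** radius_bits - 1
    radius = r_code.astype(np.float32) / r_levels * scale
    angle = a_code.astype(np.float32) * (2 * np.pi / a_levels) - np.pi

    out = np.empty(r_code.shape[:-1] + (2 * r_code.shape[-1],), dtype=np.float32)
    out[..., 0::2] = radius * np.cos(angle)  # back to Cartesian, re-interleaved
    out[..., 1::2] = radius * np.sin(angle)
    return out

# Example: random stand-in for a [batch, heads, seq_len, head_dim] key tensor.
rng = np.random.default_rng(0)
keys = rng.standard_normal((1, 8, 128, 64)).astype(np.float32)
r_code, a_code, scale = polar_quantize(keys)
restored = polar_dequantize(r_code, a_code, scale)
rel_error = np.linalg.norm(restored - keys) / np.linalg.norm(keys)
print(f"relative reconstruction error: {rel_error:.3f}")
```

Raising `angle_bits` or `radius_bits` trades memory for a smaller reconstruction error, which is the loss users would be accepting when they opt in.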

### Feature Use Case

Reduce the memory footprint of the KV cache, which may also improve overall inference speed when decoding is limited by memory bandwidth.
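As a rough sense of scale, the sketch below estimates KV cache size for a hypothetical model shape and applies the 5.82x ratio reported by the notebook. The layer count, head count, head dimension, and sequence length are illustrative assumptions, not measurements from any specific model, and the ratio is assumed to carry over unchanged.

```python
def kv_cache_bytes(num_layers, num_kv_heads, head_dim, seq_len,
                   batch_size=1, bytes_per_value=2):
    """Size of an uncompressed KV cache; the factor 2 covers keys and values."""
    return (2 * num_layers * num_kv_heads * head_dim
            * seq_len * batch_size * bytes_per_value)

# Hypothetical FP16 model shape, chosen only for illustration.
baseline = kv_cache_bytes(num_layers=32, num_kv_heads=8,
                          head_dim=128, seq_len=8192)   # 2**30 bytes = 1 GiB
compressed = baseline / 5.82                            # ratio from the notebook

print(f"baseline:   {baseline / 2**20:.0f} MiB")
print(f"compressed: {compressed / 2**20:.0f} MiB")
```

The saving grows linearly with sequence length and batch size, so long-context and high-concurrency workloads would benefit most.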

### Issue submission checklist

- [x] The feature request or improvement must be related to OpenVINO


[Feature Request]: Quantizing KV Caches with Polar Transformation (TurboQuant) #34954
