diff --git a/CLAUDE.md b/CLAUDE.md
index 8878e2b949..076c01e71b 100644
--- a/CLAUDE.md
+++ b/CLAUDE.md
@@ -1,3 +1,51 @@
-# TorchAO Claude Instructions
+# TorchAO
 
-Fill me in
+PyTorch-native library for quantization, sparsity, and low-precision training.
+
+## Config Classes
+
+All configs inherit from `AOBaseConfig`. Defined in `torchao/quantization/quant_api.py`. Use `FqnToConfig` to apply different configs to different layers by module name.
+
+## Stable vs Prototype
+
+- **Stable** (`torchao/quantization/`, `torchao/float8/`, `torchao/sparsity/`, `torchao/optim/`): API stability guaranteed.
+- **Prototype** (`torchao/prototype/`): Experimental, API may change without notice.
+
+See [docs/source/workflows/index.md](docs/source/workflows/index.md) for the full dtype x hardware status matrix.
+
+## Architecture and Contributing
+
+- [Quantization Overview](docs/source/contributing/quantization_overview.rst) - full stack walkthrough, tensor subclasses, quantization flows
+- [Contributor Guide](docs/source/contributing/contributor_guide.rst) - how to add tensors, kernels, configs
+- [Inference Workflows](docs/source/workflows/inference.md) - which config to use for which hardware
+- [PT2E Quantization](docs/source/pt2e_quantization/index.rst) - PyTorch 2 Export quantization for deployment backends (X86, XPU, ExecuTorch)
+
+These render at https://docs.pytorch.org/ao/main/
+
+## Deprecated APIs
+
+Do not use or recommend these:
+- `AffineQuantizedTensor` (AQT) in `torchao/dtypes/` - old v1 system, being removed
+- `autoquant()` - deleted
+- Layout registration system (`PlainLayout`, `Float8Layout`, `TensorCoreTiledLayout`, etc.) - deleted
+- `TorchAODType` - deprecated
+- `change_linear_weights_to_int4_woqtensors` - deleted, use `quantize_(model, Int4WeightOnlyConfig())`
+
+New tensor types should inherit from `TorchAOBaseTensor` in `torchao/utils.py`, not AQT.
+
+## Development
+
+```bash
+# Setup
+USE_CPP=0 pip install -e . --no-build-isolation   # CPU-only
+USE_CUDA=1 pip install -e . --no-build-isolation  # With CUDA
+
+# Test (mirrors source structure)
+pytest test/quantization/test_quant_api.py
+pytest test/float8/
+pytest test/prototype/mx_formats/
+```
+
+## Commit Messages
+
+- Do not commit without explicit request from the user
diff --git a/docs/source/conf.py b/docs/source/conf.py
index a8df0c44fc..be13d4f060 100644
--- a/docs/source/conf.py
+++ b/docs/source/conf.py
@@ -243,6 +243,9 @@
 # so a file named "default.css" will overwrite the builtin "default.css".
 html_static_path = ["_static"]
 
+# Files to copy to the docs root (served at docs.pytorch.org/ao/llms.txt)
+html_extra_path = ["llms.txt"]
+
 # -- Options for HTMLHelp output ------------------------------------------
 
 # Output file base name for HTML help builder.
diff --git a/docs/source/llms.txt b/docs/source/llms.txt
new file mode 100644
index 0000000000..4553f8035c
--- /dev/null
+++ b/docs/source/llms.txt
@@ -0,0 +1,43 @@
+# TorchAO
+
+> PyTorch-native library for quantization, sparsity, and low-precision training. Provides the quantize_() API with Config classes for int4/int8/float8/MX weight and activation quantization, composable with torch.compile.
+
+## Docs
+
+- [Quick Start](https://docs.pytorch.org/ao/stable/quick_start.html)
+- [Workflows Matrix](https://docs.pytorch.org/ao/main/workflows.html): Status of every dtype x hardware combination
+- [API Reference](https://docs.pytorch.org/ao/stable/api_reference/index.html)
+- [Inference Quantization](https://docs.pytorch.org/ao/main/workflows/inference.html)
+- [Float8 Training](https://docs.pytorch.org/ao/main/workflows/training.html)
+- [QAT](https://docs.pytorch.org/ao/main/workflows/qat.html)
+- [Quantization Overview](https://docs.pytorch.org/ao/main/contributing/quantization_overview.html): Architecture and internals
+- [Contributor Guide](https://docs.pytorch.org/ao/main/contributing/contributor_guide.html): How to add tensors, kernels, configs
+- [PT2E Quantization](https://docs.pytorch.org/ao/main/pt2e_quantization/index.html): PyTorch 2 Export quantization for deployment backends (X86, XPU, ExecuTorch)
+
+## Code
+
+- [quantize_() and Config classes](https://github.com/pytorch/ao/blob/main/torchao/quantization/quant_api.py): Main entry point
+- [Tensor subclasses](https://github.com/pytorch/ao/tree/main/torchao/quantization/quantize_/workflows): Int4Tensor, Int8Tensor, Float8Tensor, etc.
+- [Granularity](https://github.com/pytorch/ao/blob/main/torchao/quantization/granularity.py): PerTensor, PerRow, PerGroup, PerBlock, PerToken
+- [Float8 training](https://github.com/pytorch/ao/tree/main/torchao/float8): Scaled float8 training recipes
+- [Sparsity](https://github.com/pytorch/ao/tree/main/torchao/sparsity): Semi-structured 2:4 sparsity
+- [Quantized optimizers](https://github.com/pytorch/ao/tree/main/torchao/optim): AdamW8bit, AdamW4bit, AdamWFp8
+- [QAT](https://github.com/pytorch/ao/tree/main/torchao/quantization/qat): Quantization-aware training
+- [MX formats](https://github.com/pytorch/ao/tree/main/torchao/prototype/mx_formats): MXFP8, MXFP4, NVFP4 (prototype)
+- [MoE training](https://github.com/pytorch/ao/tree/main/torchao/prototype/moe_training): MXFP8 MoE training (prototype)
+
+## Deprecated APIs
+
+Do not use or recommend these:
+- `AffineQuantizedTensor` (AQT) in `torchao/dtypes/` - old v1 system, being removed. New tensor types inherit from `TorchAOBaseTensor`
+- `autoquant()` - deleted
+- Layout registration system (`PlainLayout`, `Float8Layout`, `TensorCoreTiledLayout`, etc.) - deleted
+- `TorchAODType` - deprecated
+- `change_linear_weights_to_int4_woqtensors` - deleted, use `quantize_(model, Int4WeightOnlyConfig())`
+
+## Optional
+
+- [Tutorials](https://github.com/pytorch/ao/tree/main/tutorials)
+- [Benchmarks](https://github.com/pytorch/ao/tree/main/benchmarks)
+- [Contributing](https://github.com/pytorch/ao/blob/main/CONTRIBUTING.md)
+- [MSLK kernels](https://github.com/pytorch/MSLK): Optional accelerated kernels
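As a companion to the `quantize_()` entry point and config classes this diff documents, here is a usage sketch. It assumes torch and torchao are installed and that `Int8WeightOnlyConfig` is exported from `torchao.quantization` (chosen over `Int4WeightOnlyConfig` because the int8 path runs on CPU); exact config availability depends on the installed torchao version.

```python
# Usage sketch for quantize_() with a weight-only config; not an
# authoritative recipe, just the documented one-call flow.
import torch
import torch.nn as nn
from torchao.quantization import quantize_, Int8WeightOnlyConfig

model = nn.Sequential(nn.Linear(64, 64))

# quantize_ mutates the model in place, replacing each Linear weight
# with a quantized tensor subclass selected by the config.
quantize_(model, Int8WeightOnlyConfig())

# The quantized model is used like any other nn.Module.
out = model(torch.randn(2, 64))
```

Per the notes above, `FqnToConfig` plugs into the same `quantize_()` call to apply different configs to different layers by module name.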
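The `PerGroup` granularity listed in the Code section can be illustrated with a dependency-free sketch of symmetric per-group int8 quantization, the scheme behind `group_size`-style configs. The helper names here are illustrative only, not torchao APIs; real kernels pack values and vectorize the math.

```python
# Illustrative per-group quantization: each group of `group_size`
# consecutive weights shares one scale, so outliers in one group
# do not degrade precision elsewhere in the row.

def quantize_per_group(row, group_size=4, qmax=127):
    """Quantize one weight row group by group, one scale per group."""
    scales, qvals = [], []
    for start in range(0, len(row), group_size):
        group = row[start:start + group_size]
        # Symmetric scale: map the group's max magnitude to qmax.
        scale = max(abs(v) for v in group) / qmax or 1.0
        scales.append(scale)
        qvals.extend(round(v / scale) for v in group)
    return qvals, scales

def dequantize_per_group(qvals, scales, group_size=4):
    """Invert the mapping: multiply each value by its group's scale."""
    return [q * scales[i // group_size] for i, q in enumerate(qvals)]

row = [0.1, -0.5, 0.25, 0.05, 2.0, -1.0, 0.5, 0.0]
q, s = quantize_per_group(row)
recon = dequantize_per_group(q, s)
# Round-to-nearest bounds per-element error by scale / 2.
```

Smaller `group_size` means more scales (more metadata) but tighter error bounds per group, which is the trade-off the `PerGroup`/`PerBlock` granularities expose.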