Skip to content

tile-ai/TileOPs

Repository files navigation

TileOPs (TOP): Operator Library for LLMs Built on TileLang

Installation | Getting Started | Documents

TileOPs (TOP) is a high-performance operator library for large language models (LLMs) built on TileLang. It offers efficient, modular, and composable implementations for AI workloads, especially for LLMs.

⚠️ Status: TileOPs is under active and rapid development. APIs and features may change.

What TileOPs is for:

  • Out-of-the-box Operator Library: A growing collection of production-ready operators commonly used in LLM workloads, designed with clear abstractions and modular building blocks. These operators can be used directly or easily extended for custom research and system integration.
  • Efficient Attention Kernels for LLMs: Highly optimized attention implementations, including MHA/GQA (implemented FA2 on Ampere-like GPUs and FA3 on Hopper), DeepSeek-MLA, and DeepSeek-DSA.
  • Reference Implementation for TileLang: TileOPs acts as a canonical reference implementation for writing performant and maintainable kernels in TileLang. It demonstrates best practices in tiling strategies, memory hierarchy utilization, and warp-/block-level coordination, making it a practical learning resource for compiler and kernel developers.

The core features of TileOPs include:

  • Auto-Tuning: Built-in auto-tuning support to explore tile sizes, pipelines, and scheduling parameters, enabling kernels to adapt efficiently to different GPU architectures and workload characteristics with minimal manual effort.
  • CUDA-Graph and torch.compile Compatibility: TileOPs APIs are fully compatible with CUDA-Graph capture and PyTorch torch.compile, allowing seamless integration into modern training and inference pipelines with reduced launch overhead and improved end-to-end performance.
  • Lightweight Dependencies: TileOPs depends only on TileLang and PyTorch, keeping the software stack minimal and easy to integrate.

Benchmark Summary

TODO

Support Matrix

TODO

📦 Install with pip

Requirements

  • Python >= 3.8
  • Torch >= 2.1
  • TileLang >= 0.1.7

Method 1: Install from PyPI

pip install tileops

Method 2: Install from source (editable mode for development)

git clone https://github.com/tile-ai/TileOPs
cd TileOPs
pip install -e '.[dev]' -v # remove -e option if you don't want to install in editable mode, -v for verbose output

🚀 Quick Start

TODO

Documents

Hierarchical APIs

TileOPs is structured around four hierarchical key concepts, each representing a distinct level of abstraction. Higher-level components are composed from, or delegate execution to, the next lower level.

  • Layer: A high-level, user-facing abstraction analogous to torch.nn.Module. It manages stateful parameters.
  • Function: A stateless, functional abstraction analogous to torch.nn.functional. Functions are fully compatible with CUDA-Graph capture, torch.compile, and torch.autograd.
  • Op: determines the implementation for a given shape and hardware, dispatching to the correct Kernel and providing unit test and benchmark. Ops are fully compatible with CUDA-Graph capture and torch.compile.
  • Kernel: Tilelang-based kernels with hardware-specific optimizations.

About

No description, website, or topics provided.

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published