Skip to content

[BUG][CONV] fix cp.async codegen failure for tail weight tiles #1105

@RMLYC

Description

@RMLYC

Description

Symptom / Motivation

Conv2d and Conv3d generic implicit-GEMM kernels can fail to compile with TileLang 0.1.9 for selected fp16/bf16 cases. The failure happens during CUDA codegen before numerical comparison:

tl::ptx_cp_async requires a final PTX byte width in {4, 8, 16}, but got 2

This was listed as a remaining convolution blocker in #1071.

Root Cause Analysis

The generic Conv2d/Conv3d kernels flatten convolution weights into the GEMM K dimension and load each weight tile with T.copy(weight_flat[k_iter * block_k, bx * block_n], weight_shared). When k_total = kernel_size_product * c_in is not divisible by block_k, the final K tile is partial. Under TileLang 0.1.9, that global-to-shared T.copy path can lower the tail copy to ptx_cp_async; for fp16/bf16 the final transfer can be only 2 bytes, which is illegal for cp.async.

The kernel should explicitly guard tail K and output-channel bounds and zero-fill invalid shared-memory elements instead of relying on a full-tile T.copy for the weight tile.

Related Files

  • tileops/kernels/convolution.py
  • tests/ops/test_convolution.py

Goal

Fix Conv2d/Conv3d TileLang 0.1.9 CUDA codegen failures for partial tail weight tiles and restore the skipped convolution coverage.

Plan

  1. Replace generic Conv2d/Conv3d weight-tile T.copy calls with explicit guarded shared-memory loads.
  2. Zero-fill out-of-bounds tail K and output-channel elements before GEMM.
  3. Remove the temporary Conv2d/Conv3d skips that covered this cp.async failure.
  4. Verify tests/ops/test_convolution.py on GPU.

Constraints

Keep the public Conv2d/Conv3d operator API unchanged.

Acceptance Criteria

  • Conv2d and Conv3d generic paths no longer fail codegen with the cp.async byte-width error.
  • Previously skipped Conv2d/Conv3d convolution cases run in tests/ops/test_convolution.py.
  • Modified files pass unit tests.

Metadata

Metadata

Assignees

Labels

bugSomething isn't working

Type

No type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions