
add torch.compile test for Float8BlockwiseLinear #4187

Merged
iamzainhuda merged 8 commits into main from torch-compile-kernel-tests on Mar 31, 2026

Conversation

@iamzainhuda
Contributor

@iamzainhuda iamzainhuda commented Mar 26, 2026

Summary

  • Add torch.compile testing to Float8BlockwiseLinear forward and backward passes, verifying fullgraph compilation, numerical correctness, and no recompilation across repeated calls with the same shapes
  • Refactor the existing test_blockwise_quant_linear_fwd_bwd into a shared _run_blockwise_quant_linear_fwd_bwd helper that supports both eager and compiled execution paths
  • Fix the backward pass assertions to correctly compare gradients (input grad vs. input grad, weight grad vs. weight grad)

Testing

pytest test/prototype/blockwise_fp8_training/test_blockwise_linear.py
  • Eager mode (test_blockwise_quant_linear_fwd_bwd): Parametrized across in_features=[4096], out_features=[128256], batch_size=[1, 8], block_size=[128] — validates forward SQNR >= 25.0 and backward grad SQNR >= 30.0 against a reference nn.Linear
  • Compile mode (test_blockwise_quant_linear_compile_fullgraph_fwd_bwd): Runs with torch.compile(fullgraph=True) using CompileCounterWithBackend("inductor") to assert:
    • The model traces into a single compiled frame (no graph breaks)
    • No recompilation on a second forward/backward call with the same input shapes
    • Numerical correctness matches the eager reference (same SQNR thresholds)
  • Both paths check for NaN-free outputs and gradients
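For context on the thresholds above, SQNR is the ratio of reference signal power to quantization-error power, in dB. A minimal, dependency-free sketch of the metric (this is an illustrative re-derivation, not the tensor-based helper torchao's tests actually use):

```python
import math

def sqnr_db(reference, actual):
    """Signal-to-quantization-noise ratio in dB between two equal-length
    sequences: 10 * log10(signal_power / error_power)."""
    signal_power = sum(r * r for r in reference)
    error_power = sum((r - a) ** 2 for r, a in zip(reference, actual))
    if error_power == 0:
        return math.inf  # exact match
    return 10.0 * math.log10(signal_power / error_power)

# A tiny perturbation yields a high SQNR, so thresholds like >= 25 dB
# tolerate small quantization error while catching real numerical bugs.
ref = [1.0, -2.0, 3.0, -4.0]
approx = [1.001, -1.999, 3.002, -4.001]
print(sqnr_db(ref, approx) > 25.0)  # prints True
```

Higher SQNR means the quantized result is closer to the reference; the backward threshold (30 dB) is stricter than the forward one (25 dB).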

@pytorch-bot

pytorch-bot bot commented Mar 26, 2026

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/ao/4187

Note: Links to docs will display an error until the docs builds have been completed.

✅ You can merge normally! (8 Unrelated Failures)

As of commit 3292946 with merge base f11eff8:

BROKEN TRUNK - The following jobs failed but were present on the merge base:

👉 Rebase onto the `viable/strict` branch to avoid these failures

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@meta-cla meta-cla bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label Mar 26, 2026
@iamzainhuda iamzainhuda added the module: training quantize_ api training flow label Mar 26, 2026
@danielvegamyhre
Copy link
Copy Markdown
Contributor

@iamzainhuda I think there was a miscommunication: we don't want to directly wrap an individual Triton custom op in torch.compile and test it. We want to compile a full blockwise linear layer (this test), ensure there are no graph breaks (fullgraph=True), and verify that the numerics of outputs/grads pass the same threshold testing as the eager mode tests.

@iamzainhuda iamzainhuda force-pushed the torch-compile-kernel-tests branch from 6457882 to 8548e82 Compare March 30, 2026 20:19
@iamzainhuda
Contributor Author

> @iamzainhuda i think there was a miscommunication, we don't want to directly wrap an individual triton custom op in torch.compile and test - we want to compile a full blockwise linear layer (this test) and ensure there are no graph breaks (fullgraph=True) and that numerics of outputs/grads undergo the pass with the same threshold testing as the eager mode tests.

That makes sense (especially w.r.t. understanding graph breaks); updated the PR. Added a compile test in blockwise_linear with a full layer, using a smaller shape for compile-time's sake. Verified full-graph compilation, numerical correctness, and no erroneous recompilation across multiple calls.

@iamzainhuda iamzainhuda changed the title from "add torch.compile to blockwise quantized kernel unit tests" to "add torch.compile test for Float8BlockwiseLinear" on Mar 30, 2026
```python
    out_features,
    batch_size,
    block_size,
    compile_mode=True,
```
Contributor


Can we just set this "use compile" bool as a pytest.mark.parametrize parameter, and combine the eager and compile unit tests into one test (rather than duplicating just to change this bool)? That keeps things simple and consistent with the usual pattern in other torchao tests.

or is there any reason these are separated now?

Contributor Author


I had it separated because 1) the compile test uses a smaller shape for compile-time/runtime cost, and 2) it's easier to reason about failures when a "test_linear_compile" is failing vs. "test_linear_fwd_bwd", since I felt they test reasonably different things; so it'd make less sense to have them both as "test_linear_fwd_bwd". Happy to move them together if you'd like!

Contributor


ok sure, sgtm. I think the compile time thing is a bit overkill for a unit test unless it's really long, but I agree the test failures will be more readable at a glance: with a parameterized use_compile bool, the failed config just shows up as something like test_this_thing[4096-4096-1-128-true], which gets confusing, especially if we add more bool test params later.

```python
if compile_mode:
    with torch._dynamo.config.patch(trace_autograd_ops=True):
        torch._dynamo.reset()
        compiled_frame_counter = CompileCounterWithBackend("inductor")
```
Contributor


this is cool, hadn't used this before

@iamzainhuda iamzainhuda merged commit 3ad1067 into main Mar 31, 2026
15 of 23 checks passed
