Hi,
after running a memory trace on the End-2-End example notebook with the techniques here, I am finding that the PyTorch graph compiler is much less likely than TensorFlow's to eliminate broadcast tensors from being materialized by fusing expressions. In other words, the simulation scenario sizes feasible in the SLS are restricted in ways I haven't observed with TensorFlow and Sionna 1.x.
In that End-2-End downlink example, I am seeing two places in the code of this notebook that result in huge allocation spikes, using torch 2.10.0:
- src/sionna/phy/channel/utils.py line 297
h_f = h * e
The broadcast product h * e is materialized on the GPU at P times the size of the reduced result h_f = h_f.sum(dim=-3), where P is the number of paths used by the channel model (e.g. often 24 for UMa).
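As a possible workaround for this first case, the multiply-and-reduce can be expressed as a single einsum contraction, which avoids asking for the full broadcast intermediate in the first place. A minimal sketch with made-up stand-in shapes (the real tensors in utils.py carry more dimensions, but the pattern is the same, with P at dim=-3):

```python
import torch

# Hypothetical stand-in shapes: P = 24 paths at dim=-3, as in the
# UMa case described above. The extra dims are placeholders.
h = torch.randn(2, 24, 3, 16)   # (batch, P, sym, freq)
e = torch.randn(2, 24, 3, 16)

# Baseline: the broadcast product is fully materialized at P times
# the size of the reduced result before the sum collapses it.
h_f_ref = (h * e).sum(dim=-3)

# Fused alternative: one contraction over the path dimension, so
# there is no expression for the framework to fail to fuse.
h_f = torch.einsum('bpsf,bpsf->bsf', h, e)

assert torch.allclose(h_f_ref, h_f, atol=1e-5)
```

Whether einsum actually avoids the intermediate depends on the backend's contraction path, but it at least states the intent explicitly instead of relying on the compiler to fuse two separate ops.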
- src/sionna/phy/ofdm/precoding.py line 352
h_eff = h @ g
For num_ofdm_symbols = 14 in that End-2-End example, the input and output tensors have these shapes:
h shape: (1, 210, 21, 14, 128, 1, 12) ~0.707 GiB
g shape: (1, 1, 21, 14, 128, 12, 10) ~0.034 GiB
h_eff shape: (1, 210, 21, 14, 128, 1, 10) ~0.589 GiB
h_eff shape: (1, 210, 1, 21, 10, 14, 128) ~0.589 GiB (after permute)
The broadcast performed during the matrix multiplication is fully materialized on the GPU, even in code running under @torch.compile, and requires 210 times the size of the input tensor g (because g is broadcast across the 210-sized batch dimension), resulting in a ~7 GiB allocation.
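For this second case, one workaround is to exploit the fact that h's second-to-last dimension is 1: the dimension of size 210 can be folded into the matmul's row dimension, so g no longer needs to be broadcast at all. A scaled-down sketch with hypothetical small shapes standing in for the ones listed above:

```python
import torch

# Scaled-down stand-ins for the shapes in the issue:
# h: (1, 210, 21, 14, 128, 1, 12), g: (1, 1, 21, 14, 128, 12, 10).
B, U, S, N, F = 1, 8, 3, 2, 4
h = torch.randn(B, U, S, N, F, 1, 12)
g = torch.randn(B, 1, S, N, F, 12, 10)

# Baseline: matmul broadcasts g across the U dimension,
# materializing U copies of it.
ref = h @ g                                        # (B, U, S, N, F, 1, 10)

# Workaround sketch: move U into the matmul's row dimension so the
# batch dims of h and g match exactly and no broadcast of g occurs.
h2 = h.squeeze(-2).permute(0, 2, 3, 4, 1, 5)       # (B, S, N, F, U, 12)
out = h2 @ g.squeeze(1)                            # (B, S, N, F, U, 10)
out = out.permute(0, 4, 1, 2, 3, 5).unsqueeze(-2)  # (B, U, S, N, F, 1, 10)

assert torch.allclose(ref, out, atol=1e-5)
```

The permutes are views, so the extra cost is bounded by the matmul's own output; the U copies of g are never allocated.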
Would the Sionna team consider deploying workarounds to prevent the materialization of such broadcasts, or do you consider this a problem of PyTorch that should be fixed on their end?