@torch.compile materializes broadcasted tensors, resulting in huge allocations on GPU #1149

@cbuchner1

Hi,

after running a memory trace on the End 2 End example notebook with the techniques here, I am finding that the PyTorch graph compiler is much less likely than TensorFlow to fuse expressions in a way that keeps broadcast tensors from being materialized. In other words, the size of simulation scenarios in the SLS is limited by restrictions that I haven't observed with TensorFlow and Sionna 1.x.

In that End 2 End downlink example, I am seeing two places in the code that result in huge allocation spikes, using torch 2.10.0:

  1. src/sionna/phy/channel/utils.py line 297
    h_f = h * e

The broadcast product h * e is materialized on the GPU and takes P times the size of the reduced result h_f = h_f.sum(dim=-3), where P is the number of paths used by the channel model (e.g. often 24 for UMa).
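One possible workaround for this pattern, as a minimal sketch: express the multiply-and-sum over the path dimension as a single contraction with torch.einsum, so the reduction over P happens inside a batched matrix product and no P-times-larger intermediate is requested explicitly. The shapes (B, P, T) and (B, P, F) and both function names are simplified, hypothetical stand-ins, not the actual Sionna tensors or API, and the memory behavior under @torch.compile has not been measured here.

```python
import torch


def sum_paths_broadcast(h: torch.Tensor, e: torch.Tensor) -> torch.Tensor:
    """Reference pattern: broadcast multiply, then reduce over paths.

    h: (B, P, T) path coefficients, e: (B, P, F) per-path phase terms.
    The intermediate product has shape (B, P, T, F).
    """
    h_f = h[..., None] * e[:, :, None, :]  # (B, P, T, F) intermediate
    return h_f.sum(dim=-3)                 # reduce over P -> (B, T, F)


def sum_paths_einsum(h: torch.Tensor, e: torch.Tensor) -> torch.Tensor:
    """Same result as a single contraction over the path index p."""
    return torch.einsum("bpt,bpf->btf", h, e)  # (B, T, F)


if __name__ == "__main__":
    # Small hypothetical sizes; P = 24 mirrors the UMa example above.
    h = torch.randn(2, 24, 14, dtype=torch.complex64)
    e = torch.randn(2, 24, 128, dtype=torch.complex64)
    diff = (sum_paths_broadcast(h, e) - sum_paths_einsum(h, e)).abs().max()
    print(f"max abs difference: {diff.item():.2e}")
```

Both functions return the same values up to floating-point rounding; only the second avoids asking for the (B, P, T, F) tensor by name.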

  2. src/sionna/phy/ofdm/precoding.py line 352
    h_eff = h @ g

For num_ofdm_symbols = 14 in that End 2 End example, the input and output tensors have these shapes:

  h shape:     (1, 210, 21, 14, 128,  1,  12) ~0.707 GiB
  g shape:     (1,   1, 21, 14, 128, 12,  10) ~0.034 GiB
  h_eff shape: (1, 210, 21, 14, 128,  1,  10) ~0.589 GiB
  h_eff shape: (1, 210,  1, 21,  10, 14, 128) ~0.589 GiB  (after permute)
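The sizes listed above can be reproduced with a few lines of arithmetic. This sketch assumes 8 bytes per element (complex64), which is an assumption that happens to match the reported figures; the helper name gib is hypothetical.

```python
import math

BYTES_PER_ELEMENT = 8  # assumption: complex64 (two float32 values)


def gib(shape, bytes_per_element=BYTES_PER_ELEMENT):
    """Size in GiB of a dense tensor with the given shape."""
    return math.prod(shape) * bytes_per_element / 2**30


h_shape = (1, 210, 21, 14, 128, 1, 12)
g_shape = (1, 1, 21, 14, 128, 12, 10)
h_eff_shape = (1, 210, 21, 14, 128, 1, 10)

# g expanded along the 210-sized dimension of h, i.e. fully materialized.
g_expanded_shape = (1, 210, 21, 14, 128, 12, 10)

if __name__ == "__main__":
    for name, shape in [("h", h_shape), ("g", g_shape),
                        ("h_eff", h_eff_shape),
                        ("g expanded", g_expanded_shape)]:
        print(f"{name:>10}: {gib(shape):.3f} GiB")
```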

The broadcast performed during the multiplication is fully materialized on the GPU, even in code running under @torch.compile. It requires 210 times the size of the input tensor g (because g is broadcast 210 times), making a ~7 GiB allocation.

Would the Sionna team consider deploying workarounds to prevent the materialization of such broadcasts, or do you consider this a problem in PyTorch that should be fixed on their end?
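One workaround that might avoid expanding g, sketched under assumptions: since h carries a single row (size 1 in its second-to-last dimension) and g is the operand broadcast along dimension 1, the 210-sized dimension of h can be folded into the matrix row dimension, so that both operands share identical batch dimensions and matmul has nothing to broadcast. The function name matmul_without_batch_broadcast is hypothetical and this is not Sionna's implementation. The permuted h may still be copied to a contiguous layout internally, which would cost about the size of h rather than 210 times the size of g; this has not been profiled here.

```python
import torch


def matmul_without_batch_broadcast(h: torch.Tensor,
                                   g: torch.Tensor) -> torch.Tensor:
    """Compute h @ g without broadcasting g along dimension 1.

    h: (A, U, S, T, F, 1, K)
    g: (A, 1, S, T, F, K, M)
    returns: (A, U, S, T, F, 1, M), same values as h @ g
    """
    # Drop the singleton row dim and move U next to K: (A, S, T, F, U, K)
    h_rows = h.squeeze(-2).movedim(1, -2)
    # Drop the broadcast dim of g: (A, S, T, F, K, M)
    g_mat = g.squeeze(1)
    # Batch dims now match exactly: (U, K) @ (K, M) -> (U, M)
    out = h_rows @ g_mat
    # Restore the original layout: (A, U, S, T, F, 1, M)
    return out.movedim(-2, 1).unsqueeze(-2)


if __name__ == "__main__":
    # Small hypothetical sizes standing in for (1, 210, 21, 14, 128, 1, 12).
    h = torch.randn(1, 6, 3, 4, 5, 1, 12, dtype=torch.complex64)
    g = torch.randn(1, 1, 3, 4, 5, 12, 10, dtype=torch.complex64)
    diff = (h @ g - matmul_without_batch_broadcast(h, g)).abs().max()
    print(f"max abs difference: {diff.item():.2e}")
```

The result matches h @ g up to floating-point rounding, so the subsequent permute in the original code would apply unchanged.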
