This is an old bug discovered when Milinda was working on the QR App. Just documenting it now as we've started talking about improving data partitioning.
Consider the following:
mapper = LDeviceSequenceBlocked(nblocks, placement=gpu_list)
A_blocked = mapper.partition_tensor(A)

@spawn(placement=A_blocked[i])
def task():
    ...  # work on A_blocked[i]
Because A_blocked[i] is evaluated eagerly when the @spawn decorator runs, this always causes an unnecessary copy of A_blocked[i] from the GPU to the host when launching the task.
Maybe we need to introduce "( )" operators to distinguish when a copy is made from when the placement can be inferred without one.
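A toy sketch of the proposed distinction (hypothetical, not the Parla API): "[ ]" eagerly materializes the block, modeling the copy at task launch, while "( )" returns a lazy handle the runtime could use purely as a placement hint. The PartitionedTensor and LazyBlock classes below are illustrative names, not existing code.

```python
class LazyBlock:
    """Deferred reference to a block; no data movement until resolved."""
    def __init__(self, store, i):
        self.store, self.i = store, i

    def resolve(self):
        return self.store.blocks[self.i]


class PartitionedTensor:
    def __init__(self, blocks):
        self.blocks = list(blocks)
        self.copies = 0  # counts simulated GPU-to-host copies

    def __getitem__(self, i):
        # "[ ]": eager access -- models the unnecessary copy at launch
        self.copies += 1
        return self.blocks[i]

    def __call__(self, i):
        # "( )": hand back a handle usable as a placement hint, no copy
        return LazyBlock(self, i)


A_blocked = PartitionedTensor([[1, 2], [3, 4]])
hint = A_blocked(0)   # no copy: just a placement hint
data = A_blocked[0]   # eager: triggers a (simulated) copy
```

With this split, the scheduler could take A_blocked(i) in the placement argument and only resolve the block inside the task body, where the data is actually needed.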