How to avoid double for loop when using an update definition #8606
Frontier789 asked this question in Q&A
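Just an immediate reaction: you can avoid initializing all those zeros with

```python
f[x] = hl.undef(hl.Float(32))  # pure definition: leave f uninitialized (no stores)
f[0] = hl.f32(0)               # base case
f[r] = f[r - 1] + 1            # scan over the RDom r
```

This doesn't solve your scheduling question yet.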
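To make the reply's point concrete, here is a plain-Python model (not Halide code) that counts element stores for the three stages. It assumes the RDom `r` covers [1, n), since the reply does not show its extent, and the helper name `scan_stores` is hypothetical. With a zero pure definition, every element is written once before the scan overwrites it; with `hl.undef`, the pure stage contributes no stores.

```python
def scan_stores(n, zero_init):
    """Plain-Python model of the three-stage Func from the reply (not Halide).

    Assumes the RDom r spans [1, n). Returns the computed values and the
    number of element stores performed.
    """
    f = [None] * n  # uninitialized storage, modeling a pure definition of hl.undef
    stores = 0
    if zero_init:
        # Pure definition f[x] = 0: one store per element.
        for x in range(n):
            f[x] = 0.0
            stores += 1
    # First update: f[0] = 0
    f[0] = 0.0
    stores += 1
    # Second update over r in [1, n): f[r] = f[r - 1] + 1
    for r in range(1, n):
        f[r] = f[r - 1] + 1.0
        stores += 1
    return f, stores


vals_zero, stores_zero = scan_stores(5, zero_init=True)
vals_undef, stores_undef = scan_stores(5, zero_init=False)
print(vals_zero == vals_undef)    # True
print(stores_zero, stores_undef)  # 10 5
```

Both variants produce the same values; only the redundant zero-fill pass disappears.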
I have a dynamic programming problem where the columns of a Func `f` depend on the values in the column to their left. After computing `f`, it's used in another Func, `g`. Here is a simplified version of the setup:
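As a hypothetical stand-in for this kind of recurrence (not the author's actual setup), the plain-Python sketch below assumes an input grid `inp[x][y]`, makes column 0 of `f` a copy of the input, has every later column add the input to the column on its left, and lets `g` consume `f` pointwise as `2 * f`. The names `inp`, `f_value` and `g_value` are illustrative only.

```python
# Hypothetical recurrence: f[0][y] = inp[0][y], f[x][y] = f[x-1][y] + inp[x][y],
# and g[x][y] = 2 * f[x][y]. Plain Python, not Halide.

def f_value(inp, x, y):
    """f[x][y], evaluated directly from the recurrence."""
    if x == 0:
        return inp[0][y]
    # Column x depends only on column x - 1 in the same row.
    return f_value(inp, x - 1, y) + inp[x][y]


def g_value(inp, x, y):
    """g[x][y] = 2 * f[x][y]: a pointwise consumer of f."""
    return 2 * f_value(inp, x, y)


inp = [[1, 2], [3, 4], [5, 6]]  # 3 columns (x), each of height 2 (y)
g = [[g_value(inp, x, y) for y in range(2)] for x in range(3)]
print(g)  # [[2, 4], [8, 12], [18, 24]]
```

Evaluated this way, every point of `g` re-runs the scan over all columns to its left, which is the kind of redundant nested work a good schedule should avoid.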
This, however, produces a double for loop:
I tried different scheduling options, but none really changed anything:
I can split `f` and `g` using `f.compute_root()`:

However, this will run on the GPU, and in reality I have multiple update steps, so I'd like to avoid launching so many kernels.
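A plain-Python model of what the split schedule computes may help (this is not Halide's generated code). It uses the same hypothetical recurrence as a prefix sum along `x` consumed by `g = 2 * f`. `f` is fully materialized in a first pass and `g` reads it in a second pass, and per the question each such pass costs an extra kernel launch on the GPU. The function name `two_pass` is illustrative.

```python
def two_pass(inp):
    """Model of f.compute_root(): produce all of f, then consume it in g."""
    W, H = len(inp), len(inp[0])

    # Pass 1: produce all of f into a full W x H buffer.
    f = [[0] * H for _ in range(W)]
    for y in range(H):
        f[0][y] = inp[0][y]
    for x in range(1, W):
        for y in range(H):
            f[x][y] = f[x - 1][y] + inp[x][y]

    # Pass 2: consume the finished f to produce g.
    g = [[0] * H for _ in range(W)]
    for x in range(W):
        for y in range(H):
            g[x][y] = 2 * f[x][y]
    return g


print(two_pass([[1, 2], [3, 4], [5, 6]]))  # [[2, 4], [8, 12], [18, 24]]
```

The cost is an intermediate buffer the size of the whole grid plus a second traversal, multiplied by the number of update steps.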
Ideally I would like to find a schedule that produces the equivalent of this C code:
I.e., perfectly inline `f` into `g`. How can I achieve that?
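One hypothetical reading of "perfectly inline `f` into `g`" is a single loop nest in which each column of `f` is computed and immediately consumed by `g`, keeping only the previous column alive. The plain-Python sketch below models that for the same assumed recurrence (prefix sum along `x`, `g = 2 * f`); it is not Halide code, and the name `fused` is illustrative. Because column `x` needs column `x - 1`, the outer `x` loop has to stay serial, while the inner `y` loop carries no dependency and would be the natural place for GPU parallelism.

```python
def fused(inp):
    """Single pass: compute each f value and consume it in g right away."""
    W, H = len(inp), len(inp[0])
    g = [[0] * H for _ in range(W)]
    prev = [0] * H  # column x - 1 of f, kept as a rolling one-column buffer
    for x in range(W):      # serial: column x depends on column x - 1
        for y in range(H):  # independent rows: parallelizable
            # DP step: f[x][y] from the column to its left.
            cur = inp[x][y] if x == 0 else prev[y] + inp[x][y]
            prev[y] = cur  # safe in place: prev[y] is not read again for this column
            # Consume f[x][y] immediately in g.
            g[x][y] = 2 * cur
    return g


print(fused([[1, 2], [3, 4], [5, 6]]))  # [[2, 4], [8, 12], [18, 24]]
```

It matches the two-pass result while storing only one column of `f` instead of the whole grid.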