LSTM: Move Wx matrix multiplication out of the loop in forward #187

Open
antihutka wants to merge 1 commit into jcjohnson:master from antihutka:lstm_speedup
@antihutka
Contributor

Move one of the addmm calls (the Wx input projection) out of the timestep loop and do it in a single call across all timesteps. This should provide a significant speedup when running with a small batch_size, presumably because one large matrix multiplication replaces many small ones.
I measured a 10-20% speedup with batch_size=8 on CPU, but I'm unable to test it on GPU at the moment.
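
For readers who want to see the idea in isolation, here is a minimal sketch in plain Python. It is not the code from this PR: the function names (`forward_loop`, `forward_hoisted`, `matmul`) are hypothetical, and a simplified tanh recurrence stands in for the full LSTM gate computation. It only illustrates that the input projection does not depend on the hidden state, so it can be computed once for all timesteps while the hidden-state projection stays inside the loop.

```python
import math

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]

def forward_loop(x, Wx, Wh, b):
    """Baseline: the input projection x_t * Wx is recomputed at every timestep.

    x has shape (N, T, D), Wx is (D, H), Wh is (H, H), b has length H.
    Simplified recurrence (an assumption, not a real LSTM):
        h_t = tanh(x_t * Wx + h_{t-1} * Wh + b)
    """
    N, T, H = len(x), len(x[0]), len(b)
    h = [[0.0] * H for _ in range(N)]
    out = []
    for t in range(T):
        x_t = [x[n][t] for n in range(N)]   # (N, D) slice for this timestep
        xw = matmul(x_t, Wx)                # small matmul inside the loop
        hw = matmul(h, Wh)                  # depends on h, must stay in the loop
        h = [[math.tanh(xw[n][j] + hw[n][j] + b[j]) for j in range(H)]
             for n in range(N)]
        out.append(h)
    return out

def forward_hoisted(x, Wx, Wh, b):
    """Same recurrence, with the input projection done once for all timesteps."""
    N, T, H = len(x), len(x[0]), len(b)
    # Flatten (N, T, D) to (N*T, D) and project everything in one call.
    flat = [x[n][t] for n in range(N) for t in range(T)]
    xw_all = matmul(flat, Wx)               # (N*T, H), row index is n*T + t
    h = [[0.0] * H for _ in range(N)]
    out = []
    for t in range(T):
        hw = matmul(h, Wh)                  # only the recurrent matmul remains
        h = [[math.tanh(xw_all[n * T + t][j] + hw[n][j] + b[j]) for j in range(H)]
             for n in range(N)]
        out.append(h)
    return out
```

Both functions return a list of T hidden states, each of shape (N, H), and should agree up to floating-point error. Whether the hoisted version is actually faster depends on the backend and batch size, which is what the benchmarks discussed in this thread are meant to establish.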

@dgcrouse

I can test GPU execution on CUDA this weekend. Can someone check OpenCL?

2 participants