PyTorchAlpaka: Batched inference with TensorCollections, eval and frozen model, FP16 conversion support #50498
This PR results from a collaboration with @valsdav. Thanks also to @Electricks94, who helped implement batched inference support.
PR description:
This PR introduces some new features to the PyTorchAlpaka interface:
- Batched inference with `TensorCollection` batches.
- The `Model` wrapper now enforces eval mode and, optionally, frozen mode. TorchScript modules cannot be moved to another device once they have been frozen.
- FP16 conversion support (`to_half` flag). If the model was exported in FP16 format, the whole inference is performed in half precision, reducing peak GPU memory usage.

Validation
To validate the PR, I prepared two producers that launch inference in mini-batches: `SimpleNetMiniBatch` and `TinyResNetMiniBatch`.
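For illustration only, the mini-batch slicing that such a producer performs can be sketched in plain Python (the helper name `minibatch_ranges` and the loop are hypothetical, not part of the PyTorchAlpaka interface):

```python
def minibatch_ranges(n_events, batch_size):
    """Yield (start, stop) index pairs that cover n_events in
    fixed-size chunks; the last chunk may be smaller."""
    for start in range(0, n_events, batch_size):
        yield start, min(start + batch_size, n_events)

# Hypothetical usage: run inference on each slice of the input batch.
# for start, stop in minibatch_ranges(len(inputs), 32):
#     outputs[start:stop] = model(inputs[start:stop])
```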
I then modified `InspectionSink.cc` to check that the relative difference between the outputs of the default and the batched implementation stays below a compatibility threshold (default 1e-5).
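A minimal Python sketch of that compatibility check (the actual check lives in `InspectionSink.cc`; the helper name, the zero-division guard, and the pass/fail semantics here are assumptions):

```python
def outputs_compatible(ref, test, threshold=1e-5):
    """Return True if every element of `test` agrees with `ref`
    within the given relative-difference threshold."""
    for r, t in zip(ref, test):
        denom = max(abs(r), 1e-12)  # guard against division by zero
        if abs(t - r) / denom >= threshold:
            return False
    return True
```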
You can launch the tests by: