Commit c17dc33
Using explicit GPU upcast for ZeRO-Offload (#6962)
Following the discussion in [PR-6670](#6670), explicit upcast is much more
efficient than implicit upcast, so this PR replaces the implicit upcast
with an explicit one.
The results on a 3B model are shown below:
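To make the distinction concrete, here is a minimal PyTorch sketch of the two copy styles. It is not the PR's actual diff: the function names `copy_grad_implicit` and `copy_grad_explicit` and the buffer sizes are hypothetical, and the sketch assumes a half-precision gradient on the accelerator being copied into an fp32 buffer on the host. It falls back to CPU when no GPU is available, in which case it only demonstrates that both paths produce the same values.

```python
import torch


def copy_grad_implicit(dest_fp32_cpu: torch.Tensor, src_grad: torch.Tensor) -> None:
    # Implicit upcast: copy_ is handed an fp16 source and an fp32 destination,
    # so the dtype conversion is folded into the cross-device copy.
    dest_fp32_cpu.copy_(src_grad, non_blocking=True)


def copy_grad_explicit(dest_fp32_cpu: torch.Tensor, src_grad: torch.Tensor) -> None:
    # Explicit upcast: convert to the destination dtype on the source device
    # first, so the cross-device copy moves data of a matching dtype.
    upcast = src_grad.to(dtype=dest_fp32_cpu.dtype)
    dest_fp32_cpu.copy_(upcast, non_blocking=True)


# Hypothetical setup: an fp16 gradient and two fp32 host buffers.
device = "cuda" if torch.cuda.is_available() else "cpu"
grad = torch.randn(1024).to(dtype=torch.float16, device=device)
buf_a = torch.zeros(1024, dtype=torch.float32)
buf_b = torch.zeros(1024, dtype=torch.float32)

copy_grad_implicit(buf_a, grad)
copy_grad_explicit(buf_b, grad)
if device == "cuda":
    # Make sure any asynchronous copies have finished before reading.
    torch.cuda.synchronize()

print(torch.equal(buf_a, buf_b))
```

Both paths yield identical fp32 values, since fp16 to fp32 conversion is exact; the difference reported in this commit is purely in where the conversion runs and how long the backward pass takes.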
| Option | BWD (ms) | Speed up |
|------------|-----|------|
| Before PR-6670 | 25603.30 | 1x |
| After PR-6670 | 1174.31 | 21.8x |
| After this PR | 309.2 | 82.8x |

1 parent 8d1bc0a commit c17dc33
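The "Speed up" column in the results table can be reproduced from the BWD timings as baseline time divided by measured time. The short sketch below does that arithmetic; the helper name `speedup` and the dictionary keys are illustrative only.

```python
# Backward-pass timings in milliseconds, taken from the results table.
bwd_ms = {
    "Before PR-6670": 25603.30,
    "After PR-6670": 1174.31,
    "After this PR": 309.2,
}


def speedup(baseline_ms: float, measured_ms: float) -> float:
    # Speed-up relative to the baseline, rounded to one decimal place.
    return round(baseline_ms / measured_ms, 1)


baseline = bwd_ms["Before PR-6670"]
speedups = {name: speedup(baseline, ms) for name, ms in bwd_ms.items()}
for name, factor in speedups.items():
    print(f"{name}: {factor}x")
```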
1 file changed, +2 -11 lines changed

[Diff: two hunks, at original lines 546-560 (5 deletions) and at original lines 1510-1522 (6 deletions, 2 additions); line contents not captured]