Enable bias correction in AdamW when fine-tuning BERT #1468
leezu wants to merge 1 commit into dmlc:master
Conversation
Mosbach, Marius, Maksym Andriushchenko, and Dietrich Klakow. "On the Stability of Fine-tuning BERT: Misconceptions, Explanations, and Strong Baselines." arXiv preprint arXiv:2006.04884 (2020).
Zhang, Tianyi, et al. "Revisiting Few-sample BERT Fine-tuning." arXiv preprint arXiv:2006.05987 (2020).
Let's try to rerun the training with the batch script here: https://github.com/dmlc/gluon-nlp/tree/master/tools/batch#squad-training Basically, we just need to run the following two commands, one for SQuAD 2.0 and one for SQuAD 1.1.
Codecov Report
@@ Coverage Diff @@
## master #1468 +/- ##
==========================================
- Coverage 85.86% 85.84% -0.02%
==========================================
Files 52 52
Lines 6911 6911
==========================================
- Hits 5934 5933 -1
- Misses 977 978 +1
The documentation website for preview: http://gluon-nlp-staging.s3-accelerate.dualstack.amazonaws.com/PR1468/bertbiascorrection/index.html |
Yes, you can later use the following script to sync up the results. Once all of the results (or part of them) have finished, you can parse the logs.
Is there any known issue with MobileBERT?
Looks like an AMP issue, or an operator issue that causes AMP to keep decreasing the loss scale: finetune_squad2.0.log (https://github.com/dmlc/gluon-nlp/files/5787622/finetune_squad2.0.log)
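For context on reading that log: dynamic loss scaling in mixed-precision training typically shrinks the scale whenever non-finite gradients are detected and only grows it back after a long run of clean steps, so a scale that keeps decreasing means the gradients overflow or go NaN on nearly every step. A generic sketch of that logic (hypothetical function and defaults, not MXNet AMP's actual API):

```python
def update_loss_scale(scale, steps_since_growth, grads_are_finite,
                      growth_factor=2.0, backoff_factor=0.5,
                      growth_interval=2000):
    """Generic dynamic loss scaling: back off on overflow, grow after a
    run of clean steps. Returns (new_scale, new_steps_since_growth)."""
    if not grads_are_finite:
        # Skip the weight update and shrink the scale.
        return max(scale * backoff_factor, 1.0), 0
    steps_since_growth += 1
    if steps_since_growth >= growth_interval:
        return scale * growth_factor, 0
    return scale, steps_since_growth
```

Under this scheme, a monotonically decreasing scale in the log is a symptom that some operator produces non-finite values at every iteration, which is why it points at an operator bug rather than at AMP itself.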
Yes.
From the figure, I think the performance looks similar. If we choose to update the flags, we can upload the pretrained weights to S3 and also change the numbers in https://github.com/dmlc/gluon-nlp/tree/master/scripts/question_answering.
sxjscience left a comment:
LGTM in general. We will also need to update the results table.
This should improve stability.
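For intuition on why enabling bias correction should improve stability, here is a minimal scalar sketch of an AdamW step with the correction toggled (illustrative function and defaults, not GluonNLP's actual AdamW implementation):

```python
import math

def adamw_step(param, grad, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999,
               eps=1e-8, weight_decay=0.0, bias_correction=True):
    """One scalar AdamW update with optional Adam-style bias correction."""
    m = beta1 * m + (1 - beta1) * grad          # first-moment EMA
    v = beta2 * v + (1 - beta2) * grad * grad   # second-moment EMA
    if bias_correction:
        # Rescale the zero-initialized EMAs; the effect is largest
        # in the first few steps, when t is small.
        m_hat = m / (1 - beta1 ** t)
        v_hat = v / (1 - beta2 ** t)
    else:
        m_hat, v_hat = m, v
    param -= lr * (m_hat / (math.sqrt(v_hat) + eps) + weight_decay * param)
    return param, m, v
```

At t=1 the corrected step has magnitude ~lr, while the uncorrected one is ~lr * (1 - beta1) / sqrt(1 - beta2) ≈ 3.16 * lr with the defaults above, i.e. an inflated effective learning rate at the start of fine-tuning, which the papers cited in this PR identify as a source of instability.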