Enable bias correction in AdamW when fine-tuning BERT by leezu · Pull Request #1468 · dmlc/gluon-nlp

leezu · 2021-01-07T23:14:13Z

This should improve stability.

Mosbach, Marius, Maksym Andriushchenko, and Dietrich Klakow. "On the Stability of Fine-tuning BERT: Misconceptions, Explanations, and Strong Baselines." arXiv preprint arXiv:2006.04884 (2020).

Zhang, Tianyi, et al. "Revisiting Few-sample BERT Fine-tuning." arXiv preprint arXiv:2006.05987 (2020)

Mosbach, Marius, Maksym Andriushchenko, and Dietrich Klakow. "On the Stability of Fine-tuning BERT: Misconceptions, Explanations, and Strong Baselines." arXiv preprint arXiv:2006.04884 (2020). Zhang, Tianyi, et al. "Revisiting Few-sample BERT Fine-tuning." arXiv preprint arXiv:2006.05987 (2020).

sxjscience · 2021-01-07T23:16:01Z

Let's try to rerun the training with the batch script here: https://github.com/dmlc/gluon-nlp/tree/master/tools/batch#squad-training

Basically, we just need to run the following two for SQuAD 2.0 and 1.1

# AWS Batch training with horovod on SQuAD 2.0 + FP16
bash question_answering/run_batch_squad.sh 1 2.0 submit_squad_v2_horovod_fp16.log float16

# AWS Batch training with horovod on SQuAD 1.1 + FP16
bash question_answering/run_batch_squad.sh 1 1.1 submit_squad_v1_horovod_fp16.log float16

codecov · 2021-01-07T23:26:33Z

Codecov Report

Merging #1468 (52ce2a4) into master (def0d70) will decrease coverage by 0.01%.
The diff coverage is n/a.

@@            Coverage Diff             @@
##           master    #1468      +/-   ##
==========================================
- Coverage   85.86%   85.84%   -0.02%     
==========================================
  Files          52       52              
  Lines        6911     6911              
==========================================
- Hits         5934     5933       -1     
- Misses        977      978       +1

Impacted Files	Coverage Δ
src/gluonnlp/data/tokenizers/yttm.py	`81.89% <0.00%> (-0.87%)`	⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update def0d70...52ce2a4. Read the comment docs.

github-actions · 2021-01-07T23:36:15Z

The documentation website for preview: http://gluon-nlp-staging.s3-accelerate.dualstack.amazonaws.com/PR1468/bertbiascorrection/index.html

leezu · 2021-01-08T00:01:26Z

test_squad2_albert_base 8903644b-13e1-4aa4-b695-e7b5f2c50c7d
test_squad2_albert_large aac428ac-4e25-48e8-8f3e-2643cbb6b95e
test_squad2_albert_xlarge bb565663-8173-45aa-9489-2dd690fd24c4
test_squad2_albert_xxlarge 38d9929c-fea2-4648-bc68-0bd4eb491ee8
test_squad2_electra_base 0eb9090a-d86b-40a6-9f1c-61e1cf034b59
test_squad2_electra_large 43fabf48-b524-499f-9d8a-2113349dcf74
test_squad2_electra_small 5c631945-ad26-4c2f-a7d3-bb8c705023a2
test_squad2_roberta_large 96d1e46f-b292-4915-a867-c724bb082585
test_squad2_uncased_bert_base 8228dd4c-27d3-4118-b682-06332db980f2
test_squad2_uncased_bert_large 22a91f7c-707e-4adf-a3d9-71286a3e165e
test_squad2_gluon_en_cased_bert_base_v1 13d38ddd-4ab6-4e60-8cae-1400d3169d4c
test_squad2_mobilebert 5377ebdc-da03-4e4e-8546-43e83643d1c0
test_squad2_albert_base c71abbd1-9ddb-465a-83a8-a257994a47a4
test_squad2_albert_large 55a10c2f-b51e-4722-b8fe-d0154ccf1124
test_squad2_albert_xlarge d3b1e954-b22e-4b30-bc3a-db3303d8de85
test_squad2_albert_xxlarge 9d8c599c-ecf2-4815-ac3c-cc853c75cddd
test_squad2_electra_base 9c10fca5-0ac6-4ec8-91ce-ebf2e0593513
test_squad2_electra_large d844645c-d56b-4549-805e-a3558d777e75
test_squad2_electra_small 8b17bb3f-ee8e-4212-92d7-59155f0c54ef
test_squad2_roberta_large e9972888-ae53-41e0-9b8f-1db8359e68c9
test_squad2_uncased_bert_base 083c431c-6e02-4a67-ab92-1e84a450df52
test_squad2_uncased_bert_large 24d40d9e-06fd-4158-90a3-1ee5da7183c1
test_squad2_gluon_en_cased_bert_base_v1 6b2c015b-5829-40b6-9435-718d3ecf46de
test_squad2_mobilebert 08e7618c-7e19-4db2-9451-09f65729272e

sxjscience · 2021-01-08T00:04:39Z

Yes, you can later use the following script to sync up the results.

bash question_answering/sync_batch_result.sh submit_squad_v2_horovod_fp16.log squad_v2_horovod_fp16
bash question_answering/sync_batch_result.sh submit_squad_v1_horovod_fp16.log squad_v1_horovod_fp16

After all results (part of the results) have been finished, you can parse the logs via

python3 question_answering/parse_squad_results.py --dir squad_v2_horovod_fp32

leezu · 2021-01-08T14:09:27Z

% python3 question_answering/parse_squad_results.py --dir squad_v2_horovod_fp16                                                                  1m 37s ~/src/gluon-nlp/tools/batch master ip-10-20-11-150
                           name    best_f1    best_em  best_f1_thresh  best_em_thresh  time_spent_in_hours
0                   albert_base  81.861255  79.112272       -1.671970       -1.742718             1.139900
1                  albert_large  84.904438  81.900109       -1.086745       -1.086745             3.423180
2                 albert_xlarge  88.032327  85.134338       -1.625434       -1.625434             5.967083
3                albert_xxlarge  90.085053  87.155731       -2.226489       -2.226489            11.294118
4                  electra_base  86.282903  83.643561       -1.848169       -2.301743             1.250153
5                 electra_large  90.871907  88.461215       -1.347744       -1.347744             3.140608
6                 electra_small  73.878219  71.481513       -1.548537       -1.548537             0.383728
7   gluon_en_cased_bert_base_v1  77.620289  74.757854       -1.731051       -1.731051             1.595762
8                    mobilebert        NaN        NaN             NaN             NaN                  NaN
9                 roberta_large  89.239196  86.431399       -2.168329       -2.168329             4.119268
10            uncased_bert_base  75.539014  72.702771       -1.595349       -1.850638             1.540320
11           uncased_bert_large  81.322878  78.177377       -2.056313       -2.056739             4.103469
Saving to squad_v2_horovod_fp16.csv

% python3 question_answering/parse_squad_results.py --dir squad_v1_horovod_fp16                                                                         ~/src/gluon-nlp/tools/batch master ip-10-20-11-150
                           name    best_f1    best_em  best_f1_thresh  best_em_thresh  time_spent_in_hours
0                   albert_base  90.605130  83.964049             NaN             NaN             0.745851
1                  albert_large  92.574139  86.385998             NaN             NaN             2.319241
2                 albert_xlarge  93.836504  87.984863             NaN             NaN             4.367765
3                albert_xxlarge  94.569074  88.448439             NaN             NaN             7.321531
4                  electra_base  92.483534  86.821192             NaN             NaN             0.882092
5                 electra_large  94.824761  89.631031             NaN             NaN             2.216832
6                 electra_small  85.263124  78.893094             NaN             NaN             0.267190
7   gluon_en_cased_bert_base_v1  88.685434  81.986755             NaN             NaN             1.077892
8                    mobilebert        NaN        NaN             NaN             NaN                  NaN
9                 roberta_large  94.665818  89.101230             NaN             NaN             2.790591
10            uncased_bert_base  88.103126  81.201514             NaN             NaN             0.979201
11           uncased_bert_large  90.691656  83.945128             NaN             NaN             2.756076
Saving to squad_v1_horovod_fp16.csv

Is there any known issue with Mobilebert? I

leezu · 2021-01-08T14:11:37Z

Looks like an AMP issue or an operator issue causing AMP to continue decreasing the scale.. finetune_squad2.0.log

sxjscience · 2021-01-08T15:59:31Z

Yes. Get Outlook for iOS<https://aka.ms/o0ukef>

…

________________________________ From: Leonard Lausen <notifications@github.com> Sent: Friday, January 8, 2021 6:11:53 AM To: dmlc/gluon-nlp <gluon-nlp@noreply.github.com> Cc: Xingjian SHI <xshiab@connect.ust.hk>; Review requested <review_requested@noreply.github.com> Subject: Re: [dmlc/gluon-nlp] Enable bias correction in AdamW when fine-tuning BERT (#1468) Looks like an AMP issue? finetune_squad2.0.log<https://github.com/dmlc/gluon-nlp/files/5787622/finetune_squad2.0.log> — You are receiving this because your review was requested. Reply to this email directly, view it on GitHub<#1468 (comment)>, or unsubscribe<https://github.com/notifications/unsubscribe-auth/ABHQH3SCBUACNO3HK254UOTSY4HCTANCNFSM4VZVBFMA>.

sxjscience · 2021-01-08T16:47:10Z

From the figure, I think the performance looks similar. If we choose to update the flags, we can upload the pretrained weights to S3 and also change the numbers in https://github.com/dmlc/gluon-nlp/tree/master/scripts/question_answering.

sxjscience

LGTM in general. We will also need to update the results table.

leezu requested a review from a team as a code owner January 7, 2021 23:14

sxjscience approved these changes Jan 10, 2021

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Enable bias correction in AdamW when fine-tuning BERT#1468

Enable bias correction in AdamW when fine-tuning BERT#1468
leezu wants to merge 1 commit intodmlc:masterfrom
leezu:bertbiascorrection

leezu commented Jan 7, 2021

Uh oh!

sxjscience commented Jan 7, 2021

Uh oh!

codecov bot commented Jan 7, 2021 •

edited

Loading

Uh oh!

github-actions bot commented Jan 7, 2021

Uh oh!

leezu commented Jan 8, 2021

Uh oh!

sxjscience commented Jan 8, 2021

Uh oh!

leezu commented Jan 8, 2021

Uh oh!

leezu commented Jan 8, 2021 •

edited

Loading

Uh oh!

sxjscience commented Jan 8, 2021 via email

Uh oh!

sxjscience commented Jan 8, 2021

Uh oh!

sxjscience left a comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

leezu commented Jan 7, 2021

Uh oh!

sxjscience commented Jan 7, 2021

Uh oh!

codecov bot commented Jan 7, 2021 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Codecov Report

Uh oh!

github-actions bot commented Jan 7, 2021

Uh oh!

leezu commented Jan 8, 2021

Uh oh!

sxjscience commented Jan 8, 2021

Uh oh!

leezu commented Jan 8, 2021

Uh oh!

leezu commented Jan 8, 2021 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

sxjscience commented Jan 8, 2021 via email

Uh oh!

sxjscience commented Jan 8, 2021

Uh oh!

sxjscience left a comment

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

codecov bot commented Jan 7, 2021 •

edited

Loading

leezu commented Jan 8, 2021 •

edited

Loading