RoBERTa (paper read)
RoBERTa: A Robustly Optimized BERT Pretraining Approach
Summary
In short: the authors reproduce BERT pretraining in fairseq, train with much larger batches (and for longer, on more data), and additionally introduce a new dataset, CC-NEWS.
Research Objective (the authors' research goal)
Our goal was to replicate, simplify, and better tune the training of BERT, as a reference point for better understanding the relative performance of all of these methods.
Problem Statement (what problem needs to be solved?)
We find that BERT was significantly undertrained and propose an improved recipe for training BERT models.
Method(s) (what method/algorithm do the authors use? Is it based on prior work?)
(1) training the model longer, with bigger batches, over more data;
(2) removing the next sentence prediction objective;
(3) training on longer sequences;
(4) dynamically changing the masking pattern applied to the training data (see the sketch after this list).
(5) We also collect a large new dataset (CC-NEWS) of comparable size to other privately used datasets, to better control for training set size effects.
(6) a byte-level BPE subword encoding (following GPT-2) in place of BERT's character-level vocabulary.
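Point (4) is the most self-contained algorithmic change, so here is a minimal sketch of dynamic versus static masking. Everything in it (MASK_ID, VOCAB_SIZE, SPECIAL_IDS, dynamic_mask, the example token ids) is a hypothetical placeholder, not the paper's or fairseq's actual implementation; it only illustrates re-sampling the 80/10/10 mask each time a sequence is served, rather than fixing it once at preprocessing time.

```python
import random

# Hypothetical special-token ids and vocab size for illustration only.
MASK_ID, VOCAB_SIZE = 50264, 50265
SPECIAL_IDS = {0, 2, MASK_ID}  # e.g. <s>, </s>, <mask>

def dynamic_mask(token_ids, mask_prob=0.15, rng=random):
    """Return (inputs, labels) with a fresh random mask on each call.

    Static masking (original BERT) runs this once during preprocessing,
    so every epoch sees the same masked positions; dynamic masking
    (RoBERTa) re-samples the mask every time a sequence is batched.
    """
    inputs = list(token_ids)
    labels = [-100] * len(token_ids)    # -100 = position ignored by the MLM loss
    for i, tok in enumerate(token_ids):
        if tok in SPECIAL_IDS or rng.random() >= mask_prob:
            continue
        labels[i] = tok                 # predict the original token here
        roll = rng.random()
        if roll < 0.8:                  # 80%: replace with <mask>
            inputs[i] = MASK_ID
        elif roll < 0.9:                # 10%: replace with a random token
            inputs[i] = rng.randrange(VOCAB_SIZE)
        # remaining 10%: keep the original token unchanged
    return inputs, labels

# Calling it twice on the same sequence gives two different masked views:
sentence = [0, 713, 16, 10, 1296, 2]   # made-up ids standing in for "<s> this is a test </s>"
print(dynamic_mask(sentence))
print(dynamic_mask(sentence))
```

In the paper, dynamic masking is comparable to or slightly better than static masking, and it avoids the static workaround of duplicating the training data 10 times so that each sequence is masked in 10 different ways over 40 epochs.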
Evaluation (how is the method evaluated? What is the experimental setup? Any problems, or anything worth borrowing?)
The paper evaluates on GLUE, SQuAD (v1.1 and v2.0), and RACE, with ablations over static vs. dynamic masking, the NSP objective and input format, batch size, and the amount of data / number of training steps.
Conclusion (what conclusions do the authors draw? Which are strong and which are weak?)
The NSP objective does not help much: removing it matches or slightly improves downstream performance.
Notes (optional: anything worth recording that does not fit the framework above)
Reference
Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2019. BERT: Pre-training of deep bidirectional transformers for language understanding. In North American Association for Computational Linguistics (NAACL).