RoBERTa (paper read)

RoBERTa: A Robustly Optimized BERT Pretraining Approach

Summary

In short, the authors reproduce BERT pretraining with fairseq, apply large-batch training over more data, and add a new dataset, CC-NEWS.
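The large-batch setting (the paper trains with batches of up to 8K sequences) is usually simulated on limited hardware with gradient accumulation. Below is a minimal PyTorch sketch of that idea only; the model, data, and hyperparameters are toy placeholders, not the paper's fairseq setup.

```python
import torch
from torch import nn

# Toy model and fake data; only the accumulation logic is the point here.
model = nn.Linear(128, 2)
optimizer = torch.optim.AdamW(model.parameters(), lr=6e-4)
loss_fn = nn.CrossEntropyLoss()

accum_steps = 32  # effective batch = per-step batch (16) * 32 = 512 sequences
fake_batches = [(torch.randn(16, 128), torch.randint(2, (16,))) for _ in range(64)]

optimizer.zero_grad()
for step, (x, y) in enumerate(fake_batches):
    # Scale the loss so the accumulated gradient equals the big-batch average.
    loss = loss_fn(model(x), y) / accum_steps
    loss.backward()
    if (step + 1) % accum_steps == 0:
        optimizer.step()
        optimizer.zero_grad()
```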

Research Objective (what are the authors' research goals?)

Our goal was to replicate, simplify, and better tune the training of BERT, as a reference point for better understanding the relative performance of all of these methods.

Problem Statement (what problem needs to be solved?)

We find that BERT was significantly undertrained and propose an improved recipe for training BERT models.

Method(s) (what method/algorithm do the authors use? Is it based on prior work?)

(1) training the model longer, with bigger batches, over more data;
(2) removing the next sentence prediction (NSP) objective;
(3) training on longer sequences;
(4) dynamically changing the masking pattern applied to the training data (see the masking sketch after this list);
(5) collecting a large new dataset (CC-NEWS) of comparable size to other privately used datasets, to better control for training-set-size effects;
(6) using byte-level BPE subword encoding (see the tokenizer sketch after this list).
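On item (4): BERT's original recipe masks each sequence once during preprocessing (static masking), so the same pattern is reused every epoch; RoBERTa instead samples a new mask each time a sequence is fed to the model. A minimal PyTorch sketch of such on-the-fly masking follows; the 80/10/10 split follows the standard BERT scheme, while the token id values and vocabulary size are placeholders.

```python
import torch

MASK_ID = 4                  # placeholder id for the <mask> token (model-specific)
VOCAB_SIZE = 50000           # placeholder vocabulary size
SPECIAL_IDS = {0, 1, 2, 3}   # placeholder ids for <s>, <pad>, </s>, <unk>

def dynamic_mask(input_ids: torch.Tensor, mlm_prob: float = 0.15):
    """Sample a fresh BERT-style mask (80% <mask> / 10% random / 10% kept)."""
    input_ids = input_ids.clone()
    labels = input_ids.clone()

    # Choose positions to predict, never masking special tokens.
    prob = torch.full(input_ids.shape, mlm_prob)
    special = torch.tensor([[t in SPECIAL_IDS for t in seq] for seq in input_ids.tolist()])
    prob.masked_fill_(special, 0.0)
    masked = torch.bernoulli(prob).bool()
    labels[~masked] = -100  # positions the MLM loss should ignore

    # 80% of chosen positions -> <mask> token.
    replaced = torch.bernoulli(torch.full(input_ids.shape, 0.8)).bool() & masked
    input_ids[replaced] = MASK_ID

    # 10% -> a random vocabulary token (half of the remaining 20%).
    randomized = torch.bernoulli(torch.full(input_ids.shape, 0.5)).bool() & masked & ~replaced
    input_ids[randomized] = torch.randint(VOCAB_SIZE, input_ids.shape)[randomized]

    # Remaining 10% keep their original token; labels still supervise them.
    return input_ids, labels

# Calling this inside the data loader (rather than once at preprocessing time)
# is what makes the masking "dynamic": every epoch sees a different pattern.
batch = torch.randint(5, VOCAB_SIZE, (2, 10))  # fake batch of token ids
masked_inputs, labels = dynamic_mask(batch)
```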
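On item (6): RoBERTa switches from BERT's 30K WordPiece vocabulary to a byte-level BPE vocabulary of roughly 50K units, so any input can be encoded without unknown tokens. A possible way to train such a tokenizer with the Hugging Face `tokenizers` library is sketched below; the corpus path and exact settings are placeholders, not the paper's actual setup.

```python
from tokenizers import ByteLevelBPETokenizer

# Train a byte-level BPE vocabulary on raw text; "corpus.txt" is a placeholder path.
tokenizer = ByteLevelBPETokenizer()
tokenizer.train(
    files=["corpus.txt"],
    vocab_size=50000,  # the paper uses roughly 50K subword units
    min_frequency=2,
    special_tokens=["<s>", "<pad>", "</s>", "<unk>", "<mask>"],
)

# Byte-level encoding means arbitrary text can be tokenized with no <unk> tokens.
enc = tokenizer.encode("RoBERTa uses byte-level BPE.")
print(enc.tokens, enc.ids)
```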

Evaluation (how do the authors evaluate their method? What is the experimental setup? Anything problematic or worth borrowing?)

Conclusion (what conclusions do the authors draw? Which are strong conclusions and which are weak?)

The next sentence prediction (NSP) objective does not help much; removing it matches or slightly improves downstream performance.

Notes (optional: anything that does not fit the framework above but is worth recording.)

Reference

Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2019. BERT: Pre-training of deep bidirectional transformers for language understanding. In North American Association for Computational Linguistics (NAACL).
