arXiv 1907.11692

RoBERTa: A Robustly Optimized BERT Pretraining Approach

By Yinhan Liu, Myle Ott, et al.

Published 2019-07-26

Language model pretraining has led to significant performance gains but careful comparison between different approaches is challenging. Training is computationally expensive, often done on private datasets of different sizes, and, as we will show, hyperparameter choices have significant impact on the final results. We present a replication study of BERT pretraining (Devlin et al., 2019) that carefully measures the i…
