arXiv 1907.11692
RoBERTa: A Robustly Optimized BERT Pretraining Approach
By Yinhan Liu, Myle Ott, et al.
Published 2019-07-26
Language model pretraining has led to significant performance gains but careful comparison between different approaches is challenging. Training is computationally expensive, often done on private datasets of different sizes, and, as we will show, hyperparameter choices have significant impact on the final results. We present a replication study of BERT pretraining (Devlin et al., 2019) that carefully measures the i…