arXiv 1907.11692

RoBERTa: A Robustly Optimized BERT Pretraining Approach

By Yinhan Liu, Myle Ott, et al.

Published 2019-07-26

Language model pretraining has led to significant performance gains, but careful comparison between different approaches is challenging. Training is computationally expensive, often done on private datasets of different sizes, and, as we will show, hyperparameter choices have a significant impact on the final results. We present a replication study of BERT pretraining (Devlin et al., 2019) that carefully measures the i…
