arXiv 1910.01108
DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter
By Victor Sanh, Lysandre Debut, et al.
Published 2019-10-02
As transfer learning from large-scale pre-trained models becomes more prevalent in Natural Language Processing (NLP), operating these large models on the edge and/or under constrained computational training or inference budgets remains challenging. In this work, we propose a method to pre-train a smaller general-purpose language representation model, called DistilBERT, which can then be fine-tuned with good perfo…