We introduce a new language representation model called BERT, which stands for Bidirectional Encoder Representations from Transformers. Unlike earlier language representation models, BERT is designed to pre-train deep bidirectional representations from unlabeled text by jointly conditioning on both left and right context in all layers. As a result, the pre-trained BERT model can be fine-tuned with just one additional output layer to create state-of-the-art models for a wide range of tasks, such as question answering and language inference, without substantial task-specific architecture modifications. BERT is conceptually simple and empirically powerful. It obtains new state-of-the-art results on eleven natural language processing tasks, including pushing the GLUE score to 80.5% (7.7 point absolute improvement), MultiNLI accuracy to 86.7% (4.6% absolute improvement), and SQuAD v1.1 F1 to 93.2 (1.5 point absolute improvement), outperforming the previous best systems by a large margin.
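The fine-tuning recipe described in the abstract, a pre-trained bidirectional encoder plus a single task-specific output layer, can be sketched in a few lines of code. This is a minimal illustration only, assuming the Hugging Face transformers library and PyTorch rather than the paper's original TensorFlow release; the checkpoint name bert-base-uncased, the two-class sentence-pair task, and the label mapping are illustrative assumptions, not details from the paper.

import torch
from torch.optim import AdamW
from transformers import BertTokenizer, BertForSequenceClassification

# Pre-trained BERT encoder with one additional output layer on top:
# a randomly initialized linear classifier over the [CLS] representation.
tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=2)

# A toy sentence pair in the style of a language-inference task.
inputs = tokenizer(
    "A man inspects the uniform of a figure.",
    "The man is sleeping.",
    return_tensors="pt",
)
labels = torch.tensor([0])  # hypothetical label id for this toy pair

# One fine-tuning step: the encoder weights and the new output layer
# are updated jointly on the labeled downstream example.
optimizer = AdamW(model.parameters(), lr=2e-5)
model.train()
outputs = model(**inputs, labels=labels)
outputs.loss.backward()
optimizer.step()
optimizer.zero_grad()

The small learning rate is in line with the fine-tuning ranges reported in the paper (on the order of 2e-5 to 5e-5); everything else here, including the single training step, is a placeholder for a real training loop over a downstream dataset.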
{
"id": "45bc6865-0968-4768-bbfd-992851f13658",
"title": "BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding (2018)",
"slug": "bert-pre-training-deep-bidirectional-transformers",
"video_url": "https://www.youtube.com/watch?v=t9P9J8b1y8c",
"url": "https://arxiv.org/abs/1810.04805",
"resource_category": "research",
"image_url": null,
"thumbnail_url": null
}