Open Sourcing BERT: State-of-the-Art Pre-training for Natural Language Processing

Posted by Jacob Devlin and Ming-Wei Chang, Research Scientists, Google AI Language

One of the biggest challenges in natural language processing (NLP) is the shortage of training data. Because NLP is a diversified field with many distinct tasks, most task-specific datasets contain only a few thousand or a few hundred thousand human-labeled training examples. However, modern deep learning-based NLP models benefit from much larger amounts of data, improving when trained on millions, or billions, of annotated training examples. To help close this gap in data, researchers have developed a variety of techniques for training general purpose language representation models using the enormous amount of unannotated text on the web (known as pre-training). The pre-trained model can then be fine-tuned on small-data NLP tasks like question answering and sentiment analysis, resulting in substantial accuracy improvements compared to training...
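To make the pre-train-then-fine-tune workflow concrete, here is a minimal sketch of fine-tuning a pre-trained BERT checkpoint on a tiny sentiment-analysis dataset. It assumes the Hugging Face transformers and PyTorch libraries and a toy two-example dataset purely for illustration; the original release described in this post provided TensorFlow code and checkpoints instead.

```python
# Hypothetical sketch (not the released TensorFlow code): fine-tune a
# pre-trained BERT encoder on a small sentiment-classification task.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=2  # binary sentiment: positive / negative
)

# A tiny labeled dataset stands in for a task-specific corpus.
texts = ["The movie was wonderful.", "I disliked every minute of it."]
labels = torch.tensor([1, 0])

batch = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)

model.train()
for _ in range(3):  # a few passes suffice because the encoder is pre-trained
    optimizer.zero_grad()
    outputs = model(**batch, labels=labels)  # loss computed against labels
    outputs.loss.backward()
    optimizer.step()
```

The key point the sketch illustrates is that only a small classification head and a brief fine-tuning run are added on top of the pre-trained representation, which is why modest task-specific datasets can still yield strong accuracy.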