Abstract
Large pre-trained language models (PLMs) such as BERT and GPT have drastically changed the field of Natural Language Processing (NLP). Approaches leveraging PLMs have achieved state-of-the-art performance on numerous NLP tasks. The key idea is to learn a generic, latent representation of language once, from a single generic task, and then share it across disparate NLP tasks. Language modeling serves as this generic task: it is self-supervised, so abundant raw text is available for extensive training. This article presents the fundamental concepts behind PLM architectures and a comprehensive view of the shift toward PLM-driven NLP techniques. It surveys work applying the pre-train-then-fine-tune, prompting, and text generation approaches. In addition, it discusses the limitations of PLMs and suggests directions for future research.
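As a rough illustration of the usage modes the survey covers (not code from the article), the minimal sketch below uses the Hugging Face `transformers` library; the model names `bert-base-uncased` and `gpt2` and the sentiment prompt are illustrative assumptions, and the pre-train-then-fine-tune route would instead continue training such a model on labeled task data.

```python
# Minimal sketch (illustrative only): the self-supervised language-modeling task
# PLMs are pre-trained on, and prompting a PLM to perform a downstream task.
from transformers import pipeline

# Masked language modeling: the generic pre-training task of BERT-style PLMs.
fill_mask = pipeline("fill-mask", model="bert-base-uncased")
print(fill_mask("Pre-trained language models have [MASK] the NLP field."))

# Prompting: a GPT-style PLM handles a downstream task (here, sentiment) with no
# parameter updates, purely by continuing a text prompt.
generator = pipeline("text-generation", model="gpt2")
prompt = "Review: 'The film was wonderful.' Sentiment (positive or negative):"
print(generator(prompt, max_new_tokens=5, do_sample=False)[0]["generated_text"])

# The pre-train-then-fine-tune approach would instead load, e.g.,
# AutoModelForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=2)
# and update its weights on a labeled dataset for the target task.
```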
Original language | English |
---|---|
Article number | 30 |
Journal | ACM Computing Surveys |
Volume | 56 |
Issue number | 2 |
DOIs | |
State | Published - 29 Feb 2024 |
Keywords
- Large language models
- foundational models
- generative AI
- neural networks
ASJC Scopus subject areas
- Theoretical Computer Science
- General Computer Science