NB-MLM: Efficient Domain Adaptation of Masked Language Models for Sentiment Analysis
While Masked Language Models (MLMs) are pre-trained on massive datasets, additional training with the MLM objective on domain- or task-specific data before fine-tuning for the final task is known to improve the final performance. This is usually referred to as the domain or task adaptation step. However, unlike the initial pre-training, this step is performed for each domain or task individually and is still rather slow, requiring several GPU days compared to several GPU hours required for the final task fine-tuning.
We argue that the standard MLM objective leads to inefficiency when it is used for the adaptation step because it mostly learns to predict the most frequent words, which are not necessarily related to a final task. We propose a technique for more efficient adaptation that focuses on predicting words with large weights of the Naive Bayes classifier trained for the task at hand, which are likely more relevant than the most frequent words. The proposed method provides faster adaptation and better final performance for sentiment analysis compared to the standard approach.
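The core idea above can be sketched in a few lines: train a Naive Bayes classifier on the labeled task data, take the per-token log-count ratio between classes as the token's weight, and skew the MLM masking distribution toward tokens with large absolute weights. The sketch below is illustrative, not the paper's implementation; the function names, the smoothing constant `alpha`, and the normalization scheme in `masking_probs` are assumptions.

```python
import math
from collections import Counter

def nb_token_weights(pos_docs, neg_docs, alpha=1.0):
    # Naive Bayes log-count ratio per token (with add-alpha smoothing):
    # large |weight| means the token is strongly associated with one class.
    pos_counts = Counter(tok for doc in pos_docs for tok in doc)
    neg_counts = Counter(tok for doc in neg_docs for tok in doc)
    vocab = set(pos_counts) | set(neg_counts)
    pos_total = sum(pos_counts.values()) + alpha * len(vocab)
    neg_total = sum(neg_counts.values()) + alpha * len(vocab)
    return {
        tok: math.log((pos_counts[tok] + alpha) / pos_total)
           - math.log((neg_counts[tok] + alpha) / neg_total)
        for tok in vocab
    }

def masking_probs(weights, base=0.15, temperature=1.0):
    # Hypothetical scheme: scale the standard 15% masking rate by the
    # token's |NB weight| relative to the mean, so sentiment-bearing
    # words are masked (and hence predicted) more often.
    scores = {t: abs(w) ** (1.0 / temperature) for t, w in weights.items()}
    mean = sum(scores.values()) / len(scores)
    return {t: min(1.0, base * s / mean) for t, s in scores.items()}

# Toy example: "great"/"awful" get large-magnitude weights and thus
# higher masking probability than the frequent but neutral "the".
pos = [["great", "movie", "the"], ["great", "fun", "the"]]
neg = [["awful", "movie", "the"], ["boring", "the", "the"]]
weights = nb_token_weights(pos, neg)
probs = masking_probs(weights)
```

A standard-MLM adaptation run would mask every token with the same probability; here frequent function words such as "the" end up near zero weight and are masked rarely, concentrating the adaptation budget on task-relevant vocabulary.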