Scaling Laws for Neural Language Models

An empirical study of how language model performance scales with model size, dataset size, and compute.

Abstract: We study empirical scaling laws for language model performance, as measured by cross-entropy loss.

Introduction

Recent advances in deep learning suggest that model performance scales predictably with three key factors (see the sketch after this list):

  1. Model size (parameters)
  2. Dataset size (tokens)
  3. Compute budget (FLOPs)
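
A minimal sketch of what such a scaling relationship looks like, assuming the standard single-variable power-law form L(x) = (x_c / x)^alpha in one scale factor x (model size, dataset size, or compute). The constants used below are placeholders for illustration, not values from this study.

```python
# Minimal sketch of a single-variable scaling law, assuming the standard
# power-law form L(x) = (x_c / x) ** alpha, where x is one scale factor
# (model size in parameters, dataset size in tokens, or compute in FLOPs).
# The constants x_c and alpha below are illustrative placeholders, not
# values fitted in this study.
def power_law_loss(x, x_c, alpha):
    """Cross-entropy loss predicted by a power law in a single scale factor."""
    return (x_c / x) ** alpha

# Example: predicted loss for a 100M-parameter model under placeholder constants.
print(power_law_loss(x=1e8, x_c=1e13, alpha=0.07))
```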

Methodology

We trained over 100 models ranging in size from 1M to 1B parameters.
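
As a rough illustration of what such a sweep might look like (the actual model configurations are not listed here), the snippet below generates logarithmically spaced parameter counts between 1M and 1B; the choice of ten points is an assumption for the example.

```python
import numpy as np

# Hypothetical model-size sweep: parameter counts spaced evenly in log space
# between 1M and 1B. The number of points (10) is an illustrative assumption.
def model_size_sweep(n_min=1e6, n_max=1e9, num_points=10):
    """Return logarithmically spaced parameter counts for a scaling sweep."""
    return np.logspace(np.log10(n_min), np.log10(n_max), num=num_points)

for n in model_size_sweep():
    print(f"{n:,.0f} parameters")
```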

Results

Our findings show that cross-entropy loss follows a power law in each of the three scale factors: model size, dataset size, and compute.
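
To make the power-law claim concrete, here is a minimal sketch of how such a relationship can be fit: a power law L(N) = c * N^(-alpha) is linear in log-log space, so a degree-1 polynomial fit recovers the exponent. The data, constants, and function name below are illustrative assumptions, not results from the study.

```python
import numpy as np

# Fit a power law L(N) = c * N ** (-alpha) to (model size, loss) pairs by
# linear regression in log-log space. The data below are synthetic and the
# exponent 0.08 is a placeholder, used only to demonstrate the fit.
def fit_power_law(sizes, losses):
    """Return (alpha, c) such that loss ~= c * size ** (-alpha)."""
    slope, intercept = np.polyfit(np.log(sizes), np.log(losses), deg=1)
    return -slope, np.exp(intercept)

sizes = np.array([1e6, 1e7, 1e8, 1e9])   # parameter counts
losses = 10.0 * sizes ** -0.08           # synthetic power-law losses
alpha, c = fit_power_law(sizes, losses)
print(f"alpha ~ {alpha:.3f}, c ~ {c:.2f}")  # recovers approximately 0.08 and 10
```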