Training Small Language Models with Knowledge Distillation
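
Knowledge distillation trains a small student model to match the softened output distribution of a larger teacher. As a minimal sketch of the standard distillation objective (soft-target KL term plus hard-label cross-entropy), in plain Python; the logits, temperature, and mixing weight below are illustrative assumptions, not values from this repository:

```python
import math

def softmax(logits, temperature=1.0):
    # Softened probabilities: a higher temperature flattens the distribution.
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(student_logits, teacher_logits, hard_label,
                      temperature=2.0, alpha=0.5):
    # Soft-target term: KL(teacher || student) at temperature T,
    # scaled by T^2 so gradient magnitudes stay comparable across temperatures.
    p_t = softmax(teacher_logits, temperature)
    p_s = softmax(student_logits, temperature)
    kl = sum(p * math.log(p / q) for p, q in zip(p_t, p_s))
    soft = (temperature ** 2) * kl
    # Hard-label term: ordinary cross-entropy against the gold label.
    hard = -math.log(softmax(student_logits)[hard_label])
    return alpha * soft + (1 - alpha) * hard

# Toy example: 3-way classification, gold label 0.
loss = distillation_loss([2.0, 0.5, -1.0], [3.0, 1.0, -2.0], 0)
```

When teacher and student logits agree, the KL term vanishes and only the hard-label cross-entropy remains, which is a quick sanity check for the implementation.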

Official pre-trained models and baselines in