Knowledge Distillation
Knowledge distillation transfers capabilities from a larger teacher model to a smaller student model.
[2212.08410] Teaching Small Language Models to Reason: distills reasoning ability from large language models into small ones.
Chain-of-thought (CoT) prompting elicits reasoning only in sufficiently large models; smaller models do not benefit from it directly.
The remedy is to fine-tune the small student model on chain-of-thought rationales generated by a larger teacher model, as sketched below.
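Below is a minimal sketch of that recipe using the Hugging Face transformers and datasets libraries. The model names (gpt2-xl as teacher, gpt2 as student), the prompt template, the toy question, and the training hyperparameters are illustrative assumptions, not the setup from the paper; in practice the teacher would be a much larger model prompted with few-shot CoT exemplars.

```python
# Minimal sketch of CoT distillation: teacher generates rationales,
# student is fine-tuned on them. Model names, prompt, and hyperparameters
# are illustrative stand-ins, not the cited paper's configuration.
import torch
from datasets import Dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer, Trainer,
                          TrainingArguments)

TEACHER = "gpt2-xl"  # stand-in for a large teacher model
STUDENT = "gpt2"     # stand-in for a small student model

# 1) The teacher generates chain-of-thought rationales for unlabeled questions.
teacher_tok = AutoTokenizer.from_pretrained(TEACHER)
teacher = AutoModelForCausalLM.from_pretrained(TEACHER)

questions = [
    "If there are 3 cars and each car has 4 wheels, how many wheels are there?",
]
COT_PROMPT = "Q: {q}\nA: Let's think step by step."

def generate_rationale(question: str) -> str:
    inputs = teacher_tok(COT_PROMPT.format(q=question), return_tensors="pt")
    with torch.no_grad():
        out = teacher.generate(**inputs, max_new_tokens=128, do_sample=False)
    return teacher_tok.decode(out[0], skip_special_tokens=True)

distilled = [{"text": generate_rationale(q)} for q in questions]

# 2) Fine-tune the student on the teacher-generated question/rationale text.
student_tok = AutoTokenizer.from_pretrained(STUDENT)
student_tok.pad_token = student_tok.eos_token  # GPT-2 has no pad token by default
student = AutoModelForCausalLM.from_pretrained(STUDENT)

def tokenize(example):
    enc = student_tok(example["text"], truncation=True, max_length=512,
                      padding="max_length")
    enc["labels"] = enc["input_ids"].copy()  # causal-LM loss on the full rationale
    return enc

train_ds = Dataset.from_list(distilled).map(tokenize, remove_columns=["text"])

trainer = Trainer(
    model=student,
    args=TrainingArguments(output_dir="cot-student",
                           per_device_train_batch_size=1,
                           num_train_epochs=3),
    train_dataset=train_ds,
)
trainer.train()
```

The student is trained to reproduce the teacher's step-by-step reasoning rather than only the final answer, which is what lets the smaller model pick up reasoning behavior it would not exhibit from CoT prompting alone.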
Distillation
Ask Me Anything: A Simple Strategy for Prompting Language Models. arXiv preprint arXiv:2210.02441.
Language Models in the Loop: Incorporating Prompting into Weak Supervision. arXiv preprint arXiv:2205.02318.
Want To Reduce Labeling Cost? GPT-3 Can Help. arXiv preprint arXiv:2108.13487.
Distilling Task-Specific Knowledge from BERT into Simple Neural Networks. arXiv preprint arXiv:1903.12136.