Knowledge Distillation

Knowledge distillation involves transferring knowledge from a larger teacher model to a smaller student model.
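The classic recipe (Hinton et al., 2015) trains the student to match the teacher's softened output distribution alongside the usual hard-label loss. Below is a minimal PyTorch sketch of that loss; the function name and the temperature and alpha defaults are illustrative choices, not prescribed settings.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels,
                      temperature=2.0, alpha=0.5):
    """Blend soft-target KL divergence with hard-label cross-entropy."""
    # Soften both distributions with the temperature, then match them via KL.
    soft_student = F.log_softmax(student_logits / temperature, dim=-1)
    soft_teacher = F.softmax(teacher_logits / temperature, dim=-1)
    kd = F.kl_div(soft_student, soft_teacher,
                  reduction="batchmean") * temperature ** 2
    # Standard supervised loss on the ground-truth labels.
    ce = F.cross_entropy(student_logits, labels)
    return alpha * kd + (1.0 - alpha) * ce
```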

[2212.08410] Teaching Small Language Models to Reason: distilling reasoning capability from large language models into smaller ones.

Chain-of-thought (CoT) prompting yields reliable reasoning gains only in sufficiently large models; smaller models do not benefit from it directly.

The approach finetunes a student model on chain-of-thought rationales generated by a larger teacher model, so that the reasoning behavior transfers to the student.
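A minimal sketch of this two-stage pipeline, assuming a Hugging Face causal LM as the teacher and a small seq2seq model as the student. The model names, the prompt template, and the helpers generate_rationale and finetune_step are illustrative assumptions, not the paper's exact setup; in practice the teacher's rationales are typically filtered (e.g., kept only when they reach the correct answer) before finetuning.

```python
import torch
from transformers import (AutoModelForCausalLM, AutoModelForSeq2SeqLM,
                          AutoTokenizer)

teacher_name = "gpt2-xl"   # stand-in for a large teacher model
student_name = "t5-small"  # stand-in for the small student model

teacher_tok = AutoTokenizer.from_pretrained(teacher_name)
teacher = AutoModelForCausalLM.from_pretrained(teacher_name)

student_tok = AutoTokenizer.from_pretrained(student_name)
student = AutoModelForSeq2SeqLM.from_pretrained(student_name)

def generate_rationale(question: str, max_new_tokens: int = 128) -> str:
    """Ask the teacher for a step-by-step rationale (greedy decoding here)."""
    prompt = f"Q: {question}\nA: Let's think step by step."
    inputs = teacher_tok(prompt, return_tensors="pt")
    output = teacher.generate(**inputs, max_new_tokens=max_new_tokens)
    # Drop the prompt tokens and keep only the newly generated rationale.
    return teacher_tok.decode(output[0][inputs["input_ids"].shape[1]:],
                              skip_special_tokens=True)

def finetune_step(question: str, rationale: str, optimizer) -> float:
    """One gradient step: the student learns to reproduce the teacher's CoT."""
    enc = student_tok(question, return_tensors="pt", truncation=True)
    labels = student_tok(rationale, return_tensors="pt",
                         truncation=True).input_ids
    loss = student(**enc, labels=labels).loss
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
    return loss.item()

optimizer = torch.optim.AdamW(student.parameters(), lr=5e-5)
question = "A shop sells 48 apples on Monday and half as many on Tuesday. How many in total?"
rationale = generate_rationale(question)
finetune_step(question, rationale, optimizer)
```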

Distillation

Ask me anything: A simple strategy for prompting language models. arXiv preprint arXiv:2210.02441.

Language models in the loop: Incorporating prompting into weak supervision. arXiv preprint arXiv:2205.02318.

Want to reduce labeling cost? GPT-3 can help. arXiv preprint arXiv:2108.13487.

Distilling task-specific knowledge from BERT into simple neural networks. arXiv preprint arXiv:1903.12136.