Self-Instruct
- Limited seed set of manually written tasks to guide overall generation
- First Phase
- prompt model to generate instructions for new tasks
- Second Phase
- create input-output instances for the instructions that will be used for supervising the instruction tuning
- Final Phase
- heuristics to automatically filter low-quality or repeated instructions
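The three phases above can be sketched as one round of a loop. This is a minimal sketch, not the authors' implementation: `self_instruct_round` and the three callables it takes are hypothetical names, and the `fake_*` helpers are stand-ins for real model calls so the example runs on its own.

```python
from typing import Callable, Dict, List


def self_instruct_round(
    seed_tasks: List[str],
    propose_instructions: Callable[[List[str]], List[str]],
    propose_instance: Callable[[str], Dict[str, str]],
    keep: Callable[[str, List[str]], bool],
) -> List[Dict[str, str]]:
    """Run one generate -> instantiate -> filter round (hypothetical helper)."""
    pool = list(seed_tasks)  # instructions accepted so far, starting from the seeds

    # First phase: ask the model for new task instructions, guided by the pool.
    candidates = propose_instructions(pool)

    # Second phase: create an input-output instance for each candidate.
    drafted = [
        {"instruction": text, **propose_instance(text)} for text in candidates
    ]

    # Final phase: drop low-quality or repeated instructions.
    accepted = []
    for task in drafted:
        if keep(task["instruction"], pool):
            accepted.append(task)
            pool.append(task["instruction"])
    return accepted


# Stand-ins for model calls (assumptions, only here to make the sketch runnable).
def fake_propose_instructions(pool: List[str]) -> List[str]:
    return ["Translate the sentence to French.", "Summarize the paragraph.", pool[0]]


def fake_propose_instance(instruction: str) -> Dict[str, str]:
    return {"input": "<input for: " + instruction + ">", "output": "<output>"}


def exact_match_keep(instruction: str, pool: List[str]) -> bool:
    # Simplest possible filter: reject exact repeats of anything in the pool.
    return instruction not in pool


seeds = ["Write a haiku about autumn."]
new_tasks = self_instruct_round(
    seeds, fake_propose_instructions, fake_propose_instance, exact_match_keep
)
```

Passing the filter in as a callable keeps the loop independent of the specific heuristics, so a similarity-based filter can replace `exact_match_keep` without touching the loop.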
Want the generated instructions to have a fairly small overlap with the seed tasks.
The instruction-tuned model outperforms the vanilla (untuned) model by a large margin.
Human Evaluation shows broad abilities.
Post Processing
- To encourage diversity, a new instruction is added only when its ROUGE-L similarity with every existing instruction is less than 0.7.
- Exclude instructions that contain specific keywords.
- Exclude invalid generations based on heuristics (instructions too long or too short, repetitions, etc.).
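A minimal sketch of such a filter follows. Only the 0.7 threshold comes from the notes above. The word-level ROUGE-L F-measure here is a simplified stand-in for a ROUGE library, and the keyword list and length bounds are hypothetical placeholders.

```python
import re
from typing import List

BLOCKED_KEYWORDS = {"image", "graph"}  # hypothetical keyword list
MIN_WORDS, MAX_WORDS = 3, 150          # hypothetical length bounds


def tokenize(text: str) -> List[str]:
    """Lowercase and split into word tokens, dropping punctuation."""
    return re.findall(r"\w+", text.lower())


def lcs_length(a: List[str], b: List[str]) -> int:
    """Length of the longest common subsequence, using two DP rows."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l_f1(candidate: str, reference: str) -> float:
    """Simplified ROUGE-L: F-measure of LCS-based precision and recall."""
    c, r = tokenize(candidate), tokenize(reference)
    if not c or not r:
        return 0.0
    lcs = lcs_length(c, r)
    if lcs == 0:
        return 0.0
    precision, recall = lcs / len(c), lcs / len(r)
    return 2 * precision * recall / (precision + recall)


def keep_instruction(
    instruction: str, existing: List[str], threshold: float = 0.7
) -> bool:
    """Return True if the instruction passes all three post-processing checks."""
    tokens = tokenize(instruction)
    # Heuristic: reject instructions that are too short or too long.
    if not (MIN_WORDS <= len(tokens) <= MAX_WORDS):
        return False
    # Reject instructions that mention a blocked keyword as a whole word.
    if BLOCKED_KEYWORDS.intersection(tokens):
        return False
    # Diversity: similarity to every existing instruction must stay below 0.7.
    return all(rouge_l_f1(instruction, e) < threshold for e in existing)


existing = ["Translate the following sentence into German."]
near_duplicate = keep_instruction(
    "Translate the following sentence into French.", existing
)
novel = keep_instruction("Summarize the paragraph in one sentence.", existing)
```

Keywords are matched against whole tokens, not substrings, so that a blocked word like "graph" does not also reject "paragraph".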