Extrinsic Evaluation
Measuring a model's performance on a real-world downstream task. For example, letting a model generate many answers and having humans evaluate them → Human Evaluation.
Also see Intrinsic Evaluation
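As an illustration of the human-evaluation case, the sketch below turns human ratings of generated answers into task-level numbers. The 1-5 rating scale, the acceptability threshold of 4, and the names `ratings` and `extrinsic_score` are hypothetical assumptions made for this example, not a standard protocol.

```python
from statistics import mean

# Hypothetical data: each generated answer was rated 1-5 by several human annotators.
ratings = {
    "answer_1": [4, 5, 4],
    "answer_2": [2, 3, 2],
    "answer_3": [5, 5, 4],
}

def extrinsic_score(ratings, threshold=4.0):
    """Aggregate human ratings into two task-level numbers (a sketch).

    Returns the mean of the per-answer average ratings and the fraction
    of answers whose average rating reaches `threshold` (assumed cutoff).
    """
    # Average the annotators' scores for each answer first.
    per_answer = [mean(scores) for scores in ratings.values()]
    overall = mean(per_answer)
    # Share of answers that humans judged good enough on average.
    acceptable = sum(1 for s in per_answer if s >= threshold) / len(per_answer)
    return overall, acceptable

overall, acceptable = extrinsic_score(ratings)
print(f"mean rating: {overall:.2f}, acceptable: {acceptable:.0%}")
```

Averaging per answer before averaging overall keeps answers with more annotators from dominating the score; other aggregation schemes are equally reasonable.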