Extrinsic Evaluation

Measuring a model's performance on a real-world downstream task rather than on an internal metric. For example, having a model generate many answers and having humans rate them → Human Evaluation.
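
As a rough illustration of the human-rating example, the sketch below averages annotator ratings per generated answer and then across answers to get one score. The function name `extrinsic_score`, the 1-5 rating scale, and the ratings themselves are hypothetical and only serve the example; real human evaluations usually also check inter-annotator agreement.

```python
from statistics import mean


def extrinsic_score(ratings_per_answer):
    """Average human ratings per answer, then average over all answers."""
    # Skip answers that received no ratings at all.
    per_answer = [mean(r) for r in ratings_per_answer if r]
    if not per_answer:
        raise ValueError("need at least one rated answer")
    return mean(per_answer)


# Hypothetical ratings (scale 1-5) from three annotators
# for four answers generated by the model under evaluation.
ratings = [
    [4, 5, 4],
    [2, 3, 2],
    [5, 5, 4],
    [3, 3, 3],
]

score = extrinsic_score(ratings)
print(f"mean human rating: {score:.2f}")
```

Averaging per answer first keeps an answer with many ratings from dominating the overall score.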

Also see Intrinsic Evaluation