Which metric is commonly used to evaluate the performance of language models?
F1-score
Accuracy
Perplexity
Precision
