A survey of accuracy evaluation metrics of recommendation tasks

Asela Gunawardana, Guy Shani

Research output: Contribution to journal › Article › peer-review

504 Scopus citations


Recommender systems are now popular both commercially and in the research community, where many algorithms have been suggested for providing recommendations. These algorithms typically perform differently in various domains and tasks. Therefore, it is important from the research perspective, as well as from a practical view, to be able to decide on an algorithm that matches the domain and the task of interest. The standard way to make such decisions is by comparing a number of algorithms offline using some evaluation metric. Indeed, many evaluation metrics have been suggested for comparing recommendation algorithms. The decision on the proper evaluation metric is often critical, as each metric may favor a different algorithm. In this paper, we review the proper construction of offline experiments for deciding on the most appropriate algorithm. We discuss three important tasks of recommender systems, and classify a set of appropriate well-known evaluation metrics for each task. We demonstrate how using an improper evaluation metric can lead to the selection of an improper algorithm for the task of interest. We also discuss other important considerations when designing offline experiments.

Original language: English
Pages (from-to): 2935-2962
Number of pages: 28
Journal: Journal of Machine Learning Research
State: Published - 1 Dec 2009


Keywords

  • Collaborative filtering
  • Comparative studies
  • Recommender systems
  • Statistical analysis

ASJC Scopus subject areas

  • Software
  • Control and Systems Engineering
  • Statistics and Probability
  • Artificial Intelligence

