Abstract
Recommender systems are now popular both commercially and in the
research community, where many approaches have been suggested for
providing recommendations. In many cases a system designer who wishes
to employ a recommendation system must choose among a set of candidate
approaches. A first step towards selecting an appropriate algorithm is
to decide which properties of the application to focus upon when making
this choice. Indeed, recommendation systems have a variety of properties
that may affect user experience, such as accuracy, robustness,
scalability, and so forth. In this paper we discuss how to compare
recommenders based on a set of properties that are relevant for the
application. We focus on comparative studies, where a few algorithms are
compared using some evaluation metric, rather than absolute benchmarking
of algorithms. We describe experimental settings appropriate for making
choices between algorithms. We review three types of experiments,
starting with an offline setting, where recommendation approaches are
compared without user interaction, then reviewing user studies, where a
small group of subjects experiment with the system and report on the
experience, and finally describing large-scale online experiments, where
real user populations interact with the system. In each of these cases
we describe the types of questions that can be answered and suggest
protocols for experimentation. We also discuss how to draw trustworthy
conclusions from the conducted experiments. We then review a large set
of properties, and explain how to evaluate systems given relevant
properties. We also survey a large set of evaluation metrics in the
context of the properties that they evaluate.
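The offline comparison outlined in the abstract can be sketched in a few lines. This is a minimal illustration and not the chapter's own protocol. It assumes a held-out set of true ratings and the rating predictions of two candidate algorithms, A and B. The ratings are made up for illustration, and the function names (`rmse`, `mae`, `sign_test_p_value`) are hypothetical. The sketch computes RMSE and MAE, two of the keywords listed for this chapter. It then applies a two-sided sign test to the per-case errors, which is one common way to check whether A's advantage over B is unlikely to be due to chance.

```python
import math
from typing import Sequence


def rmse(predicted: Sequence[float], actual: Sequence[float]) -> float:
    """Root mean square error between predicted and true ratings."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("need equal-length, non-empty sequences")
    return math.sqrt(sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(actual))


def mae(predicted: Sequence[float], actual: Sequence[float]) -> float:
    """Mean absolute error between predicted and true ratings."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("need equal-length, non-empty sequences")
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(actual)


def sign_test_p_value(errors_a: Sequence[float], errors_b: Sequence[float]) -> float:
    """Two-sided sign test on paired per-case errors (lower error wins).

    Ties are discarded. Under the null hypothesis each remaining case is
    a win for A with probability 0.5, so wins follow Binomial(n, 0.5).
    """
    wins_a = sum(1 for ea, eb in zip(errors_a, errors_b) if ea < eb)
    wins_b = sum(1 for ea, eb in zip(errors_a, errors_b) if eb < ea)
    n = wins_a + wins_b
    if n == 0:
        return 1.0  # all ties: no evidence either way
    k = min(wins_a, wins_b)
    # P(X <= k) for X ~ Binomial(n, 0.5), doubled for a two-sided test
    tail = sum(math.comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)


# Hypothetical held-out ratings and predictions from two algorithms.
actual = [4, 3, 5, 2, 4, 1, 5, 3]
pred_a = [3.8, 3.1, 4.6, 2.4, 4.1, 1.5, 4.7, 3.2]
pred_b = [3.0, 3.9, 4.0, 3.1, 3.2, 2.2, 4.0, 2.1]

errors_a = [abs(p - a) for p, a in zip(pred_a, actual)]
errors_b = [abs(p - a) for p, a in zip(pred_b, actual)]

print("A: RMSE", rmse(pred_a, actual), "MAE", mae(pred_a, actual))
print("B: RMSE", rmse(pred_b, actual), "MAE", mae(pred_b, actual))
print("sign test p-value:", sign_test_p_value(errors_a, errors_b))
```

In this toy data A has the lower error on all eight cases, so the sign test yields a small p-value (2/256). The sign test was chosen here only because it needs no distributional assumptions about the error magnitudes. A paired t-test or a larger test set may be more appropriate in practice.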
Original language | English |
---|---|
Title of host publication | Recommender Systems Handbook |
Editors | F. Ricci, L. Rokach, B. Shapira, P. Kantor |
Publisher | Springer New York |
Pages | 257-297 |
ISBN (Electronic) | 9780387858203 |
ISBN (Print) | 9780387858197 |
State | Published - Oct 2010 |
Keywords
- Root Mean Square Error
- Recommendation System
- User Study
- Test User
- Mean Absolute Error