Main contribution: proposes GAP (Graded Average Precision), a ranking metric that extends MAP so it can be used in graded relevance domains.
A few points to note in understanding GAP (a concrete sketch follows these lists):

- GAP generalizes AP to multi-graded relevance and inherits AP's desirable characteristics: a probabilistic interpretation, an approximation of the area under a graded precision-recall curve, and a justification in terms of a simple but plausible user model.
- In the user model, each user has a relevance threshold: with probability g_i a user treats only documents of grade i or higher as relevant. With a single relevance grade, GAP reduces to ordinary AP.

A few takeaways:

- GAP can reliably be used as an objective metric in learning to rank: optimizing for GAP with SoftRank and LambdaRank produces better-performing ranking functions than tuning for AP or NDCG, even when AP or NDCG is the test metric.
- The paper also evaluates GAP in terms of informativeness and discriminative power, the usual criteria for comparing effectiveness metrics.
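To make the threshold user model concrete, below is a minimal Python sketch of GAP. Under my reading of the paper's definition, GAP is the ratio of the expected AP numerator to the expected number of relevant documents when the user's grade threshold is drawn with probabilities g_i; the function name `gap`, the helper `p_rel`, and the data layout are illustrative choices, not the authors' code.

```python
def gap(ranked_grades, all_grades, g):
    """Graded Average Precision (illustrative sketch, not the authors' code).

    ranked_grades: relevance grades of the ranked list, top first (0 = non-relevant).
    all_grades:    grades of every judged document for the query (denominator).
    g:             dict mapping grade i >= 1 to the probability g_i that a user's
                   relevance threshold is exactly grade i (probabilities sum to 1).
    """
    # P(a document of grade r clears the sampled threshold) = sum of g_i for i <= r.
    def p_rel(r):
        return sum(p for i, p in g.items() if i <= r)

    # Numerator: expected AP mass. For every pair of ranks m <= n, add the
    # probability that both documents clear the threshold, weighted by 1/n.
    num = 0.0
    for n, grade_n in enumerate(ranked_grades, start=1):
        for grade_m in ranked_grades[:n]:
            num += p_rel(min(grade_m, grade_n)) / n

    # Denominator: expected number of relevant documents under the threshold.
    den = sum(p_rel(r) for r in all_grades)
    return num / den if den else 0.0
```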
| Field | Content |
| --- | --- |
| Title | Extending Average Precision to Graded Relevance Judgments |
| Authors | Stephen E. Robertson; Evangelos Kanoulas; Emine Yilmaz |
| Year | 2010 |
| Keywords | information retrieval, effectiveness metrics, average precision, graded relevance, learning to rank, GAP |
| Abstract | Evaluation metrics play a critical role both in the context of comparative evaluation of the performance of retrieval systems and in the context of learning-to-rank (LTR) as objective functions to be optimized. Many different evaluation metrics have been proposed in the IR literature, with average precision (AP) being the dominant one due to a number of desirable properties it possesses. However, most of these measures, including average precision, do not incorporate graded relevance. In this work, we propose a new measure of retrieval effectiveness, the Graded Average Precision (GAP). GAP generalizes average precision to the case of multi-graded relevance and inherits all the desirable characteristics of AP: it has a nice probabilistic interpretation, it approximates the area under a graded precision-recall curve, and it can be justified in terms of a simple but moderately plausible user model. We then evaluate GAP in terms of its informativeness and discriminative power. Finally, we show that GAP can reliably be used as an objective metric in learning to rank by illustrating that optimizing for GAP using SoftRank and LambdaRank leads to better performing ranking functions than the ones constructed by algorithms tuned to optimize for AP or NDCG, even when using AP or NDCG as the test metrics. |
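As a quick sanity check on the sketch above (with hypothetical data): when there is a single relevance grade and g = {1: 1.0}, GAP should reproduce ordinary binary AP.

```python
# Hypothetical data: binary relevance, single grade, threshold probability 1.
ranked = [1, 0, 1, 0, 1]        # grades of the ranked list, top first
judged = [1, 1, 1, 0, 0, 0]     # grades of all judged documents for the query
print(gap(ranked, judged, {1: 1.0}))  # binary AP = (1/1 + 2/3 + 3/5) / 3 ≈ 0.756

# Graded case: grades 1 and 2; a user's threshold is grade 2 with probability 0.7.
print(gap([2, 0, 1], [2, 1, 1, 0], {1: 0.3, 2: 0.7}))
```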