Core assumption of the paper: the source domain and the target domain share a common document feature space, while their data distributions differ. (Reportedly this assumption is debated in the cross-domain setting?)
(1) Problem setting: the source domain assists the target domain in learning a ranking model (the source domain has plenty of labeled data, the target domain does not).
(2) Transfer strategies: first, learn the model through features the two domains have in common (common-feature learning approaches); second, based on assumptions about the two domains' data distributions, weight the source-domain instances and then use them in learning the target-domain model (instance-weighting approaches).
(3) Base model: Ranking SVM (for each pair of feature vectors with a preference relation, subtracting one vector from the other produces a new feature vector with a corresponding label, and an SVM is then trained as a classifier on these pairs); this is where the paper draws on the learning-to-rank (L2R) idea. See the first sketch after this list.
(4) Following (2), the paper gives two concrete methods, of which the feature-level method is more stable than the instance-level one. The former relies on a matrix transformation: the original feature vectors are linearly combined into new feature vectors, with the source and target domains sharing the same transformation matrix but keeping separate model parameters; trained jointly, this yields the target-domain model parameters (second sketch below).
The latter constructs the weights of the source-domain instances from assumptions about the data's probability distributions and a rewriting of the probability terms; the concrete probability values are set heuristically (third sketch below).
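To make point (3) concrete, here is a minimal sketch of the pairwise construction behind Ranking SVM. The choice of scikit-learn's LinearSVC as the base classifier and the toy data are my own assumptions for illustration, not the paper's setup.

```python
# Minimal sketch of the Ranking SVM pairwise trick: for documents a, b of the same query
# with y[a] > y[b], the difference x_a - x_b becomes a positive example (and x_b - x_a a
# negative one); an ordinary linear SVM on these differences yields a ranking weight vector.
import numpy as np
from sklearn.svm import LinearSVC

def build_pairwise_data(X, y, qid):
    """Turn (features, relevance, query id) into difference vectors and +/-1 pair labels."""
    diffs, labels = [], []
    for q in np.unique(qid):
        idx = np.where(qid == q)[0]
        for a in idx:
            for b in idx:
                if y[a] > y[b]:                        # document a is preferred over b
                    diffs.append(X[a] - X[b]); labels.append(1)
                    diffs.append(X[b] - X[a]); labels.append(-1)
    return np.asarray(diffs), np.asarray(labels)

# toy data: 6 documents from 2 queries, 3 features, graded relevance labels
rng = np.random.RandomState(0)
X, y, qid = rng.rand(6, 3), np.array([2, 1, 0, 2, 0, 1]), np.array([1, 1, 1, 2, 2, 2])

X_pair, y_pair = build_pairwise_data(X, y, qid)
ranker = LinearSVC(C=1.0).fit(X_pair, y_pair)          # classification on difference vectors
scores = X @ ranker.coef_.ravel()                      # ranking score of each document
```

For the feature-level method in point (4), one plausible reading of the description is: a shared matrix A maps the original features into a new space, each domain keeps its own weight vector on top of it, and the two domains' pairwise hinge losses are minimized jointly, so the abundant source pairs shape A and thereby help the target model. The joint objective and the plain gradient-descent solver below are illustrative assumptions, not the paper's exact formulation.

```python
# Rough numpy sketch of feature-level transfer: Ps (source) and Pt (target) hold difference
# vectors of the form "preferred minus non-preferred", so each pair contributes a hinge loss
# max(0, 1 - w^T A p). A is shared across domains; ws and wt are domain-specific.
import numpy as np

def joint_train(Ps, Pt, k=5, lam=0.01, lr=0.01, epochs=200, seed=0):
    rng = np.random.RandomState(seed)
    A = 0.1 * rng.randn(k, Ps.shape[1])        # shared transformation matrix
    ws, wt = np.zeros(k), np.zeros(k)          # per-domain ranking weights
    for _ in range(epochs):
        for P, w in ((Ps, ws), (Pt, wt)):
            margins = P @ A.T @ w              # w^T A (x_i - x_j) for every pair
            V = P[margins < 1]                 # pairs violating the margin
            g = V.sum(axis=0) / len(P)         # aggregated subgradient direction
            w -= lr * (-(A @ g) + lam * w)     # in-place update of ws / wt
            A -= lr * (-np.outer(w, g) + lam * A)
    return A, ws, wt

# toy difference-vector data: many source pairs, few target pairs
rng = np.random.RandomState(1)
Ps, Pt = rng.randn(300, 10) + 0.3, rng.randn(30, 10) + 0.2
A, ws, wt = joint_train(Ps, Pt)
target_score = lambda X: X @ A.T @ wt          # ranking score for target-domain documents
```

For the instance-level method, the derived weights can be handed to the base learner directly. In the sketch below, source pairs get a heuristic weight meant to stand in for "how likely this instance is under the target distribution" (the exponential-distance heuristic is a placeholder of mine, not the paper's estimator) and are pooled with the scarce target pairs via sample_weight.

```python
# Sketch of instance-level transfer: reweight source-domain pairwise examples and pool
# them with target-domain pairs; the weights enter Ranking SVM through sample_weight.
import numpy as np
from sklearn.svm import LinearSVC

def heuristic_weights(P_src, P_tgt):
    """Toy stand-in for the density-ratio idea: weight source pairs by closeness to the
    mean of the target pairs. The paper instead derives weights from probability assumptions."""
    d = np.linalg.norm(P_src - P_tgt.mean(axis=0), axis=1)
    return np.exp(-d)                                   # closer to target data -> larger weight

# toy difference-vector data with +/-1 pair labels (as produced in the first sketch)
rng = np.random.RandomState(2)
P_src = rng.randn(200, 3); y_src = np.where(P_src @ np.array([1.0, 0.5, -0.2]) > 0, 1, -1)
P_tgt = rng.randn(20, 3) + 0.5; y_tgt = np.where(P_tgt @ np.array([0.8, 0.7, -0.1]) > 0, 1, -1)

w_src = heuristic_weights(P_src, P_tgt)
P_all = np.vstack([P_src, P_tgt])
y_all = np.concatenate([y_src, y_tgt])
w_all = np.concatenate([w_src, np.ones(len(y_tgt))])    # target pairs keep full weight

ranker = LinearSVC(C=1.0).fit(P_all, y_all, sample_weight=w_all)
```

The split also fits the experimental finding quoted in the abstract: instance-level transfer stands or falls with the quality of the estimated weights, which may be why its performance varies by dataset, whereas the shared transformation is learned from both domains and behaves more stably.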
Several points in this paper are worth borrowing:
(1) its treatment of the difference in probability distributions between domains;
(2) the two levels of cross-domain transfer it proposes: feature level and instance level;
(3) the idea behind the Ranking SVM algorithm (it cites a paper from 2000).
Title | Knowledge transfer for cross domain learning to rank
Authors | Depin Chen, Yan Xiong, Jun Yan, Gui-Rong Xue, Gang Wang, Zheng Chen
Year | 2009
Keywords | Information retrieval; Learning to rank; Knowledge transfer; Ranking SVM
Abstract | Recently, learning to rank technology is attracting increasing attention from both academia and industry in the areas of machine learning and information retrieval. A number of algorithms have been proposed to rank documents according to the user-given query using a human-labeled training dataset. A basic assumption behind general learning to rank algorithms is that the training and test data are drawn from the same data distribution. However, this assumption does not always hold true in real world applications. For example, it can be violated when the labeled training data become outdated or originally come from another domain different from its counterpart of test data. Such situations bring a new problem, which we define as cross domain learning to rank. In this paper, we aim at improving the learning of a ranking model in target domain by leveraging knowledge from the outdated or out-of-domain data (both are referred to as source domain data). We first give a formal definition of the cross domain learning to rank problem. Following this, two novel methods are proposed to conduct knowledge transfer at feature level and instance level, respectively. These two methods both utilize Ranking SVM as the basic learner. In the experiments, we evaluate these two methods using data from benchmark datasets for document retrieval. The results show that the feature-level transfer method performs better with steady improvements over baseline approaches across different datasets, while the instance-level transfer method comes out with varying performance depending on the dataset used.