Related work for cross-domain CF models:
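(1) Connect the source domain and the target domain through common knowledge (user characteristics; item attributes [9]; semantic networks [16]):
a) Homogeneous data generally use this kind of method. It is much harder to apply to heterogeneous data, although tags or other suitable knowledge repositories [17] can solve this problem, even when the domains share no common users or items [10: TagCDCF].
b) These methods try, by some means, to increase the overlapped information between the two domains.
(2) Connect the source domain and the target domain through latent vectors [18]:
a) Tensor factorization algorithm (this method requires the users of the two domains to be unified, i.e. homogeneous) [6].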
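Transfer idea of this paper (relatively simple):
(1) Users and items partially overlap across the domains, giving two rating matrices.
(2) Minimize the rating error on the two rating matrices while also keeping each user/item latent vector close to the latent vectors of its nearest neighbors (an error term between latent vectors), weighted by similarity. The paper calls this neighbor smoothing. The neighbors may include users or items from the source domain (overlapped users/items?), which is how the transfer is achieved. (Samples from the other domain × weight, merged into the current domain.)
(3) Finally, combine everything into a single optimization objective (a linear combination) and solve it with SGD.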
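The three steps above can be sketched in code. The following is only a rough illustration of the idea as described in these notes, not the paper's exact objective or update rule: the function name `train_sketch`, the helpers `dot` and `pull`, the neighbor dictionaries, and all hyperparameters (`k`, `lr`, `alpha`, `lam`, `epochs`) are assumptions made for the example. It assumes both domains share one index space, so an overlapped user or item has the same id in both rating lists.

```python
import random

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def pull(vec, table, nbrs):
    """Similarity-weighted sum of (vec - neighbor vector) over the given neighbors."""
    out = [0.0] * len(vec)
    for n, s in nbrs:
        for d in range(len(vec)):
            out[d] += s * (vec[d] - table[n][d])
    return out

def train_sketch(ratings_src, ratings_tgt, n_users, n_items, user_nbrs, item_nbrs,
                 k=4, lr=0.02, alpha=0.1, lam=0.01, epochs=300, seed=0):
    """ratings_*: lists of (user, item, rating) triples over a shared id space.
    *_nbrs: dict mapping an id to a list of (neighbor_id, similarity) pairs.
    Hypothetical objective: squared rating error on both domains
    + alpha * similarity-weighted neighbor smoothness + lam * L2 regularization."""
    rng = random.Random(seed)
    P = [[rng.gauss(0, 0.1) for _ in range(k)] for _ in range(n_users)]  # user vectors
    Q = [[rng.gauss(0, 0.1) for _ in range(k)] for _ in range(n_items)]  # item vectors
    samples = list(ratings_tgt) + list(ratings_src)  # both rating matrices, one objective
    for _ in range(epochs):
        rng.shuffle(samples)
        for u, i, r in samples:
            pu, qi = P[u][:], Q[i][:]
            err = r - dot(pu, qi)
            # Simplification: only the sampled user's/item's own smoothness term is used.
            su = pull(pu, P, user_nbrs.get(u, []))
            si = pull(qi, Q, item_nbrs.get(i, []))
            for d in range(k):
                P[u][d] += lr * (err * qi[d] - alpha * su[d] - lam * pu[d])
                Q[i][d] += lr * (err * pu[d] - alpha * si[d] - lam * qi[d])
    return P, Q
```

In this sketch the transfer happens through the neighbor lists: if a target-domain user's neighbors include users who only have source-domain ratings, the smoothness term pulls the target user's latent vector toward vectors that were fitted on source-domain data.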
Title: TLRec: Transfer Learning for Cross-domain Recommendation
Authors: Leihui Chen, Jianbing Zheng
Year: 2017
Keywords: transfer learning; cross-domain; partially overlapped
Abstract:
In the era of big data, the available information on the Internet has overwhelmed the human processing capabilities in some commercial applications. Recommendation techniques are indispensable to predict user ratings on items in terms of historical data and deal with the information overload. In many applications, the problem of data sparsity usually results in overfitting and fails to give desirable performance. Therefore, many works have started to investigate the techniques of cross-domain recommendation to overcome the challenge. However, it is not trivial. In this paper, we propose a transfer learning algorithm, named TLRec, for cross-domain recommendation, which exploits the overlapped users and items as a bridge to link different domains and implements knowledge transfer. We learn parameters based on the defined empirical prediction error, smoothness and regularization of user and item latent vectors. We also establish a relation between TLRec and vertex vectoring on bipartite graphs. The experimental result illustrates that TLRec has promising performance and outperforms several state-of-the-art approaches on a real dataset.