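The key prerequisite for understanding this paper is understanding the role that cluster representations play in matrix factorization.
Core steps: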
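(1) In the source domain, use an orthogonal factorization (X=UBV) to determine the codebook matrix S/B. In essence, the rows and columns of B are the vectors of representative users and representative items respectively; the paper's wording is "the user basis of the row space of XV and the item bases of the column space of UX", i.e. "two-sided bases". The cluster each user or item belongs to is given by the indicator position of the non-zero element in the corresponding row of U or V. For how to solve the orthogonal factorization, see: http://nuoku.vip/users/2-Betageek/articles/197-2018-06-30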
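(2) With the B learned in (1) held fixed, run a matrix factorization again in the target domain (X=UBV, where each row of U and of V has exactly one non-zero element, equal to 1, marking the cluster that user or item belongs to). Solution method: this is a binary integer programming problem, and the paper gives a procedure for it.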
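To make step (1) concrete, here is a minimal sketch of how a codebook could be read off once cluster memberships are known. It assumes a dense source matrix, hard (one-hot) memberships, and no empty clusters; the function name `build_codebook` and its parameters are illustrative and not from the paper. Each codebook entry is taken as the average rating of one user-cluster/item-cluster block.

```python
import numpy as np

def build_codebook(X, user_cluster, item_cluster, k, l):
    """Average rating of every (user cluster, item cluster) block of a dense X.

    Assumes every user cluster and item cluster has at least one member.
    """
    U = np.eye(k)[user_cluster]   # n x k indicator matrix, a single 1 per row
    V = np.eye(l)[item_cluster]   # m x l indicator matrix, a single 1 per row
    block_sums = U.T @ X @ V                     # sum of ratings per block
    block_counts = U.T @ np.ones_like(X) @ V     # number of ratings per block
    return block_sums / block_counts

# Toy source domain: users 0 and 1 behave alike, items 0 and 1 behave alike.
X = np.array([[5, 5, 1],
              [5, 5, 1],
              [1, 1, 5]], dtype=float)
B = build_codebook(X, np.array([0, 0, 1]), np.array([0, 0, 1]), k=2, l=2)
```

In this toy case B is a 2 x 2 matrix whose rows act as representative users and whose columns act as representative items.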
Together, these two steps align the users and items of the target domain with the source domain in cluster space (the two domains share the same cluster space, that is, the same vectors of representative users/items).
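The alignment can be sketched as an alternating assignment: with B fixed, each target user picks the codebook row that best matches their observed ratings, then each target item picks the best codebook column, and the two updates repeat. This is only an illustration under assumptions, not the paper's exact procedure: the function name `transfer_codebook`, the mask `W` of observed entries, the random initialization, and the fixed iteration count are all choices made here.

```python
import numpy as np

def transfer_codebook(X, W, B, n_iters=10, seed=0):
    """Assign each target user/item to one codebook cluster, then fill missing ratings.

    X: target ratings (n x m), W: 0/1 mask of observed entries, B: codebook (k x l).
    """
    rng = np.random.default_rng(seed)
    k, l = B.shape
    item_cluster = rng.integers(l, size=X.shape[1])   # random start for V
    for _ in range(n_iters):
        # Fix V: each user picks the codebook row with the smallest masked error.
        item_profiles = B[:, item_cluster]                          # k x m
        diff_u = X[:, None, :] - item_profiles[None, :, :]          # n x k x m
        err_u = (W[:, None, :] * diff_u ** 2).sum(axis=2)           # n x k
        user_cluster = err_u.argmin(axis=1)
        # Fix U: each item picks the codebook column with the smallest masked error.
        user_profiles = B[user_cluster, :]                          # n x l
        diff_v = X[:, :, None] - user_profiles[:, None, :]          # n x m x l
        err_v = (W[:, :, None] * diff_v ** 2).sum(axis=0)           # m x l
        item_cluster = err_v.argmin(axis=1)
    X_hat = B[np.ix_(user_cluster, item_cluster)]                   # equals U B V^T
    # Keep observed ratings; use the codebook expansion only where ratings are missing.
    return W * X + (1 - W) * X_hat, user_cluster, item_cluster

B = np.array([[5.0, 1.0], [1.0, 5.0]])        # codebook learned in the source domain
X = np.array([[5, 0, 1],
              [0, 5, 1],
              [1, 1, 0]], dtype=float)        # 0 marks a missing rating here
W = (X > 0).astype(float)
filled, user_cluster, item_cluster = transfer_codebook(X, W, B)
```

Because every row of U and V has a single 1, the product U B V^T reduces to indexing B by the cluster labels, which is why the code never builds U and V explicitly. Like any alternating scheme of this kind, the result can depend on the initialization.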
Title | Can Movies and Books Collaborate? Cross-Domain Collaborative Filtering for Sparsity Reduction
Authors | Bin Li; Qiang Yang; Xiangyang Xue
Year | 2009
Keywords | cross-domain; transfer; codebook; CBT
Abstract | The sparsity problem in collaborative filtering (CF) is a major bottleneck for most CF methods. In this paper, we consider a novel approach for alleviating the sparsity problem in CF by transferring user-item rating patterns from a dense auxiliary rating matrix in other domains (e.g., a popular movie rating website) to a sparse rating matrix in a target domain (e.g., a new book rating website). We do not require that the users and items in the two domains be identical or even overlap. Based on the limited ratings in the target matrix, we establish a bridge between the two rating matrices at a cluster-level of user-item rating patterns in order to transfer more useful knowledge from the auxiliary task domain. We first compress the ratings in the auxiliary rating matrix into an informative and yet compact cluster-level rating pattern representation referred to as a codebook. Then, we propose an efficient algorithm for reconstructing the target rating matrix by expanding the codebook. We perform extensive empirical tests to show that our method is effective in addressing the data sparsity problem by transferring the useful knowledge from the auxiliary tasks, as compared to many state-of-the-art CF methods.