This paper combines two approaches: latent-factor models and LDA.
The LDA part learns from users' reviews of items: all reviews written about a given item are pooled into a single document for that item.
The approach in detail:
(1) The latent-factor model has a latent factor vector per item, while LDA has a topic distribution (also a vector) per item/document. The core of the paper is to tie these two vectors together so that both models are trained toward a single, unified objective.
(2) The link is simple: the item factor vector is normalized and then assigned to the topic distribution (exponentiate each element and divide by the sum of exponentials, Eq. 4; written out after this list). The normalization is needed so the result is a valid probability distribution. The same operation is applied to the topic-word distribution.
(3) The final objective is a linear combination of two parts: the squared error of the rating predictions and the (log-)likelihood of the topic model.
(4) Optimization alternates: first fit the factor vectors, use them to update the item-topic vectors, then use the item-topic and topic-word vectors to reassign each word's topic, and finally update the topic-word vectors. (The difference from plain LDA is that the reassigned topics are not fed back to update the item-topic vectors.) A sketch of this loop follows the list.
(5) In short, the new objective that folds in review information effectively just adds a constraint to the factorization.
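To make points (2) and (3) concrete, the factor-topic link and the combined objective can be written roughly as below. This is a sketch from memory of the paper's notation, so treat the symbols as assumptions: γ_i is item i's latent factor vector, θ_i its topic distribution, κ the "peakiness" parameter discussed further below, and μ the weight trading off the two terms.

```latex
% Link between item factors and the item topic distribution (cf. Eq. 4):
\theta_{i,k} = \frac{\exp(\kappa\,\gamma_{i,k})}{\sum_{k'} \exp(\kappa\,\gamma_{i,k'})}

% Combined objective: rating squared error minus the weighted topic-model log-likelihood
f = \sum_{(u,i)} \left(\hat{r}_{u,i} - r_{u,i}\right)^2 \;-\; \mu\,\ell\!\left(\mathcal{T}\mid\theta,\phi,z\right)
```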
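And here is a minimal Python sketch of the alternating loop in point (4), with made-up variable names and crude gradient/sampling steps; the actual implementation (e.g. the one in LibRec) fits the factor parameters with a proper gradient-based solver and differs in many details.

```python
# Minimal sketch of the alternating optimization in point (4).
import numpy as np

rng = np.random.default_rng(0)

n_users, n_items, n_topics, vocab = 50, 40, 5, 200
kappa, mu, lr = 1.0, 0.1, 0.01

P = rng.normal(scale=0.1, size=(n_users, n_topics))   # user factors
Q = rng.normal(scale=0.1, size=(n_items, n_topics))   # item factors (tied to topics)
psi = rng.normal(scale=0.1, size=(n_topics, vocab))   # unnormalized topic-word weights

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

# Toy data: observed (user, item, rating) triples and each item's pooled review "document".
ratings = [(rng.integers(n_users), rng.integers(n_items), rng.uniform(1, 5))
           for _ in range(300)]
docs = [rng.integers(vocab, size=rng.integers(20, 60)) for _ in range(n_items)]
z = [rng.integers(n_topics, size=len(d)) for d in docs]   # word-topic assignments

for it in range(20):
    # (a) Fit the rating factors on squared error (simplified SGD step).
    for u, i, r in ratings:
        err = P[u] @ Q[i] - r
        pu = P[u].copy()
        P[u] -= lr * err * Q[i]
        Q[i] -= lr * err * pu

    # (b) Map item factors to item-topic distributions (the Eq.-4 link),
    #     and word weights to topic-word distributions.
    theta = softmax(kappa * Q, axis=1)          # n_items x n_topics
    phi = softmax(psi, axis=1)                  # n_topics x vocab

    # (c) Reassign each word's topic given theta and phi. Unlike plain LDA,
    #     theta itself is not re-estimated from these counts (it comes from Q).
    for i, words in enumerate(docs):
        for n, w in enumerate(words):
            p = theta[i] * phi[:, w]
            z[i][n] = rng.choice(n_topics, p=p / p.sum())

    # (d) Update the topic-word weights toward the sampled assignments
    #     (a crude gradient step on the multinomial log-likelihood).
    counts = np.zeros_like(psi)
    for i, words in enumerate(docs):
        for n, w in enumerate(words):
            counts[z[i][n], w] += 1
    grad = counts - counts.sum(axis=1, keepdims=True) * phi
    psi += mu * lr * grad
```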
Other interesting points in the paper:
(1) Matching real (ground-truth) categories against the latent ones - Genre Discovery.
(2) Identifying which users' reviews describe an item most accurately (relative to all users' reviews of that item) by computing a distance - Identifying Useful Reviews (see the sketch after this list).
(3) The "cold start problem" addressed here does not mean new users or new items; rather, it is the case where one does not have enough rating data available - "Even a single review can tell us many of a product's properties, such as its genre."
(4) Normalization detail: a parameter κ is introduced inside the exponential - "Intuitively, large κ means that users only discuss the most important topics, while small κ means that users discuss all topics evenly."
(5) The way the paper folds the LDA learning step into the model is also worth borrowing - LibRec.
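For point (2) above, a hypothetical sketch of the distance computation: I assume here that each review is scored by the KL divergence between its empirical topic proportions and the item's overall topic distribution θ_i, with the lowest-divergence reviews treated as the most representative; the paper's exact metric may differ.

```python
# Hypothetical sketch for "Identifying Useful Reviews": score each of an item's
# reviews by how close its topic proportions are to the item's overall topic
# distribution theta_i (KL divergence assumed here, smaller = more representative).
import numpy as np

def review_topic_proportions(topic_assignments, n_topics, smoothing=1e-3):
    """Empirical topic proportions of one review from its word-topic assignments."""
    counts = np.bincount(topic_assignments, minlength=n_topics).astype(float)
    counts += smoothing                      # avoid zeros in the divergence below
    return counts / counts.sum()

def kl_divergence(p, q):
    return float(np.sum(p * np.log(p / q)))

def rank_reviews(item_theta, reviews_z, n_topics):
    """Return review indices sorted from most to least representative of the item."""
    scores = [kl_divergence(review_topic_proportions(zz, n_topics), item_theta)
              for zz in reviews_z]
    return np.argsort(scores)

# Toy usage: 3 reviews of one item, 5 topics.
item_theta = np.array([0.4, 0.3, 0.1, 0.1, 0.1])
reviews_z = [np.array([0, 0, 1, 1, 2]),     # word-topic assignments per review
             np.array([3, 3, 3, 4]),
             np.array([0, 1, 0, 1, 0, 2, 3])]
print(rank_reviews(item_theta, reviews_z, n_topics=5))
```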
Paper title: Hidden Factors and Hidden Topics: Understanding Rating Dimensions with Review Text
Authors: Julian McAuley; Jure Leskovec
Year: 2013
Keywords: recommender systems, topic models, librec
Abstract: In order to recommend products to users, we must ultimately predict how a user will respond to a new product. To do so we must uncover the implicit tastes of each user as well as the properties of each product. For example, in order to predict whether a user will enjoy Harry Potter, it helps to identify that the book is about wizards, as well as the user's level of interest in wizardry. User feedback is required to discover these latent product and user dimensions. Such feedback often comes in the form of a numeric rating accompanied by review text. However, traditional methods often discard review text, which makes user and product latent dimensions difficult to interpret, since they ignore the very text that justifies a user's rating. In this paper, we aim to combine latent rating dimensions (such as those of latent-factor recommender systems) with latent review topics (such as those learned by topic models like LDA). Our approach has several advantages. Firstly, we obtain highly interpretable textual labels for latent rating dimensions, which helps us to 'justify' ratings with text. Secondly, our approach more accurately predicts product ratings by harnessing the information present in review text; this is especially true for new products and users, who may have too few ratings to model their latent factors, yet may still provide substantial information from the text of even a single review. Thirdly, our discovered topics can be used to facilitate other tasks such as automated genre discovery, and to identify useful and representative reviews.