This paper combines two approaches: latent-factor models and LDA.
The LDA part learns from users' reviews of items: all reviews written about a given item are pooled into a single document for that item.
The approach in detail:
(1) The latent-factor model has a latent factor vector per item, while LDA has a topic distribution (also a vector) per item/document. The core of the paper is to tie these two vectors together so that both models are trained toward a single, unified objective.
(2) The link is simple: the item factor vector is normalized and then assigned to the topic distribution (exponentiate each element and divide by the sum of exponentials, Eq. 4; written out after this list). The normalization is needed so the result is a valid probability distribution. The same operation is applied to the topic-word distribution.
(3) The final objective is a linear combination of two parts: the squared error of the rating predictions and the (log-)likelihood of the topic model.
(4) Optimization alternates: first fit the factor vectors, use them to update the item-topic vectors, then use the item-topic and topic-word vectors to reassign each word's topic, and finally update the topic-word vectors. (The difference from plain LDA is that the reassigned topics are not fed back to update the item-topic vectors.) A sketch of this loop follows the list.
(5) In short, the new objective that folds in review information effectively just adds a constraint to the factorization.
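To make points (2) and (3) concrete, the factor-topic link and the combined objective can be written roughly as below. This is a sketch from memory of the paper's notation, so treat the symbols as assumptions: γ_i is item i's latent factor vector, θ_i its topic distribution, κ the "peakiness" parameter discussed further below, and μ the weight trading off the two terms.

```latex
% Link between item factors and the item topic distribution (cf. Eq. 4):
\theta_{i,k} = \frac{\exp(\kappa\,\gamma_{i,k})}{\sum_{k'} \exp(\kappa\,\gamma_{i,k'})}

% Combined objective: rating squared error minus the weighted topic-model log-likelihood
f = \sum_{(u,i)} \left(\hat{r}_{u,i} - r_{u,i}\right)^2 \;-\; \mu\,\ell\!\left(\mathcal{T}\mid\theta,\phi,z\right)
```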
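And here is a minimal Python sketch of the alternating loop in point (4), with made-up variable names and crude gradient/sampling steps; the actual implementation (e.g. the one in LibRec) fits the factor parameters with a proper gradient-based solver and differs in many details.

```python
# Minimal sketch of the alternating optimization in point (4).
import numpy as np

rng = np.random.default_rng(0)

n_users, n_items, n_topics, vocab = 50, 40, 5, 200
kappa, mu, lr = 1.0, 0.1, 0.01

P = rng.normal(scale=0.1, size=(n_users, n_topics))   # user factors
Q = rng.normal(scale=0.1, size=(n_items, n_topics))   # item factors (tied to topics)
psi = rng.normal(scale=0.1, size=(n_topics, vocab))   # unnormalized topic-word weights

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

# Toy data: observed (user, item, rating) triples and each item's pooled review "document".
ratings = [(rng.integers(n_users), rng.integers(n_items), rng.uniform(1, 5))
           for _ in range(300)]
docs = [rng.integers(vocab, size=rng.integers(20, 60)) for _ in range(n_items)]
z = [rng.integers(n_topics, size=len(d)) for d in docs]   # word-topic assignments

for it in range(20):
    # (a) Fit the rating factors on squared error (simplified SGD step).
    for u, i, r in ratings:
        err = P[u] @ Q[i] - r
        pu = P[u].copy()
        P[u] -= lr * err * Q[i]
        Q[i] -= lr * err * pu

    # (b) Map item factors to item-topic distributions (the Eq.-4 link),
    #     and word weights to topic-word distributions.
    theta = softmax(kappa * Q, axis=1)          # n_items x n_topics
    phi = softmax(psi, axis=1)                  # n_topics x vocab

    # (c) Reassign each word's topic given theta and phi. Unlike plain LDA,
    #     theta itself is not re-estimated from these counts (it comes from Q).
    for i, words in enumerate(docs):
        for n, w in enumerate(words):
            p = theta[i] * phi[:, w]
            z[i][n] = rng.choice(n_topics, p=p / p.sum())

    # (d) Update the topic-word weights toward the sampled assignments
    #     (a crude gradient step on the multinomial log-likelihood).
    counts = np.zeros_like(psi)
    for i, words in enumerate(docs):
        for n, w in enumerate(words):
            counts[z[i][n], w] += 1
    grad = counts - counts.sum(axis=1, keepdims=True) * phi
    psi += mu * lr * grad
```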
Other interesting points in the paper:
(1) Matching real (ground-truth) categories against the latent ones - Genre Discovery.
(2) Identifying which users' reviews describe an item most accurately (relative to all users' reviews of that item) by computing a distance - Identifying Useful Reviews (see the sketch after this list).
(3) The "cold start problem" addressed here does not mean new users or new items; rather, it is the case where one does not have enough rating data available - "Even a single review can tell us many of a product's properties, such as its genre."
(4) Normalization detail: a parameter κ is introduced inside the exponential - "Intuitively, large κ means that users only discuss the most important topics, while small κ means that users discuss all topics evenly."
(5) The way the paper folds the LDA learning step into the model is also worth borrowing - LibRec.
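For point (2) above, a hypothetical sketch of the distance computation: I assume here that each review is scored by the KL divergence between its empirical topic proportions and the item's overall topic distribution θ_i, with the lowest-divergence reviews treated as the most representative; the paper's exact metric may differ.

```python
# Hypothetical sketch for "Identifying Useful Reviews": score each of an item's
# reviews by how close its topic proportions are to the item's overall topic
# distribution theta_i (KL divergence assumed here, smaller = more representative).
import numpy as np

def review_topic_proportions(topic_assignments, n_topics, smoothing=1e-3):
    """Empirical topic proportions of one review from its word-topic assignments."""
    counts = np.bincount(topic_assignments, minlength=n_topics).astype(float)
    counts += smoothing                      # avoid zeros in the divergence below
    return counts / counts.sum()

def kl_divergence(p, q):
    return float(np.sum(p * np.log(p / q)))

def rank_reviews(item_theta, reviews_z, n_topics):
    """Return review indices sorted from most to least representative of the item."""
    scores = [kl_divergence(review_topic_proportions(zz, n_topics), item_theta)
              for zz in reviews_z]
    return np.argsort(scores)

# Toy usage: 3 reviews of one item, 5 topics.
item_theta = np.array([0.4, 0.3, 0.1, 0.1, 0.1])
reviews_z = [np.array([0, 0, 1, 1, 2]),     # word-topic assignments per review
             np.array([3, 3, 3, 4]),
             np.array([0, 1, 0, 1, 0, 2, 3])]
print(rank_reviews(item_theta, reviews_z, n_topics=5))
```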
Paper title: Hidden Factors and Hidden Topics: Understanding Rating Dimensions with Review Text
Authors: Julian McAuley; Jure Leskovec
Year: 2013
Keywords: recommender systems, topic models, librec
Abstract: In order to recommend products to users, we must ultimately predict how a user will respond to a new product. To do so we must uncover the implicit tastes of each user as well as the properties of each product. For example, in order to predict whether a user will enjoy Harry Potter, it helps to identify that the book is about wizards, as well as the user's level of interest in wizardry. User feedback is required to discover these latent product and user dimensions. Such feedback often comes in the form of a numeric rating accompanied by review text. However, traditional methods often discard review text, which makes user and product latent dimensions difficult to interpret, since they ignore the very text that justifies a user's rating. In this paper, we aim to combine latent rating dimensions (such as those of latent-factor recommender systems) with latent review topics (such as those learned by topic models like LDA). Our approach has several advantages. Firstly, we obtain highly interpretable textual labels for latent rating dimensions, which helps us to 'justify' ratings with text. Secondly, our approach more accurately predicts product ratings by harnessing the information present in review text; this is especially true for new products and users, who may have too few ratings to model their latent factors, yet may still provide substantial information from the text of even a single review. Thirdly, our discovered topics can be used to facilitate other tasks such as automated genre discovery, and to identify useful and representative reviews.