Authors | Donghyun Kim, Chanyoung Park, Jinoh Oh, Sungyoung Lee, Hwanjo Yu
Year | 2016 | Created | 2017-06-21
Keywords | Collaborative Filtering; Document Modeling; Contextual Information; Deep Learning; CNN
Abstract | Sparseness of user-to-item rating data is one of the major factors that deteriorate the quality of recommender systems. To handle the sparsity problem, several recommendation techniques have been proposed that additionally consider auxiliary information to improve rating prediction accuracy. In particular, when rating data is sparse, document modeling-based approaches have improved the accuracy by additionally utilizing textual data such as reviews, abstracts, or synopses. However, due to the inherent limitation of the bag-of-words model, they have difficulties in effectively utilizing contextual information of the documents, which leads to a shallow understanding of the documents. This paper proposes a novel context-aware recommendation model, convolutional matrix factorization (ConvMF), that integrates a convolutional neural network (CNN) into probabilistic matrix factorization (PMF). Consequently, ConvMF captures contextual information of documents and further enhances the rating prediction accuracy. Our extensive evaluations on three real-world datasets show that ConvMF significantly outperforms the state-of-the-art recommendation models even when the rating data is extremely sparse. We also demonstrate that ConvMF successfully captures subtle contextual differences of a word in a document. Our implementation and datasets are available at http://dm.postech.ac.kr/ConvMF.
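Note: the abstract does not spell out the objective, but the usual way a CNN is plugged into PMF in ConvMF is by using the CNN's document embedding as the mean of the Gaussian prior on each item's latent vector. A hedged sketch of the resulting MAP objective (notation assumed here: $r_{ij}$ observed rating, $I_{ij}$ its indicator, $u_i$/$v_j$ user/item latent vectors, $X_j$ the document of item $j$, $W$ the CNN weights):

$$
\mathcal{L}(U,V,W)=\sum_{i,j}\frac{I_{ij}}{2}\big(r_{ij}-u_i^{\top}v_j\big)^2
+\frac{\lambda_U}{2}\sum_i\lVert u_i\rVert^2
+\frac{\lambda_V}{2}\sum_j\big\lVert v_j-\mathrm{cnn}(W,X_j)\big\rVert^2
+\frac{\lambda_W}{2}\sum_k\lVert w_k\rVert^2
$$

A large $\lambda_V$ pins the item factors to the CNN's content representation; a small one recovers plain PMF.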
Authors | Ruslan Salakhutdinov, Andriy Mnih
Year | 2008 | Created | 2017-06-20
Keywords | PMF; probabilistic graphical model; matrix sparsity problem; sparse; sparsity; 1659 citations
Abstract | Many existing approaches to collaborative filtering can neither handle very large datasets nor easily deal with users who have very few ratings. In this paper we present the Probabilistic Matrix Factorization (PMF) model, which scales linearly with the number of observations and, more importantly, performs well on the large, sparse, and very imbalanced Netflix dataset. We further extend the PMF model to include an adaptive prior on the model parameters and show how the model capacity can be controlled automatically. Finally, we introduce a constrained version of the PMF model that is based on the assumption that users who have rated similar sets of movies are likely to have similar preferences. The resulting model is able to generalize considerably better for users with very few ratings. When the predictions of multiple PMF models are linearly combined with the predictions of Restricted Boltzmann Machine models, we achieve an error rate of 0.8861, nearly 7% better than the score of Netflix's own system.
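Note: for reference, PMF places Gaussian noise on the observed ratings and zero-mean Gaussian priors on the latent factors; MAP estimation then reduces to a regularized squared-error objective ($I_{ij}$ indicates whether user $i$ rated item $j$):

$$
p(R\mid U,V,\sigma^2)=\prod_{i=1}^{N}\prod_{j=1}^{M}\Big[\mathcal{N}\big(r_{ij}\mid u_i^{\top}v_j,\ \sigma^2\big)\Big]^{I_{ij}},\qquad
u_i\sim\mathcal{N}(0,\sigma_U^2 I),\quad v_j\sim\mathcal{N}(0,\sigma_V^2 I)
$$

$$
E=\frac{1}{2}\sum_{i,j}I_{ij}\big(r_{ij}-u_i^{\top}v_j\big)^2
+\frac{\lambda_U}{2}\sum_i\lVert u_i\rVert^2
+\frac{\lambda_V}{2}\sum_j\lVert v_j\rVert^2,
\qquad \lambda_U=\sigma^2/\sigma_U^2,\ \lambda_V=\sigma^2/\sigma_V^2
$$

Only observed entries appear in the sum, which is why the training cost scales linearly with the number of observations; the paper's adaptive-prior variant learns $\sigma_U$, $\sigma_V$ instead of fixing $\lambda_U$, $\lambda_V$ by hand.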
Authors | Xin Dong, Lei Yu, Zhonghuo Wu, Yuxia Sun, Lingfeng Yuan, Fangxi Zhang
Year | 2017 | Created | 2017-06-19
Keywords | Deep Learning; DAE; Autoencoder; MF; CF
Abstract | Collaborative filtering (CF) is a widely used approach in recommender systems to solve many real-world problems. Traditional CF-based methods employ the user-item matrix, which encodes the individual preferences of users for items, for learning to make recommendations. In real applications, the rating matrix is usually very sparse, causing CF-based methods to degrade significantly in recommendation performance. In this case, some improved CF methods utilize the increasing amount of side information to address the data sparsity problem as well as the cold start problem. However, the learned latent factors may not be effective due to the sparse nature of the user-item matrix and the side information. To address this problem, we utilize advances in learning effective representations in deep learning, and propose a hybrid model which jointly performs deep learning of users' and items' latent factors from side information and collaborative filtering from the rating matrix. Extensive experimental results on three real-world datasets show that our hybrid model outperforms other methods in effectively utilizing side information and achieves performance improvements.
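Note: the abstract only states that latent factors are learned jointly from side information and from ratings; one common way to write such a joint objective (a sketch under that assumption, not the paper's exact formulation) couples an MF loss with denoising-autoencoder reconstruction of the side information and ties the latent factors to the encoders' hidden codes:

$$
\min_{U,V,\theta}\ \sum_{(i,j)\in\Omega}\big(r_{ij}-u_i^{\top}v_j\big)^2
+\alpha\Big(\sum_i\big\lVert s_i^{u}-\mathrm{dec}_u(\mathrm{enc}_u(\tilde s_i^{u}))\big\rVert^2
+\sum_j\big\lVert s_j^{v}-\mathrm{dec}_v(\mathrm{enc}_v(\tilde s_j^{v}))\big\rVert^2\Big)
+\beta\Big(\sum_i\big\lVert u_i-\mathrm{enc}_u(\tilde s_i^{u})\big\rVert^2
+\sum_j\big\lVert v_j-\mathrm{enc}_v(\tilde s_j^{v})\big\rVert^2\Big)
+\lambda\big(\lVert U\rVert_F^2+\lVert V\rVert_F^2\big)
$$

Here $\Omega$ is the set of observed ratings, $s^{u}_i$/$s^{v}_j$ are user/item side-information vectors, $\tilde s$ their corrupted (noisy) versions, and $\alpha,\beta,\lambda$ trade-off weights; all of these symbols are assumptions made for illustration.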
Authors | Ruslan Salakhutdinov, Andriy Mnih, Geoffrey Hinton
Year | 2007 | Created | 2017-06-17
Keywords |
Abstract | Most of the existing approaches to collaborative filtering cannot handle very large data sets. In this paper we show how a class of two-layer undirected graphical models, called Restricted Boltzmann Machines (RBMs), can be used to model tabular data, such as users' ratings of movies. We present efficient learning and inference procedures for this class of models and demonstrate that RBMs can be successfully applied to the Netflix data set, containing over 100 million user/movie ratings. We also show that RBMs slightly outperform carefully tuned SVD models. When the predictions of multiple RBM models and multiple SVD models are linearly combined, we achieve an error rate that is well over 6% better than the score of Netflix's own system.
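Note: as a concrete reminder of how an RBM is trained, here is a minimal one-step contrastive divergence (CD-1) sketch for a binary RBM. This is a simplification of the paper's model, which uses one K-way softmax visible unit per rated movie and, in effect, one RBM per user with shared weights.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class BinaryRBM:
    """Minimal RBM trained with 1-step contrastive divergence (CD-1)."""
    def __init__(self, n_visible, n_hidden, lr=0.05):
        self.W = 0.01 * rng.standard_normal((n_visible, n_hidden))
        self.b_v = np.zeros(n_visible)   # visible biases
        self.b_h = np.zeros(n_hidden)    # hidden biases
        self.lr = lr

    def cd1_update(self, v0):
        # positive phase: hidden probabilities given the data
        ph0 = sigmoid(v0 @ self.W + self.b_h)
        h0 = (rng.random(ph0.shape) < ph0).astype(float)
        # negative phase: one Gibbs step down to the visible layer and back up
        pv1 = sigmoid(h0 @ self.W.T + self.b_v)
        ph1 = sigmoid(pv1 @ self.W + self.b_h)
        # update from the gap between data-driven and model-driven statistics
        self.W += self.lr * (v0.T @ ph0 - pv1.T @ ph1) / len(v0)
        self.b_v += self.lr * (v0 - pv1).mean(axis=0)
        self.b_h += self.lr * (ph0 - ph1).mean(axis=0)

# toy usage: binarized "liked / not liked" signals for 6 items
ratings = rng.integers(0, 2, size=(100, 6)).astype(float)
rbm = BinaryRBM(n_visible=6, n_hidden=4)
for _ in range(50):
    rbm.cd1_update(ratings)
```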
Authors | Suvash Sedhain, Aditya Krishna Menon, Scott Sanner, Lexing Xie
Year | 2015 | Created | 2017-06-17
Keywords | Recommender Systems; Collaborative Filtering; Autoencoders; encoding/decoding; generative model; discriminative model
Abstract | This paper proposes AutoRec, a novel autoencoder framework for collaborative filtering (CF). Empirically, AutoRec's compact and efficiently trainable model outperforms state-of-the-art CF techniques (biased matrix factorization, RBM-CF and LLORMA) on the MovieLens and Netflix datasets.
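Note: AutoRec's item-based variant feeds each item's partially observed rating column through a one-hidden-layer autoencoder and penalizes reconstruction error only on observed entries. A minimal NumPy sketch of the forward pass and objective (variable names and sizes are illustrative; training would minimize this loss with a gradient-based optimizer):

```python
import numpy as np

def autorec_forward(r, V, mu, W, b):
    """Reconstruction h(r) = f(W g(V r + mu) + b), with sigmoid g and identity f."""
    g = 1.0 / (1.0 + np.exp(-(V @ r + mu)))   # hidden code
    return W @ g + b                          # reconstructed rating column

def autorec_loss(R, mask, V, mu, W, b, lam):
    """Squared error on observed ratings only, plus L2 regularization."""
    H = np.column_stack([autorec_forward(R[:, j], V, mu, W, b) for j in range(R.shape[1])])
    return np.sum(mask * (R - H) ** 2) + lam / 2 * (np.sum(W ** 2) + np.sum(V ** 2))

# toy item-based setup: 5 users x 4 items, hidden dimension 3
rng = np.random.default_rng(0)
R = rng.integers(1, 6, size=(5, 4)).astype(float)
mask = (rng.random((5, 4)) < 0.6).astype(float)   # 1 where a rating is observed
k, n_users = 3, R.shape[0]
V, mu = 0.1 * rng.standard_normal((k, n_users)), np.zeros(k)
W, b = 0.1 * rng.standard_normal((n_users, k)), np.zeros(n_users)
print(autorec_loss(R, mask, V, mu, W, b, lam=0.01))
```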
Authors | Santiago Larraín, Denis Parra, Alvaro Soto
Year | 2015 | Created | 2017-06-15
Keywords | SLIM; GSLIM; latent factor vectors; encoding; prototype matrix; Orthogonal Matching Pursuit (OMP) algorithm; atoms; signal decomposition
Abstract | Sparse Linear Methods (SLIM) are state-of-the-art recommendation approaches based on matrix factorization, which rely on regularized ℓ1-norm and ℓ2-norm optimization, an alternative optimization problem to the traditional Frobenius norm. Although they have shown outstanding performance in Top-N recommendation, existing works have not yet analyzed some inherent assumptions that can have an important effect on the performance of these algorithms. In this paper, we attempt to improve the performance of SLIM by proposing a generalized formulation of the aforementioned assumptions. Instead of directly learning a sparse representation of the user-item matrix, we (i) learn the latent factor matrices of the users and the items via a traditional matrix factorization approach, and then (ii) reconstruct the latent user or item matrix via prototypes which are learned using sparse coding, an alternative to SLIM commonly used in the image processing domain. The results show that by tuning the parameters of our generalized model we are able to outperform SLIM in several Top-N recommendation experiments conducted on two different datasets, using both nDCG and nDCG@10 as evaluation metrics. These preliminary results, although not conclusive, indicate a promising line of research for improving the performance of SLIM recommendation.
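Note: the abstract's two-step idea can be prototyped directly with scikit-learn: factorize the rating matrix, then learn prototype atoms over the item factors and re-encode each item with OMP. The sizes, hyperparameters, and the choice of NMF below are illustrative assumptions, not the paper's GSLIM pipeline.

```python
import numpy as np
from sklearn.decomposition import NMF, DictionaryLearning

rng = np.random.default_rng(0)
R = rng.integers(0, 6, size=(50, 30)).astype(float)      # toy user-item matrix

# (i) traditional matrix factorization to obtain item latent factors
mf = NMF(n_components=8, init="nndsvda", max_iter=500, random_state=0)
U = mf.fit_transform(R)              # user factors,  shape (50, 8)
V = mf.components_.T                 # item factors,  shape (30, 8)

# (ii) learn prototypes (a dictionary) over the item factors and re-encode each
#      item as a sparse combination of prototypes via Orthogonal Matching Pursuit
dl = DictionaryLearning(n_components=12, transform_algorithm="omp",
                        transform_n_nonzero_coefs=3, random_state=0)
codes = dl.fit_transform(V)          # sparse codes, shape (30, 12)
V_hat = codes @ dl.components_       # reconstructed item factors

# scores for Top-N ranking from the reconstructed factors
scores = U @ V_hat.T
print(scores.shape)                  # (50, 30)
```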
Authors | Zhao Kang, Qiang Cheng
Year | 2016 | Created | 2017-06-15
Keywords | nonconvex rank approximation; SLIM
Abstract | The importance of accurate recommender systems has been widely recognized by academia and industry. However, the recommendation quality is still rather low. Recently, a linear sparse and low-rank representation of the user-item matrix has been applied to produce Top-N recommendations. This approach uses the nuclear norm as a convex relaxation for the rank function and has achieved better recommendation accuracy than the state-of-the-art methods. In the past several years, solving rank minimization problems by leveraging nonconvex relaxations has received increasing attention. Some empirical results demonstrate that they can provide a better approximation to the original problems than convex relaxation. In this paper, we propose a novel rank approximation to enhance the performance of Top-N recommendation systems, where the approximation error is controllable. Experimental results on real data show that the proposed rank approximation improves the Top-N recommendation accuracy substantially.
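Note: the abstract contrasts the convex nuclear-norm relaxation with a nonconvex one whose approximation error is controllable, but does not give the surrogate itself. The general pattern is as follows ($\sigma_i$ are singular values and $\gamma>0$ a parameter; the specific surrogate shown is illustrative, not necessarily the paper's):

$$
\mathrm{rank}(X)=\#\{i:\sigma_i(X)>0\},\qquad
\lVert X\rVert_{*}=\sum_i\sigma_i(X)\ \ (\text{convex relaxation}),\qquad
f_\gamma(X)=\sum_i\frac{\sigma_i(X)}{\sigma_i(X)+\gamma}\ \ (\text{nonconvex surrogate})
$$

As $\gamma\to 0^{+}$, $f_\gamma(X)\to\mathrm{rank}(X)$, while the nuclear norm keeps growing with the magnitude of the singular values; this is why nonconvex surrogates can approximate the rank more closely.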
Authors | Zhao Kang, Chong Peng, Qiang Cheng
Year | 2016 | Created | 2017-06-15
Keywords | SLIM; nuclear norm; LorSLIM; augmented Lagrangian multiplier (ALM) method; logdet; rank constraint; nonconvex relaxation; convex relaxation
Abstract | Top-N recommender systems have been investigated widely both in industry and academia. However, the recommendation quality is far from satisfactory. In this paper, we propose a simple yet promising algorithm. We fill the user-item matrix based on a low-rank assumption and simultaneously keep the original information. To do that, a nonconvex rank relaxation, rather than the nuclear norm, is adopted to provide a better rank approximation, and an efficient optimization strategy is designed. A comprehensive set of experiments on real datasets demonstrates that our method pushes the accuracy of Top-N recommendation to a new level.
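Note: the keywords point to a logdet-type rank relaxation optimized with an augmented Lagrangian multiplier (ALM) scheme; the exact objective is not in the abstract, but the standard logdet surrogate being referred to has the form (small $\epsilon>0$):

$$
\log\det\!\big((X^{\top}X)^{1/2}+\epsilon I\big)=\sum_i\log\big(\sigma_i(X)+\epsilon\big)
$$

Because $\log(\sigma+\epsilon)$ grows far more slowly than $\sigma$, large singular values are penalized much less than under the nuclear norm, giving a closer (though nonconvex) approximation to the rank; ALM-style solvers then typically alternate singular-value shrinkage steps with multiplier updates.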
Authors |
Year | 2016 | Created | 2017-06-08
Keywords | Summary and review of open-source recommender-system software
Abstract | A very good recommender-system library list: http://blog.csdn.net/cserchen/article/details/14231153
Authors | Rodrigo M. Silva, Guilherme C. M. Gomes, Mário S. Alvim, Marcos A. Gonçalves
Year | 2016 | Created | 2017-05-12
Keywords | L2R; IR; AL; Active Learning; semi-supervised learning; transductive learning
Abstract | Learning to rank (L2R) algorithms use a labeled training set to generate a ranking model that can later be used to rank new query results. These training sets are very costly and laborious to produce, requiring human annotators to assess the relevance or order of the documents in relation to a query. Active learning (AL) algorithms are able to reduce the labeling effort by actively sampling an unlabeled set and choosing data instances that maximize the effectiveness of a learning function. But AL methods require constant supervision, as documents have to be labeled at each round of the process. In this paper, we propose that certain characteristics of unlabeled L2R datasets allow for an unsupervised, compression-based selection process to be used to create small and yet highly informative and effective initial sets that can later be labeled and used to bootstrap an L2R system. We implement our ideas through a novel unsupervised selective sampling method, which we call Cover, that has several advantages over AL methods tailored to L2R. First, it does not need an initial labeled seed set and can select documents from scratch. Second, selected documents do not need to be labeled as the iterations of the method progress, since it is unsupervised (i.e., no learning model needs to be updated). Thus, an arbitrarily sized training set can be selected without human intervention, depending on the available budget. Third, the method is efficient and can be run on unlabeled collections containing millions of query-document instances. We run various experiments with two important L2R benchmarking collections to show that the proposed method allows for the creation of small yet very effective training sets. It achieves full training-like performance with less than 10% of the original sets selected, outperforming the baselines in both effectiveness and scalability.
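Note: the abstract does not describe Cover's compression-based criterion in detail; as a rough illustration of what unsupervised, coverage-driven selection looks like, here is a k-center (farthest-first) greedy sketch over query-document feature vectors. All names and sizes are invented for the example, and this is a stand-in, not the paper's algorithm.

```python
import numpy as np

def kcenter_greedy(X, budget, seed=0):
    """Farthest-first (k-center greedy) selection: repeatedly pick the instance
    farthest from the already selected set, so the chosen points 'cover' the
    feature space without any labels or learning model."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    selected = [int(rng.integers(n))]                  # arbitrary starting point
    d = np.linalg.norm(X - X[selected[0]], axis=1)     # distance to the selected set
    while len(selected) < budget:
        nxt = int(np.argmax(d))                        # farthest uncovered instance
        selected.append(nxt)
        d = np.minimum(d, np.linalg.norm(X - X[nxt], axis=1))
    return selected

# toy usage: 1000 query-document feature vectors with 46 features
# (46 matches some LETOR benchmarks; purely illustrative here)
X = np.random.default_rng(1).standard_normal((1000, 46))
idx = kcenter_greedy(X, budget=20)
print(sorted(idx))
```

The selected instances would then be sent to annotators, playing the role of the bootstrap training set.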