The paper's premise is that reviews and product images are connected, so an attention mechanism can be used to highlight the image regions a user is actually interested in, which in turn makes the recommendation results more explainable.
The approach itself is fairly simple:
(1) Manually pre-segment the image into regions and run a CNN over each region to obtain an independent representation per region.
(2) Fuse these independent region representations into a single image representation via an attention mechanism.
(3) Combine the fused image representation with the word representations from the review.
(4) Feed the combined representations from (3) into an LSTM for further learning.
The attention weights obtained along the way are the final output of interest; a minimal sketch of this pipeline follows below.
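The sketch below is a minimal PyTorch rendering of these four steps, not the paper's released code: the class name `MultimodalAttentionSketch`, the layer sizes, and the concatenation-based attention scoring are all illustrative assumptions. It shows how user-conditioned attention over region features, image-text fusion, and the LSTM fit together, and how the attention weights double as the visual explanation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultimodalAttentionSketch(nn.Module):
    """Illustrative version of the four-step pipeline above.
    Layer sizes and names are assumptions, not the paper's exact setup."""

    def __init__(self, num_users, vocab_size,
                 region_dim=512, embed_dim=128, hidden_dim=128):
        super().__init__()
        self.user_emb = nn.Embedding(num_users, embed_dim)
        self.word_emb = nn.Embedding(vocab_size, embed_dim)
        # Step 1's CNN features are assumed precomputed per region;
        # here we only project them into the shared embedding space.
        self.region_proj = nn.Linear(region_dim, embed_dim)
        # Step 2: scores how strongly a user attends to each region.
        self.attn = nn.Linear(embed_dim * 2, 1)
        # Step 4: LSTM over the image-word fused sequence.
        self.lstm = nn.LSTM(embed_dim * 2, hidden_dim, batch_first=True)
        self.score = nn.Linear(hidden_dim, 1)

    def forward(self, user_ids, region_feats, review_words):
        # user_ids:     (B,)               user indices
        # region_feats: (B, R, region_dim) one CNN feature per region
        # review_words: (B, T)             word ids of the user review
        u = self.user_emb(user_ids)                            # (B, E)
        regions = self.region_proj(region_feats)               # (B, R, E)

        # Step 2: user-conditioned attention fuses regions into one vector.
        u_tiled = u.unsqueeze(1).expand_as(regions)            # (B, R, E)
        logits = self.attn(torch.cat([regions, u_tiled], -1))  # (B, R, 1)
        weights = F.softmax(logits, dim=1)
        img_vec = (weights * regions).sum(dim=1)               # (B, E)

        # Step 3: couple the attended image vector with each review word.
        words = self.word_emb(review_words)                    # (B, T, E)
        img_tiled = img_vec.unsqueeze(1).expand(-1, words.size(1), -1)
        fused = torch.cat([words, img_tiled], dim=-1)          # (B, T, 2E)

        # Step 4: LSTM reads the fused sequence; its last hidden state
        # yields the preference score.
        _, (h_n, _) = self.lstm(fused)
        pred = self.score(h_n[-1]).squeeze(-1)                 # (B,)
        # `weights` are the per-region attentions: visualizing them on
        # the image gives the visual explanation.
        return pred, weights.squeeze(-1)


# Toy usage: one user, 8 pre-segmented regions, a 20-word review.
model = MultimodalAttentionSketch(num_users=100, vocab_size=5000)
pred, attn = model(torch.tensor([3]),
                   torch.randn(1, 8, 512),
                   torch.randint(0, 5000, (1, 20)))
```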
| Field | Content |
| --- | --- |
| Title | Personalized Fashion Recommendation with Visual Explanations based on Multimodal Attention Network |
| Authors | Xu Chen; Yongfeng Zhang |
| Year | 2019 |
| Keywords | Attention Network; different users may attend to different parts of an image: some care only about specific regions while others care about the item as a whole (parts vs. whole) |
| Abstract | Fashion recommendation has attracted increasing attention from both industry and academic communities. This paper proposes a novel neural architecture for fashion recommendation based on both image region-level features and user review information. Our basic intuition is that: for a fashion image, not all the regions are equally important for the users, i.e., people usually care about a few parts of the fashion image. To model such human sense, we learn an attention model over many pre-segmented image regions, based on which we can understand where a user is really interested in on the image, and correspondingly, represent the image in a more accurate manner. In addition, by discovering such fine-grained visual preference, we can visually explain a recommendation by highlighting some regions of its image. For better learning the attention model, we also introduce user review information as a weak supervision signal to collect more comprehensive user preference. In our final framework, the visual and textual features are seamlessly coupled by a multimodal attention network. Based on this architecture, we can not only provide accurate recommendation, but also can accompany each recommended item with novel visual explanations. We conduct extensive experiments to demonstrate the superiority of our proposed model in terms of Top-N recommendation, and also we build a collectively labeled dataset for evaluating our provided visual explanations in a quantitative manner. |