Multi-label Image Set Recognition in Visually-Aware Recommender Systems
In this paper, we focus on the problem of multi-label image recognition for visually-aware recommender systems. We propose a two-stage approach: first, a deep convolutional neural network is fine-tuned on a part of the training set; second, an attention-based aggregation network is trained to compute the weighted average of the visual features extracted from an input image set. Our approach is implemented in a mobile fashion recommender system application. Experiments on the Amazon Fashion dataset show that our approach achieves an F1-measure of 0.58 for 15 recommendations, more than twice the 0.25 F1-measure obtained with conventional averaging of feature vectors.
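To illustrate the difference between conventional averaging and attention-based aggregation of an image set, the following is a minimal sketch, not the implementation used in this work. It assumes that each image has already been mapped to a feature vector by the fine-tuned network, and that attention scores come from a dot product with a single query vector `q`. This scoring form, the names `mean_pool` and `attention_pool`, and the random stand-in features are illustrative assumptions; in practice `q` would be learned during training.

```python
import numpy as np


def softmax(scores):
    """Numerically stable softmax over a 1-D array of scores."""
    shifted = scores - np.max(scores)
    exp = np.exp(shifted)
    return exp / exp.sum()


def mean_pool(features):
    """Conventional aggregation: unweighted average of the feature vectors."""
    return features.mean(axis=0)


def attention_pool(features, q):
    """Weighted average of feature vectors with softmax attention weights.

    features: array of shape (n_images, dim), one row per image in the set.
    q: query vector of shape (dim,). Assumed here for illustration; it would
       be a trained parameter of the aggregation network.
    """
    scores = features @ q        # one relevance score per image
    weights = softmax(scores)    # non-negative weights that sum to 1
    pooled = weights @ features  # weighted average, shape (dim,)
    return pooled, weights


# Random stand-in data: a set of 5 images with 8-dimensional features.
rng = np.random.default_rng(0)
features = rng.normal(size=(5, 8))
q = rng.normal(size=8)

pooled, weights = attention_pool(features, q)
```

With `q` set to the zero vector, all images receive equal weight and `attention_pool` reduces to `mean_pool`, so plain averaging is a special case of the weighted scheme.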