Scene Recognition in User Preference Prediction Based on Classification of Deep Embeddings and Object Detection
In this paper we consider general scene recognition problem for analysis of user preferences based on his or her photos on mobile phone. Special attention is paid to out-of-class detections and efficient processing using MobileNet-based architectures. We propose the three stage procedure. At first, pre-trained convolutional neural network (CNN) is used extraction of input image embeddings at one of the last layers, which are used for training a classifier, e.g., support vector machine or random forest. Secondly, we fine-tune the pre-trained network on the given training set and compute the predictions (scores) at the output of the resulted CNN. Finally, we perform object detection in the input image, and the resulted sparse vector of detected objects is classified. The decision is made based on a computation of a weighted sum of the class posterior probabilities estimated by all three classifiers. Experimental results with a subset of ImageNet dataset demonstrate that the proposed approach is up to 5% more accurate when compared to conventional fine-tuned models.