Event Recognition with Automatic Album Detection based on Sequential Grouping of Confidence Scores and Neural Attention
This paper examines a new formulation of the event recognition task: predicting event categories for a gallery of images in which the albums (groups of photos corresponding to a single event) are unknown. A novel two-stage approach is proposed. First, features are extracted from each photo using a pre-trained convolutional neural network (CNN), and these features are classified individually. The normalized classifier scores are then used to group sequential photos into several clusters. Finally, the features of the photos in each group are aggregated into a single descriptor using a neural attention mechanism. The algorithm is implemented in an Android mobile application. An experimental study with features extracted by contemporary CNNs, including EfficientNets, on the Photo Event Collection and the Multi-Label Curation of Flickr Events Dataset demonstrates that the proposed approach is 9-23% more accurate than conventional event recognition on single photos. Moreover, the proposed method has a 13-16% lower error rate than classification of groups of photos obtained by hierarchical clustering of CNN-based embeddings.
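The two stages described above can be illustrated with a minimal sketch. It is not the authors' implementation. The abstract does not specify the distance between score vectors, the grouping threshold, or the form of the attention block, so the L1 distance, the `threshold` parameter, and the single query vector `query` (standing in for learned attention parameters) are all assumptions, as are the function names.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def group_sequential(scores, threshold):
    """Split a time-ordered list of normalized score vectors into
    groups of consecutive photo indices.

    Assumption: a new group starts when the L1 distance between the
    score vectors of neighbouring photos exceeds `threshold`.
    """
    groups = []
    for i, current in enumerate(scores):
        if i == 0:
            groups.append([i])
            continue
        dist = sum(abs(a - b) for a, b in zip(scores[i - 1], current))
        if dist > threshold:
            groups.append([i])      # scores changed sharply: new album
        else:
            groups[-1].append(i)    # similar scores: same album
    return groups


def attention_pool(features, query):
    """Aggregate the feature vectors of one group into one descriptor.

    Assumption: attention weights are the softmax of dot products
    between each feature vector and a (hypothetical) learned query.
    """
    logits = [sum(f_k * q_k for f_k, q_k in zip(f, query)) for f in features]
    weights = softmax(logits)
    dim = len(features[0])
    return [sum(w * f[k] for w, f in zip(weights, features)) for k in range(dim)]


# Toy gallery: four photos, two event classes, 2-D features.
scores = [softmax([2.0, 0.0]), softmax([1.8, 0.0]),
          softmax([0.0, 2.0]), softmax([0.0, 2.1])]
features = [[1.0, 0.0], [3.0, 2.0], [0.0, 1.0], [0.0, 3.0]]

groups = group_sequential(scores, threshold=0.5)
descriptors = [attention_pool([features[i] for i in g], query=[0.0, 0.0])
               for g in groups]
```

With a zero query vector the attention weights are uniform, so each descriptor reduces to the mean of the group's features. A trained query would instead up-weight the photos most informative for the event. Each descriptor would then be passed to the event classifier in place of the single-photo features.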