Event Recognition Based on Classification of Generated Image Captions

A. Savchenko; Miasnikov E.

doi:10.1007/978-3-030-44584-3_33

Ch. 33. P. 418–430.

In this paper, we consider the problem of event recognition on single images. In contrast to the conventional fine-tuning of convolutional neural networks (CNNs), we propose to use image captioning, i.e., a generative model that converts images into textual descriptions. The motivation is the possibility of combining conventional CNNs with a completely different approach in a highly diverse ensemble. Since the event recognition task has nothing serial or temporal, the obtained captions are one-hot encoded and summarized into a sparse feature vector suitable for training an arbitrary classifier. We provide an experimental study of several feature extractors for the Photo Event Collection, the Web Image Dataset for Event Recognition and the Multi-Label Curation of Flickr Events Dataset. It is shown that the image captions produced by a model trained on the Conceptual Captions dataset can be classified more accurately than the features from an object detector, though both are not as rich as the CNN-based features. However, an ensemble of a CNN and our approach provides state-of-the-art results for several event datasets.
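The caption encoding mentioned in the abstract can be illustrated with a minimal Python sketch. It assumes whitespace tokenization, lowercasing, and summation of the per-word one-hot vectors into a sparse `{index: count}` dictionary; the helper names `build_vocabulary` and `encode_caption` and the sample captions are hypothetical and are not taken from the paper, whose exact preprocessing may differ.

```python
from collections import Counter


def build_vocabulary(captions):
    """Map each distinct lowercase word to a column index, in first-seen order."""
    vocab = {}
    for caption in captions:
        for word in caption.lower().split():
            vocab.setdefault(word, len(vocab))
    return vocab


def encode_caption(caption, vocab):
    """Sum the one-hot vectors of the caption's words into a sparse dict.

    Assumption: summation is used as the "summary"; word order is discarded
    and words missing from the vocabulary are skipped.
    """
    counts = Counter(vocab[w] for w in caption.lower().split() if w in vocab)
    return dict(counts)


# Hypothetical captions standing in for the output of a captioning model.
captions = ["a group of people at a wedding", "people running a marathon"]
vocab = build_vocabulary(captions)
features = [encode_caption(c, vocab) for c in captions]
```

Each sparse dictionary can then be converted to whatever input format the chosen classifier expects; because word order is dropped, no sequential model is needed.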

Keywords: ensemble classifiers; ensemble methods; 28.23.15 Pattern recognition. Image processing; deep convolutional neural networks; convolutional neural networks; pattern recognition and classification; image captioning; event recognition

Publication based on the results of:

Research of robustness of network analysis algorithms (2020)

In book

Advances in Intelligent Data Analysis XVIII (IDA 2020)

Vol. 12080. Cham: Springer, 2020.

Event Recognition with Automatic Album Detection based on Sequential Grouping of Confidence Scores and Neural Attention

Savchenko A., in: Proceedings of International Joint Conference on Neural Networks 2020 (IJCNN 2020). Piscataway: IEEE, 2020. P. 1–8.

In this paper, a new formulation of the event recognition task is examined: event categories must be predicted for a gallery of images in which albums (groups of photos corresponding to a single event) are unknown. A novel two-stage approach is proposed. First, features are extracted from each photo using the pre-trained convolutional ...

Added: October 15, 2020

Extracting user preferences with methods for automatic generation of textual descriptions of photo album images

Kharchevnikova A., Savchenko A., Computer Optics 2020 Vol. 44 No. 4 P. 618–626

The paper considers the problem of extracting user preferences from the user's photo album. A new approach is proposed, based on automatic generation of textual descriptions of photos followed by classification of these descriptions. Known methods for creating image annotations based on convolutional and recurrent (long short-term memory) neural networks are analyzed. New models are trained using Google's Conceptual Captions dataset, in which ...

Added: September 16, 2020

Preference prediction based on a photo gallery analysis with scene recognition and object detection

Savchenko A., Demochkin K., Grechikhin I., Pattern Recognition 2022 Vol. 121 Article 108248

In this paper, a user modeling task is examined by processing a mobile device's gallery of photos and videos. We propose a novel engine for preference prediction based on scene recognition, object detection and facial analysis. First, all faces in a gallery are clustered, and all private photos and videos with faces from large clusters ...

Added: August 19, 2021

Clustering of video sequences in video surveillance systems based on convolutional neural networks

Sokolova A. D., Savchenko A., in: Proceedings of the XXIII International Scientific and Technical Conference "Information Systems and Technologies-2017". [s.n.], 2017. P. 870–875.

The paper considers the problem of structuring information in video surveillance software systems by grouping video data that contain identical faces. The emphasis is on efficient clustering of video sequences using convolutional neural networks to extract characteristic features. A new algorithm for clustering video fragments is developed, based on deep learning technologies and a statistical approach. Preliminary results of an experimental study of the accuracy and speed of the proposed ...

Added: October 24, 2017

Organizing Multimedia Data in Video Surveillance Systems Based on Face Verification with Convolutional Neural Networks

Anastasiia D. Sokolova, Angelina S. Kharchevnikova, Savchenko A., in: Analysis of Images, Social Networks and Texts. 6th International Conference, 2017, Revised Selected Papers. Vol. 10716. Cham: Springer, 2018. P. 223–230.

In this paper, we propose a two-stage approach to organizing information in video surveillance systems. First, the faces are detected in each frame and the video stream is split into sequences of frames containing the face region of one person. Second, these sequences (tracks) that contain identical faces are grouped using face verification algorithms and ...

Added: May 2, 2018

Organizing Multimedia Data in Video Surveillance Systems Based on Face Verification with Convolutional Neural Networks

Sokolova Anastasiia, Kharchevnikova Angelina, Savchenko A., Lecture Notes in Computer Science 2018 Vol. 10716 P. 223–230

Added: October 24, 2017

Multi-label Image Set Recognition in Visually-Aware Recommender Systems

Demochkin K., Savchenko A., in: Analysis of Images, Social Networks and Texts. 8th International Conference, AIST 2019, Lecture Notes in Computer Science, Revised Selected Papers. Vol. 11832. Cham: Springer, 2019. Ch. 26 P. 291–297.

In this paper, we focus on the problem of multi-label image recognition for visually-aware recommender systems. We propose a two-stage approach in which a deep convolutional neural network is first fine-tuned on a part of the training set. Second, an attention-based aggregation network is trained to compute the weighted average of visual features in ...
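As a rough illustration of attention-weighted averaging of visual features, the following Python sketch pools a set of feature vectors with softmax weights. The dot-product scoring against a `query` vector is an assumption made here for illustration (the abstract is truncated and does not specify the attention form), and `attention_pool` is a hypothetical name; in a trained network the query would be a learned parameter.

```python
import math


def attention_pool(features, query):
    """Weighted average of feature vectors with softmax attention weights.

    Assumption: each score is the dot product of `query` with a feature
    vector. `features` is a non-empty list of equal-length lists of floats.
    Returns (pooled_vector, weights).
    """
    scores = [sum(q * x for q, x in zip(query, f)) for f in features]
    top = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    dim = len(features[0])
    pooled = [sum(w * f[d] for w, f in zip(weights, features)) for d in range(dim)]
    return pooled, weights


# Toy features for two images; a zero query yields a plain average.
pooled, weights = attention_pool([[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0])
```

With a zero query all weights are equal and the result reduces to the ordinary mean; a query aligned with one feature vector shifts almost all weight to that image.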

Added: December 22, 2019

A New Sport Teams Logo Dataset for Detection Tasks

Kuznetsov A., Savchenko A., in: Proceedings of the International Conference on Computer Vision and Graphics (ICCVG 2020). Vol. 12334. Cham: Springer, 2020. Ch. 8 P. 87–97.

In this research, we introduce a new labelled SportLogo dataset that contains images of two kinds of sports: hockey (NHL) and basketball (NBA). This dataset presents several challenges typical for logo detection tasks. The large number of occlusions and changes of logo view during games make a straightforward detection approach ambiguous. ...

Added: October 1, 2020

Detection and Recognition of Food in Photo Galleries for Analysis of User Preferences

Miasnikov E., Savchenko A., in: Proceedings of International Conference on Image Analysis and Recognition (ICIAR 2020). Vol. 12131. Cham: Springer, 2020. Ch. 9 P. 83–94.

Food analysis is one of the most important parts of user preference prediction engines for recommendation systems in the travel domain. In this paper, we describe and study a neural network method for recognizing food in a gallery of photos taken with mobile devices. The described method consists of three main stages, ...

Added: October 1, 2020

Gender and age recognition from facial video based on convolutional neural networks

Kharchevnikova A., Savchenko A., in: Proceedings of the XXIII International Scientific and Technical Conference "Information Systems and Technologies-2017". [s.n.], 2017. P. 864–869.

The paper considers the problem of building intelligent contextual advertising systems that automatically adapt to the potential preferences of the user. An analytical review is given of recent publications on gender and age recognition from facial video, including those based on deep convolutional neural networks. A comparative analysis is carried out of methods for aggregating the decisions obtained when recognizing each video frame. The results of an experimental study of their accuracy and speed are presented. ...

Added: October 24, 2017

Cluster Analysis of Facial Video Data in Video Surveillance Systems Using Deep Learning

Savchenko A., Sokolova Anastasiia D., in: Computational Aspects and Applications in Large-Scale Networks. Springer Proceedings in Mathematics & Statistics. Vol. 247. Springer, 2018. P. 113–120.

In this paper, we propose an approach to structuring information in video surveillance systems by grouping the videos that contain identical faces. First, the faces are detected in each frame and features of each facial region are extracted at the output of pre-trained deep convolutional neural networks. Second, the tracks that contain identical faces ...

Added: September 2, 2018

Traffic flow estimation with data from a video surveillance camera

Fedorov A., Nikolskaia K., Ivanov S. et al., Journal of Big Data 2019 Vol. 6 Article 73

This study addresses the problem of traffic flow estimation based on the data from a video surveillance camera. The target problem is formulated as counting and classifying vehicles by their driving direction. This subject area is in early development, and the focus of this work is only one of the busiest crossroads in the city of Chelyabinsk, ...

Added: December 5, 2020

HSE-NN Team at the 4th ABAW Competition: Multi-task Emotion Recognition and Learning from Synthetic Images

Savchenko A. / Series Computer Science "arxiv.org". 2022.

In this paper, we present the results of the HSE-NN team in the 4th competition on Affective Behavior Analysis in-the-wild (ABAW). The novel multi-task EfficientNet model is trained for simultaneous recognition of facial expressions and prediction of valence and arousal on static photos. The resulting MT-EmotiEffNet extracts visual features that are fed into simple feed-forward ...

Added: October 21, 2022

Pronunciation training system based on convolutional neural networks and the information theory of speech perception

Savchenko L., Information Technologies 2019 Vol. 25 No. 5 P. 313–318

We consider the problem of computer-assisted language and pronunciation learning based on deep learning methods and the information theory of speech perception. In order to improve the efficiency of testing of pronunciation quality, we propose to train a convolutional neural network using the best reference utterances from the user. The experimental results proved ...

Added: May 29, 2019

Demochkin K. V., Savchenko A., in: Proceedings of the V International Conference and Youth School "Information Technologies and Nanotechnologies" (ITNT 2019). [s.n.], 2019.

In this paper, we focus on the problem of user prediction in visual product recommender systems based on a given set of photos of products previously purchased by the user. We study neural aggregation methods for image features extracted by deep neural networks. We propose a novel two-stage algorithm. First, the image features ...

Added: December 4, 2018

Adaptive Video Image Recognition System Using a Committee Machine

Savchenko A., Optical Memory and Neural Networks (Information Optics) 2012 Vol. 21 No. 4 P. 219–226

An ensemble of classifiers has been built to solve the problem of video image recognition. The paper offers a way to estimate the a posteriori probability of an image belonging to a particular class in the case of an arbitrary distance and nearest neighbor method. The estimation is shown to be equivalent to the optimal ...

Added: January 18, 2013

Fast inference in convolutional neural networks based on sequential three-way decisions

Savchenko A., Information Sciences 2021 Vol. 560 P. 370–385

A novel image recognition algorithm based on sequential three-way decisions is introduced to speed up the inference in a convolutional neural network. In contrast to the majority of existing studies, our approach does not require a special procedure to train a neural network, and thus it can be used with arbitrary architectures including pre-trained convolutional ...

Added: February 25, 2021

Methods of obtaining geospatial data using satellite communications and their processing using convolutional neural networks

Tsvetkovskaya I. I., Tekutieva N. V., Prokofyeva E. N. et al., in: 2020 Moscow Workshop on Electronic and Networking Technologies (MWENT). IEEE, 2020. P. 1–5.

The availability of high-resolution satellite images obtained through space radio communications offers the opportunity to use the most advanced technologies and techniques for analyzing remote sensing data. The paper discusses the data obtained with the use of ground-based, airborne or space-based filming equipment, which makes it possible to obtain images in one or several sections ...

Added: June 23, 2020

On one approach to sequential hierarchical image recognition

Savchenko A., Milov V. R., in: XVII All-Russian Scientific and Technical Conference "Neuroinformatics-2015": Collection of Scientific Papers. In 3 parts. Part 3. Moscow: NRNU MEPhI, 2015. P. 50–58.

The paper considers the problem of automatic image recognition. A hierarchical approach to its solution is proposed, in which the transition to a more detailed level of description occurs only when the classification at the previous level is not sufficiently reliable. Examples of practical application to face recognition from photographs are presented. ...
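The coarse-to-fine control flow described in this abstract can be sketched in Python as follows. This is only an illustration under stated assumptions: each level is modeled as a callable returning a `(label, confidence)` pair, reliability is checked against a single fixed `threshold`, and the name `hierarchical_recognize` is hypothetical; the paper's actual reliability criterion is not given in the abstract.

```python
def hierarchical_recognize(image, levels, threshold=0.9):
    """Try classifiers from coarsest to finest and stop at the first reliable one.

    `levels` is a non-empty list of callables: image -> (label, confidence).
    Assumption: a decision is reliable when confidence >= threshold.
    Returns (label, confidence, index_of_level_used); if no level is
    reliable, the decision of the finest (last) level is returned.
    """
    if not levels:
        raise ValueError("at least one level is required")
    for index, classify in enumerate(levels):
        label, confidence = classify(image)
        if confidence >= threshold:
            break  # reliable enough, skip the more detailed levels
    return label, confidence, index


# Toy stand-ins for classifiers working on increasingly detailed descriptions.
toy_levels = [
    lambda img: ("A", 0.5),
    lambda img: ("B", 0.95),
    lambda img: ("C", 0.99),
]
result = hierarchical_recognize(None, toy_levels)
```

The more detailed (and typically more expensive) levels are evaluated only for inputs that the coarser levels cannot classify reliably.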

Added: October 8, 2015

Computation-Efficient Face Recognition Algorithm Using a Sequential Analysis of High Dimensional Neural-Net Features

Sokolova A., Savchenko A., Optical Memory and Neural Networks (Information Optics) 2020 Vol. 29 No. 1 P. 19–29

The goal of the study is to increase the computation efficiency of the face recognition that uses feature vectors to describe facial images on photos and videos. These high-dimensional feature vectors are nowadays produced by convolutional neural networks. The methods to aggregate the features generated for each video frame are used to process the video ...

Added: October 25, 2019

Sequential three-way decisions in multi-category image recognition with deep features based on distance factor

Savchenko A., Information Sciences 2019 Vol. 489 P. 18–36

The paper addresses the issue of insufficient speed of image recognition methods when the number of classes is rather large. We propose a novel algorithm based on sequential three-way decisions and a formal description of granular computing. Each image is associated with principal component scores of the high-dimensional features extracted by a deep convolutional neural network. ...

Added: March 20, 2019

Emotion Recognition of a Group of People in Video Analytics Using Deep Off-the-Shelf Image Embeddings

Tarasov Alexander V., Savchenko A., in: Proceedings of Analysis of Images, Social Networks and Texts – 7th International Conference, AIST 2018, Moscow, Russia, July 5-7, 2018, Revised Selected Papers. Lecture Notes in Computer Science. Vol. 11179. Berlin: Springer, 2018. Ch. 19 P. 191–198.

In this paper, we address the group-level emotion classification problem in video analytic systems. We propose to apply the MTCNN face detector to obtain facial regions on each video frame. Next, off-the-shelf image features are extracted from each located face using pre-trained convolutional neural networks. The features of the whole frame are computed as a ...

Added: December 12, 2018

A Deep Learning Method Study of User Interest Classification

Malafeev A., Nikolaev K., in: Analysis of Images, Social Networks and Texts. 8th International Conference, AIST 2019, Kazan, Russia, July 17–19, 2019, Revised Selected Papers. Communications in Computer and Information Science. Vol. 1086. Springer, 2020. P. 154–159.

In this paper, a deep learning method study is conducted to solve a new multiclass text classification problem, identifying user interests by text messages. We used an original dataset of almost 90 thousand forum text messages, labeled for ten interests. We experimented with different modern neural network architectures: recurrent and convolutional, as well as simpler ...

Added: November 7, 2019