9th International Conference, AIST 2020, Skolkovo, Moscow, Russia, October 15–16, 2020, Revised Selected Papers
The paper studies the community detection problem on Telegram channels. The dataset, obtained from the TGStat service, contains information on 58k forwards among 100 political Telegram channels. We apply modern clustering approaches to address the problem of missing social links. Our study combines structural features with strategy-based attributes, including indicators designed according to the nodes' roles in the network. We propose ten novel indicators, computed for each network member and each message, in order to vectorize a Telegram channel with regard to its information-spreading strategy and the way it contacts other channels. We construct a metric-based graph of channel relations and cluster the channel representations using network science techniques. The obtained results are examined through quantitative and qualitative analysis, showing the promise of joint network-based and KPI-based models for the stated problem.
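A minimal sketch of the metric-based-graph idea described above: channels are represented by feature vectors, channels whose vectors lie within a distance threshold are connected, and a community detection algorithm is run on the resulting graph. The random vectors, the Euclidean metric, and the distance threshold here are illustrative assumptions, not the paper's actual indicators.

```python
import numpy as np
import networkx as nx
from networkx.algorithms.community import greedy_modularity_communities

rng = np.random.default_rng(0)
# Toy stand-in for per-channel strategy vectors (ten indicators per channel):
# two synthetic groups of five channels each, centered at 0 and 5.
vectors = np.vstack([rng.normal(0, 1, (5, 10)), rng.normal(5, 1, (5, 10))])

# Metric-based graph: connect channels whose Euclidean distance is below a
# threshold, weighting edges by similarity (closer channels -> heavier edges).
G = nx.Graph()
G.add_nodes_from(range(len(vectors)))
for i in range(len(vectors)):
    for j in range(i + 1, len(vectors)):
        d = float(np.linalg.norm(vectors[i] - vectors[j]))
        if d < 6.0:  # illustrative threshold
            G.add_edge(i, j, weight=1.0 / (1.0 + d))

# Cluster the channel representations with a standard community detection method.
communities = greedy_modularity_communities(G, weight="weight")
```

Any graph community detection method (e.g., Louvain) could be substituted for the greedy modularity heuristic used here.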
Manga colorization is time-consuming and hard to automate. In this paper, we propose a conditional adversarial deep learning approach to semi-automatic colorization of manga images. The system directly maps a tuple of a grayscale manga page image and a sparse color hint provided by the user to an output colorization. High-quality colorization can be obtained in a fully automated way, while color hints allow users to revise the colorization of every panel independently. We collect a dataset of manually colorized and grayscale manga images for training and evaluation. To perform supervised learning, we synthesize monochrome images from the colorized ones. Furthermore, we suggest several steps to reduce the domain gap between synthetic and real data and evaluate their influence both quantitatively and qualitatively. Our method achieves even better results after fine-tuning on a small number of grayscale manga images of a new style. The code is available at github.com.
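The input tuple described above can be sketched as follows: a monochrome image is synthesized from a colorized one, and a sparse color hint is encoded as a hint image plus a binary mask, stacked channel-wise for the generator. The image size, luma weights, and hint locations are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
color = rng.integers(0, 256, (64, 64, 3)).astype(np.float32)  # stand-in colorized page

# Synthesized monochrome image from the colorized one (ITU-R BT.601 luma weights).
gray = color @ np.array([0.299, 0.587, 0.114], dtype=np.float32)

# Sparse user color hint: a few pixels carry RGB values, a binary mask marks where.
hint = np.zeros((64, 64, 3), dtype=np.float32)
mask = np.zeros((64, 64, 1), dtype=np.float32)
for y, x in [(10, 10), (30, 45), (50, 20)]:  # hypothetical user clicks
    hint[y, x] = color[y, x]
    mask[y, x] = 1.0

# Generator input: grayscale + hint + mask stacked channel-wise (1 + 3 + 1 = 5 channels).
net_input = np.concatenate([gray[..., None], hint, mask], axis=-1)
```

An all-zero mask yields the fully automated mode; editing the hint pixels of a single panel revises that panel's colorization independently.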
We study no-reference image and video quality assessment methods, which are of great importance for computational video editing. The focus of our work is image quality assessment (IQA) applicable to fast and robust frame-by-frame multipurpose video quality assessment (VQA) for short videos.
We present a comprehensive framework for assessing the quality of images and videos. The scoring process consists of several parallel metric-collection steps followed by a final score-aggregation step. Most of the individual scoring models are based on deep convolutional neural networks (CNNs). The framework can be flexibly extended or reduced by adding or removing these steps. Using the Deep CNN-Based Blind Image Quality Predictor (DIQA) as a baseline for IQA, we propose improvements based on two patching strategies, uniform patching and object-based patching, and add an intelligent pre-training step with distortion classification.
We evaluated our model on three IQA benchmark image datasets (LIVE, TID2008, and TID2013) and on manually collected short YouTube videos. We also consider metrics of interest for automated video editing, scoring videos based on the scale of a scene, the presence of faces in the frame, and the compliance of shot transitions with shooting rules. The results of this work are applicable to the development of intelligent video and image processing systems.
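The uniform patching strategy mentioned above can be sketched as a regular grid split of each frame; the patch size and stride here are illustrative assumptions (object-based patching would instead place patches around detected objects).

```python
import numpy as np

def uniform_patches(img, patch=32, stride=32):
    """Split an H x W image into a regular grid of fixed-size patches."""
    h, w = img.shape[:2]
    patches = []
    for y in range(0, h - patch + 1, stride):
        for x in range(0, w - patch + 1, stride):
            patches.append(img[y:y + patch, x:x + patch])
    return np.stack(patches)

frame = np.zeros((128, 160), dtype=np.float32)  # stand-in video frame
patches = uniform_patches(frame)  # 4 rows x 5 columns of 32x32 patches
```

Per-patch quality scores from the CNN predictor would then be pooled into a single frame-level score in the aggregation step.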
Computer vision technologies are widely used in sports to monitor the quality of training. However, there are only a few approaches to recognizing the punches of a person engaged in boxing training. All existing approaches rely on manual feature selection and were trained on insufficient datasets. We introduce a new approach to recognizing actions in untrimmed video based on three stages: removing frames without actions, action localization, and action classification. Furthermore, we collected a sufficiently large dataset that contains five classes represented by more than 1000 punches in total. At each stage, we compared existing approaches and found the optimal model, which allowed us to recognize actions in untrimmed videos with an accuracy of 87%.
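The three-stage pipeline above can be sketched with per-frame activity scores: stage one drops frames below a threshold, stage two groups surviving consecutive frames into candidate segments, and stage three classifies each segment. The scores, threshold, and length-based classifier are illustrative placeholders for the learned models.

```python
def remove_inactive(activity_scores, thr=0.5):
    """Stage 1: keep indices of frames whose activity score passes a threshold."""
    return [i for i, s in enumerate(activity_scores) if s >= thr]

def localize_actions(frame_indices):
    """Stage 2: group consecutive frame indices into candidate action segments."""
    segments = []
    start = prev = None
    for i in frame_indices:
        if start is None:
            start = prev = i
        elif i == prev + 1:
            prev = i
        else:
            segments.append((start, prev))
            start = prev = i
    if start is not None:
        segments.append((start, prev))
    return segments

def classify_segment(segment):
    """Stage 3 placeholder: a real system would run a punch classifier here."""
    start, end = segment
    return "long_punch" if end - start + 1 >= 3 else "short_punch"

scores = [0.1, 0.9, 0.8, 0.2, 0.7, 0.9, 0.8, 0.9]  # hypothetical per-frame scores
segments = localize_actions(remove_inactive(scores))
labels = [classify_segment(s) for s in segments]
```

For the scores above, stage two yields the segments (1, 2) and (4, 7), each of which is then classified independently.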
We present a novel dataset of sports broadcasts comprising 8,781 games. The dataset contains 700 thousand comments and 93 thousand related news documents in Russian. We run an extensive series of experiments with modern extractive and abstractive approaches. The results demonstrate that BERT-based models show modest performance, reaching a ROUGE-1 F-measure of up to 0.26. In addition, human evaluation shows that neural approaches can generate feasible, although inaccurate, news based on broadcast texts.
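The ROUGE-1 F-measure reported above compares a generated summary with a reference by unigram overlap; a minimal sketch (without the stemming or tokenization details of standard ROUGE implementations):

```python
from collections import Counter

def rouge1_f(candidate, reference):
    """ROUGE-1 F-measure: harmonic mean of unigram precision and recall."""
    cand, ref = Counter(candidate.split()), Counter(reference.split())
    overlap = sum((cand & ref).values())  # clipped unigram matches
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)

score = rouge1_f("the team won the match", "the team lost the match")
```

A score of 0.26 thus means roughly a quarter of the reference news unigrams are recovered, balanced against precision.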