Book
Proceedings of Analysis of Images, Social Networks and Texts – 7th International Conference, AIST 2018, Moscow, Russia, July 5-7, 2018, Revised Selected Papers. Lecture Notes in Computer Science
This book constitutes the proceedings of the 7th International Conference on Analysis of Images, Social Networks and Texts, AIST 2018, held in Moscow, Russia, in July 2018.
The 29 full papers were carefully reviewed and selected from 107 submissions (of which 26 papers were rejected without being reviewed). The papers are organized in topical sections on natural language processing; analysis of images and video; general topics of data analysis; analysis of dynamic behavior through event data; optimization problems on graphs and network structures; and innovative systems.
In this paper we address the group-level emotion classification problem in video analytic systems. We propose to apply the MTCNN face detector to obtain facial regions on each video frame. Next, off-the-shelf image features are extracted from each located face using pre-trained convolutional neural networks. The features of the whole frame are computed as the mean of the image embeddings of the individual faces. The resulting frame features are classified with an ensemble of state-of-the-art classifiers, computed as a weighted sum of their outputs. Experimental results with the EmotiW 2017 dataset demonstrate that the proposed approach is 2–20% more accurate than conventional group-level emotion classifiers.
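The two aggregation steps described in this abstract (averaging face embeddings into a frame descriptor, then fusing classifier outputs by a weighted sum) can be illustrated with a minimal sketch. The function names, toy embeddings, scores, and weights below are assumptions made for illustration only; they are not taken from the paper, and face detection and feature extraction are omitted.

```python
import numpy as np

def frame_features(face_embeddings):
    """Average per-face embeddings (n_faces x dim) into one frame descriptor."""
    face_embeddings = np.asarray(face_embeddings, dtype=float)
    return face_embeddings.mean(axis=0)

def ensemble_predict(class_scores, weights):
    """Fuse per-classifier class scores by a weighted sum.

    class_scores: (n_classifiers x n_classes) array of scores for one frame.
    weights: one weight per classifier (hypothetical values, not from the paper).
    Returns the index of the winning class and the fused score vector.
    """
    class_scores = np.asarray(class_scores, dtype=float)
    weights = np.asarray(weights, dtype=float)
    fused = weights @ class_scores
    return int(np.argmax(fused)), fused

# Toy data: three detected faces with 2-dimensional embeddings.
faces = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
frame = frame_features(faces)

# Toy scores of two classifiers over three emotion classes.
scores = [[0.2, 0.5, 0.3], [0.6, 0.3, 0.1]]
label, fused = ensemble_predict(scores, weights=[0.7, 0.3])
```

In this toy run the second class wins, because the more heavily weighted classifier favors it.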
Earth remote sensing imagery comes from satellites, unmanned aerial vehicles, airplanes, and other sources. National agencies, commercial companies, and individuals across the globe collect enormous amounts of such imagery daily. Array DBMS are one of the prominent tools to manage and process large volumes of geospatial imagery. The core data model of an array DBMS is an N-dimensional array. Recently we presented a geospatial array DBMS, ChronosDB, which outperforms SciDB by up to 75× on average. We are about to launch a Cloud service running our DBMS. SciDB is the only freely available distributed array DBMS to date. Remote sensing imagery is traditionally stored in files of sophisticated formats, not in databases. Unlike SciDB, ChronosDB does not require importing files into an internal DBMS format and works with imagery “in situ”: directly in its native file formats. This is one of the many virtues of ChronosDB. It already has certain aggregation capabilities, but this paper focuses on more advanced aggregation queries, which still constitute a large portion of a typical workload applied to remote sensing imagery. We integrate the aggregation types into the data model, present the respective algorithms to perform aggregations in a distributed fashion, and thoroughly compare the performance of our technique with SciDB. We carried out experiments on real-world data on 8- and 16-node clusters in Microsoft Azure Cloud.
The problem of effective management of company subsidiaries has been at the forefront of strategic management research since the mid-1980s. Recently, special attention has been paid to the effect of headquarters-subsidiary conflicts on company performance, especially in relation to the subsidiaries’ resistance, both active and passive, to following the directives of the headquarters. A large number of theoretical approaches have been used to explain the existence of intraorganizational conflicts. For example, Strutzenberger and Ambos (2013) examined a variety of ways to conceptualize a subsidiary, from the individual up to the network level. The network conceptualization is, at present, the only approach that could explain the dissimilarity of the subsidiaries’ responses to headquarters’ directives, given the same or very similar distribution of financial and other resources, administrative support from the head office to subsidiaries, and levels of subsidiary integration. This is because social relationships between different actors inside the organization, the strength of ties and the size of networks, as well as other characteristics, could be the explanatory variables that researchers have been looking for in their quest to explain the varying degrees of responsiveness of subsidiaries and, in fact, the headquarters’ approaches to working with subsidiaries. The purpose of this study is to evaluate the variety of characteristics of networks formed between actors in headquarters and subsidiaries, and their effects on a variety of performance indicators of subsidiaries, as well as on subsidiary-headquarters conflicts. Data are being collected in two waves at a major Russian company with over 200,000 employees and several subsidiaries throughout the country.
Co-authorship networks contain invisible patterns of collaboration among researchers. The process of writing a joint paper can depend on different factors, such as friendship, common interests, and university policy. We show that, given a temporal co-authorship network, it is possible to predict future publications. We treat the problem of recommending collaborators as a link prediction problem, using graph embeddings obtained from the co-authorship network. We run experiments on data from the HSE publications graph and compare our approach with relevant models.
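As a rough illustration of embedding-based collaborator recommendation, the sketch below ranks candidate co-authors by the cosine similarity of their node embeddings. Cosine scoring, the function name, and the toy vectors are assumptions chosen for illustration; the abstract does not state which embedding model or scoring function the authors use.

```python
import numpy as np

def recommend_collaborators(embeddings, author, existing, k=2):
    """Rank authors who are not yet co-authors by cosine similarity.

    embeddings: dict mapping author name to an embedding vector
                (assumed to be produced by some graph embedding method).
    existing:   set of current co-authors to exclude from the ranking.
    """
    target = np.asarray(embeddings[author], dtype=float)
    scores = {}
    for other, vec in embeddings.items():
        if other == author or other in existing:
            continue
        vec = np.asarray(vec, dtype=float)
        denom = np.linalg.norm(target) * np.linalg.norm(vec)
        scores[other] = float(target @ vec / denom) if denom else 0.0
    # Highest similarity first; keep the top k candidates.
    return sorted(scores, key=scores.get, reverse=True)[:k]

# Toy 2-dimensional embeddings for four hypothetical authors.
toy = {"a": [1.0, 0.0], "b": [0.9, 0.1], "c": [0.0, 1.0], "d": [1.0, 1.0]}
top = recommend_collaborators(toy, "a", existing={"b"})
```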
In this paper, we consider a new formulation of a graph embedding algorithm that learns node and edge representations under common constraints. We evaluate our approach on the link prediction problem for the co-authorship network of HSE researchers’ publications. We compare it with existing structural network embeddings and feature-engineering models.
In this paper (the first author is the first-place winner of the Open HSE Student Research Paper Competition (NIRS) in 2017, Computer Science nomination, with the topic “Extraction of Visual Features for Recommendation of Products”, as a 2017 alumnus of the “Data Science” master's program at the Computer Science Faculty, HSE, Moscow), we describe a recommender approach based on features extracted from images of clothes. The feature extraction method relies on a pre-trained deep neural network that is adapted to the dataset through transfer learning. Recommendations are generated by the neural network as well. All experiments are based on items from the Clothing, Shoes and Jewelry category of the Amazon product dataset. It is demonstrated that the proposed approach outperforms the baseline collaborative filtering method.
In this paper, we develop a predictive model for multi-phase wellbore flows based on ensembles of decision trees such as Random Forest or XGBoost. The tree-based ensembles are trained on time series of different physical parameters generated using a numerical simulator of full-scale transient wellbore flows. Once the training is completed, the ensemble is used to predict one of the key parameters of the wellbore flow, namely, the bottomhole pressure. According to our recent experiments with complex wellbore configurations and flows, a normalized root mean squared error (NRMSE) of prediction below 5% can be achieved, and ensembles of decision trees outperform artificial neural networks. Moreover, the obtained solution is more scalable and demonstrates good noise-tolerance properties. The error analysis shows that the prediction becomes particularly challenging in the case of highly transient slug flows. Some hints for overcoming these challenges and research prospects are provided.
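For readers unfamiliar with the metric, a minimal sketch of NRMSE follows. The abstract does not say how the error is normalized; dividing the RMSE by the range of the true values is an assumption made here (normalizing by the mean is another common convention), and the pressure values are made up.

```python
import numpy as np

def nrmse(y_true, y_pred):
    """Root mean squared error divided by the range of the true values.

    Range normalization is an assumed convention, not one confirmed by the paper.
    """
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    rmse = np.sqrt(np.mean((y_true - y_pred) ** 2))
    span = y_true.max() - y_true.min()
    if span == 0:
        raise ValueError("true values are constant; range normalization is undefined")
    return rmse / span

# Made-up bottomhole pressure values and predictions (arbitrary units).
error = nrmse([100.0, 200.0, 300.0], [110.0, 190.0, 300.0])
```

With these toy numbers the error is roughly 4%, which would fall under the 5% threshold mentioned in the abstract.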
The accurate geo-localization of mobile devices based upon received signal strength (RSS) in an urban area is hindered by obstacles in the signal propagation path. Current localization methods have their own advantages and drawbacks. Triangular lateration (TL) is fast and scalable but employs a monotone RSS-to-distance transformation that unfortunately assumes mobile devices are on the line of sight. Radio frequency fingerprinting (RFP) methods employ a reference database, which ensures accurate localization but unfortunately hinders scalability.
Here, we propose a new, simple, and robust method called lookup lateration (LL), which incorporates the advantages of TL and RFP without their drawbacks. Like RFP, LL employs a dataset of reference locations but stores them in separate lookup tables with respect to RSS and antenna towers. A query observation is localized by identifying common locations in only the associated lookup tables. Due to this decentralization, LL is two orders of magnitude faster than RFP, making it particularly scalable for large cities. Moreover, we show analytically and experimentally that LL achieves higher localization accuracy than RFP as well. For instance, using a grid size of 20 m, LL achieves localization errors of 9.11 m and 55.66 m, while RFP achieves 72.50 m and 242.19 m, at 67% and 95%, respectively, on the Urban Hannover Scenario dataset.
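The lookup idea described above can be sketched as follows: reference locations are indexed in tables keyed by antenna tower and RSS value, and a query is localized by intersecting the tables it touches. This is only an illustrative reading of the abstract. The exact-match RSS keys, the centroid estimate, the `None` fallback, and all names and toy readings are assumptions, not details of the published method.

```python
from collections import defaultdict

def build_tables(reference):
    """Index reference locations by (tower, RSS) pairs.

    reference: iterable of (location, {tower_id: rss}) pairs, where location
    is an (x, y) grid cell. Exact-match RSS keys are a simplifying assumption.
    """
    tables = defaultdict(set)
    for location, readings in reference:
        for tower, rss in readings.items():
            tables[(tower, rss)].add(location)
    return tables

def localize(tables, query):
    """Intersect the lookup tables associated with the query readings.

    Returns the centroid of the common locations, or None if there are none.
    """
    candidate_sets = [tables.get((tower, rss), set()) for tower, rss in query.items()]
    if not candidate_sets:
        return None
    common = set.intersection(*candidate_sets)
    if not common:
        return None
    xs = [x for x, _ in common]
    ys = [y for _, y in common]
    return (sum(xs) / len(xs), sum(ys) / len(ys))

# Toy reference data on a 20 m grid with two hypothetical towers A and B.
reference = [
    ((0, 0), {"A": -70, "B": -80}),
    ((20, 0), {"A": -70, "B": -75}),
    ((0, 20), {"A": -65, "B": -80}),
]
tables = build_tables(reference)
estimate = localize(tables, {"A": -70, "B": -80})
```

Only the tables for the observed (tower, RSS) pairs are consulted, which is the property the abstract credits for the speed-up over scanning a full fingerprint database.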
We present, in the form of two visualizations, some preliminary results of an ongoing study of the data science community in Russia. The first visualization aggregates data about top researchers and their fields of interest according to the Google Scholar service. The second graph is a map of the largest online communities on data science on the VKontakte platform.
This research is motivated by the sustainability problems of oil palm expansion. Fast-growing industrial Oil Palm Plantations (OPPs) in the tropical belt of Africa, Southeast Asia and parts of Brazil lead to significant loss of rainforest and contribute to global warming through the corresponding decrease in carbon dioxide absorption. We propose a novel approach to monitoring the expansion of OPPs based on applying state-of-the-art Fully Convolutional Neural Networks (FCNs) to solve the semantic segmentation problem for Landsat imagery. The proposed approach significantly outperforms per-pixel classification methods based on Random Forest using texture features, NDVI, and all Landsat bands. Moreover, the trained FCN is robust to spatial and temporal shifts of the input data. The paper provides a proof of concept that FCNs, as semi-automated methods, enable OPP mapping of entire countries and may serve for yearly detection of oil palm expansion.

The article discusses one argument in favor of the descriptive theory of reference of proper names and against the theory of direct reference, an argument that appeals to the famous example of the ship of Theseus. The author defends the latter theory by distinguishing the object of direct reference from its principles of individuation. The argument is discussed with reference to the works of H. Chandler, L. Linsky, S. Kripke, N. Salmon and other theorists.
This is an interdisciplinary volume that focuses on the central topic of the representation of events, namely cross-cultural differences in representing time and space, as well as various aspects of the conceptualisation of space and time. It brings together research on space and time from a variety of angles, both theoretical and methodological. Crossing boundaries between and among disciplines such as linguistics, psychology, philosophy, or anthropology forms a creative platform in a bold attempt to reveal the complex interaction of language, culture, and cognition in the context of human communication and interaction.
The authors address the nature of spatial and temporal constructs from a number of perspectives, such as cultural specificity in determining time intervals in an Amazonian culture, distinct temporalities in a specific Mongolian hunter community, Russian-specific conceptualisation of temporal relations, Seri and Yucatec frames of spatial reference, memory of events in space and time, and metaphorical meaning stemming from perception and spatial artefacts, to name but a few themes.
Proceedings of the 15th International Conference on Artificial Intelligence: Methodology, Systems, Applications, AIMSA 2012, Varna, Bulgaria, September 12-15, 2012.
Formal Concept Analysis (FCA) is a mathematical technique that has been extensively applied to Boolean data in applications such as knowledge discovery, information retrieval, and web mining. In recent years, research on extending FCA theory to cope with imprecise and incomplete information has made significant progress. In this paper, we give a systematic overview of the more than 120 papers published between 2003 and 2011 on FCA with fuzzy attributes and rough FCA. We applied traditional FCA as a text-mining instrument to 1072 papers mentioning FCA in the abstract. These papers were available as PDF files, and using a thesaurus with terms referring to research topics, we transformed them into concept lattices. These lattices were used to analyze and explore the most prominent research topics within the FCA with fuzzy attributes and rough FCA research communities. FCA turned out to be an ideal metatechnique for representing large volumes of unstructured texts.
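To make the notion of a formal concept concrete, the sketch below enumerates all concepts of a tiny Boolean context in which objects are papers and attributes are thesaurus terms. It is a brute-force illustration of standard FCA derivation operators, suitable only for toy inputs; the paper names and terms are hypothetical, and the abstract does not describe which algorithm or tool the authors used.

```python
from itertools import combinations

def common_attributes(context, objects):
    """Attributes shared by all given objects (every attribute if none are given)."""
    result = set().union(*context.values())
    for obj in objects:
        result &= context[obj]
    return result

def common_objects(context, attrs):
    """Objects that have all of the given attributes."""
    return {obj for obj, has in context.items() if attrs <= has}

def formal_concepts(context):
    """Enumerate (extent, intent) pairs by closing every subset of objects.

    Exponential in the number of objects; intended for toy contexts only.
    """
    concepts = set()
    objs = list(context)
    for r in range(len(objs) + 1):
        for subset in combinations(objs, r):
            intent = common_attributes(context, subset)
            extent = common_objects(context, intent)
            concepts.add((frozenset(extent), frozenset(intent)))
    return concepts

# Hypothetical context: three papers tagged with thesaurus terms.
context = {
    "p1": {"fuzzy", "lattice"},
    "p2": {"rough", "lattice"},
    "p3": {"fuzzy"},
}
concepts = formal_concepts(context)
```

Each resulting pair groups a maximal set of papers with the maximal set of terms they share; ordering these pairs by extent inclusion gives the concept lattice.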
In this paper we present a new notion of a stochastic monotone measure and its application to image processing. By definition, a stochastic monotone measure is a random variable with values in the set of monotone measures, and it can describe a choice of random features in image processing. In this case, a monotone measure describes uncertainty in the problem of choosing the set of features with the highest informativeness, and its stochastic behavior is explained by noise that can corrupt images.
We designed and developed the multi-agent recommender system “EZSurf”, which analyzes user interests and provides recommendations for users of the social network “VKontakte” based on data from a particular user's profile. In the course of the work, different methods and technological solutions were analyzed, along with their advantages and disadvantages. In addition, a comparative analysis of analogous products was carried out; the most similar is the Russian start-up service Surfingbird. Based on this analysis, the decision on how to implement and integrate the recommender system was made. The distinguishing feature of the system is that it uses the “VKontakte” profile to collect user data and the APIs of third-party services (LastFM, TheMovieDB) to extract information about similar objects. This approach helps optimize the recommender system, because it does not require creating its own object classification system and object database. The functionality of the multi-agent system is divided among three agents. The first agent (Collector) collects user data from the “VKontakte” profile using the VK API. The second agent (Analyzer) collects similar objects from the databases of third-party services (LastFM, TheMovieDB); these objects serve as the criteria for the subsequent search for recommended content. For the search and selection of information, a third agent (Recommender), which works as a web crawler, has been implemented. “EZSurf” can be used by “VKontakte” users in everyday life to save time on web surfing. At the same time, they receive content recommendations filtered according to the preferences of each particular user.
We consider certain spaces of functions on the circle, which naturally appear in harmonic analysis, and superposition operators on these spaces. We study the following question: which functions have the property that each of their superpositions with a homeomorphism of the circle belongs to a given space? We also study the multidimensional case.
We consider the spaces of functions on the m-dimensional torus whose Fourier transform is p-summable. We obtain estimates for the norms of the exponential functions deformed by a C^1-smooth phase. The results generalize to the multidimensional case the one-dimensional results obtained by the author earlier in “Quantitative estimates in the Beurling–Helson theorem”, Sbornik: Mathematics, 201:12 (2010), 1811–1836.
We consider the spaces of functions on the circle whose Fourier transform is p-summable. We obtain estimates for the norms of exponential functions deformed by a C^1-smooth phase.
This proceedings publication is a compilation of selected contributions from the “Third International Conference on the Dynamics of Information Systems”, which took place at the University of Florida, Gainesville, February 16–18, 2011. The purpose of this conference was to bring together scientists and engineers from industry, government, and academia in order to exchange new discoveries and results in a broad range of topics relevant to the theory and practice of dynamics of information systems. Dynamics of Information Systems: Mathematical Foundation presents state-of-the-art research and is intended for graduate students and researchers interested in some of the most recent discoveries in information theory and dynamical systems. Scientists in other disciplines may also benefit from the applications of new developments to their own area of study.