Organizing data in video surveillance systems using deep learning technologies
Information in video surveillance systems is organized by grouping video tracks that contain the same face. We examine methods for aggregating the features of individual frames extracted with deep convolutional neural networks. Tracks with identical faces are grouped using known face verification algorithms and clustering methods. An experimental study on the YouTubeFaces dataset compares ways of combining frame features into a descriptor of a video track. The most accurate method is shown to be L2-normalization of the average of the unnormalized features of the individual frames of each video track.
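The aggregation rule named above (average the unnormalized frame features, then L2-normalize the mean) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names, the toy three-dimensional features, and the use of cosine similarity to compare descriptors are all assumptions made for the example.

```python
import math

def l2_normalize(vec):
    """Scale a vector to unit Euclidean length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)

def track_descriptor(frame_features):
    """Average the raw (unnormalized) frame features, then L2-normalize the mean."""
    if not frame_features:
        raise ValueError("a track needs at least one frame")
    n = len(frame_features)
    dim = len(frame_features[0])
    mean = [sum(f[i] for f in frame_features) / n for i in range(dim)]
    return l2_normalize(mean)

def cosine_similarity(a, b):
    """For unit-length descriptors the cosine similarity is just the dot product."""
    return sum(x * y for x, y in zip(a, b))

# Toy 3-dimensional "CNN features" for two tracks (hypothetical values).
track_a = [[2.0, 0.0, 0.0], [4.0, 0.0, 0.0]]
track_b = [[0.0, 3.0, 0.0], [0.0, 1.0, 0.0]]
desc_a = track_descriptor(track_a)
desc_b = track_descriptor(track_b)
```

Because the mean is taken before normalization, frames with larger feature norms contribute more to the descriptor than they would if each frame were normalized first.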
This state-of-the-art survey is dedicated to the memory of Emmanuil Markovich Braverman (1931-1977), a pioneer in developing machine learning theory. The 12 revised full papers and 4 short papers included in this volume were presented at the conference "Braverman Readings in Machine Learning: Key Ideas from Inception to Current State" held in Boston, MA, USA, in April 2017, commemorating the 40th anniversary of Emmanuil Braverman's death. The papers present an overview of some of Braverman's ideas and approaches. The collection is divided into three parts. The first part bridges the past and the present. Its main contents relate to the concept of the kernel function and its application to signal and image analysis as well as clustering. The second part presents a set of extensions of Braverman's work to issues of current interest in both the theory and applications of machine learning. The third part includes short essays by a friend, a student, and a colleague.
This article presents a new technique for collaborative filtering based on pre-clustering of website usage data. The key idea is to use clustering methods to identify groups of different users.
This is a textbook in data analysis. Its contents are heavily influenced by the idea that data analysis should help in enhancing and augmenting knowledge of the domain as represented by the concepts and statements of relation between them. According to this view, two main pathways for data analysis are summarization, for developing and augmenting concepts, and correlation, for enhancing and establishing relations. Visualization, in this context, is a way of presenting results in a cognitively comfortable way. The term summarization is understood quite broadly here to embrace not only simple summaries like totals and means, but also more complex summaries such as the principal components of a set of features or cluster structures in a set of entities.
The material presented from this perspective forms a unique mix of subjects from the fields of statistical data analysis, data mining, and computational intelligence, which follow different systems of presentation.
The paper describes the results of an experimental study of topic models applied to the task of single-word term extraction. The experiments encompass several probabilistic and non-probabilistic topic models and demonstrate that topic information improves the quality of term extraction, and that NMF with KL-divergence minimization performs best among the models studied.
A vast number of documents on the Web have duplicates, which makes it challenging to develop efficient methods for computing clusters of similar documents. In this paper we use an approach that treats (closed) sets of attributes with large support (large extent) as clusters of similar documents. The method is tested in a series of computer experiments on large public collections of web documents and compared with other established methods and software, such as biclustering, on the same datasets. The practical efficiency of different algorithms for computing frequent closed sets of attributes is also compared.
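To illustrate what a closed set of attributes with large support looks like, here is a small sketch. It assumes each document is represented by a set of attributes (for example, shingles) and relies on the fact that every closed attribute set shared by at least one document is an intersection of document attribute sets. The function name, the toy documents, and the brute-force enumeration are illustrative assumptions; the enumeration is exponential in the worst case and is not one of the algorithms compared in the paper.

```python
def closed_attribute_sets(docs, min_support):
    """Return {closed attribute set: set of documents containing it}
    for every non-empty closed set shared by at least min_support documents."""
    # All closed sets arise as intersections of document attribute sets.
    intents = set()
    for attrs in docs.values():
        d = frozenset(attrs)
        intents |= {d} | {d & c for c in intents}
    result = {}
    for intent in intents:
        if not intent:
            continue  # the empty set is shared by everything and carries no information
        extent = frozenset(name for name, attrs in docs.items()
                           if intent <= frozenset(attrs))
        if len(extent) >= min_support:
            result[intent] = extent
    return result

# Toy collection: attributes are single letters (hypothetical data).
docs = {
    "d1": {"a", "b", "c"},
    "d2": {"a", "b", "c"},
    "d3": {"a", "b", "d"},
    "d4": {"e"},
}
clusters = closed_attribute_sets(docs, min_support=2)
```

Here `d1` and `d2` form a cluster of exact duplicates sharing `{a, b, c}`, while the larger cluster `{d1, d2, d3}` shares only `{a, b}`.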
Technology mining (TM) helps to acquire intelligence about the evolution of research and development (R&D), technologies, products, and markets in various STI areas and, by identifying trends, about what is likely to emerge in the future. The present chapter introduces a methodology for identifying trends through a combination of "thematic clustering", based on the co-occurrence of terms, and "dynamic term clustering", based on the correlation of their dynamics over time. In this way, it is possible to identify and distinguish four patterns in the evolution of terms, which eventually lead to (i) weak signals of future trends, as well as (ii) emerging, (iii) maturing, and (iv) declining trends. The key trends identified are then analyzed further by looking at the semantic connections between terms identified through TM. This helps to understand the context and further features of each trend. The proposed approach is demonstrated in the field of photonics, an emerging technology with a number of potential application areas.
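The idea of grouping terms by the correlation of their dynamics can be sketched as follows. The chapter's actual procedure is not specified in this abstract, so everything here is an assumption: terms are described by hypothetical yearly frequency series, similarity is the Pearson correlation, and two terms land in the same group when they are linked by a chain of correlations at or above a chosen threshold.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length series (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx > 0 and sy > 0 else 0.0

def cluster_by_dynamics(series, threshold):
    """Group terms whose frequency series are linked by correlation >= threshold."""
    terms = sorted(series)
    parent = {t: t for t in terms}

    def find(t):
        # Union-find lookup with path halving.
        while parent[t] != t:
            parent[t] = parent[parent[t]]
            t = parent[t]
        return t

    for i, a in enumerate(terms):
        for b in terms[i + 1:]:
            if pearson(series[a], series[b]) >= threshold:
                parent[find(a)] = find(b)

    groups = {}
    for t in terms:
        groups.setdefault(find(t), []).append(t)
    return sorted(groups.values())

# Hypothetical yearly term frequencies (placeholder names, not real data).
series = {
    "rising_1": [1, 2, 3, 4],
    "rising_2": [2, 4, 6, 8],
    "falling_1": [8, 6, 4, 2],
    "falling_2": [9, 7, 5, 3],
}
groups = cluster_by_dynamics(series, threshold=0.9)
```

In this toy input the two rising terms form one group and the two falling terms another, which loosely mirrors the distinction between emerging and declining patterns.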
Imaging mass spectrometry (imaging MS) has emerged in the past decade as a label-free, spatially resolved, and multipurpose bioanalytical technique for direct analysis of biological samples from animal tissue, plant tissue, biofilms, and polymer films. Imaging MS has been successfully incorporated into many biomedical pipelines, where it is usually applied in the so-called untargeted mode, capturing the spatial localization of a multitude of ions from a wide mass range. An imaging MS data set usually comprises thousands of spectra and tens to hundreds of thousands of mass-to-charge (m/z) images and can be as large as several gigabytes. Unsupervised analysis of an imaging MS data set aims at finding hidden structures in the data without using a priori information and is often the first step of imaging MS data analysis. We propose a novel, easy-to-use, and easy-to-implement approach to answer one of the key questions of unsupervised analysis of imaging MS data: what do all m/z images look like? The key idea of the approach is to cluster all m/z images according to their spatial similarity so that each cluster contains spatially similar m/z images. We propose a visualization of both the spatial and the spectral information obtained by clustering, which provides an easy way to understand what all m/z images look like. We evaluated the proposed approach on matrix-assisted laser desorption ionization imaging MS data sets of a rat brain coronal section and a human larynx carcinoma and discussed several scenarios of data analysis.
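A rough sketch of clustering m/z images by spatial similarity is given below. The abstract does not name the similarity measure or the clustering algorithm, so this example substitutes simple stand-ins: cosine similarity between flattened images and a greedy "leader" clustering with a hypothetical threshold. The tiny 2x2 images and m/z values are made up for illustration.

```python
import math

def flatten(image):
    """Turn a 2-D image (list of rows) into a flat intensity vector."""
    return [v for row in image for v in row]

def cosine(u, v):
    """Cosine similarity of two vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu > 0 and nv > 0 else 0.0

def leader_clustering(images, threshold):
    """Assign each m/z image to the first cluster whose leader image is
    at least `threshold` similar; otherwise start a new cluster."""
    leaders = []   # (m/z of leader, flattened leader image)
    clusters = {}  # leader m/z -> list of member m/z values
    for mz in sorted(images):
        vec = flatten(images[mz])
        for leader_mz, leader_vec in leaders:
            if cosine(vec, leader_vec) >= threshold:
                clusters[leader_mz].append(mz)
                break
        else:
            leaders.append((mz, vec))
            clusters[mz] = [mz]
    return clusters

# Hypothetical 2x2 ion images keyed by m/z value.
images = {
    100.1: [[1, 0], [1, 0]],
    200.2: [[2, 0], [2, 0]],  # same spatial pattern, higher intensity
    300.3: [[0, 1], [0, 1]],  # complementary spatial pattern
}
clusters = leader_clustering(images, threshold=0.9)
```

Cosine similarity ignores overall intensity, so the first two images fall into one cluster even though their intensities differ, while the third forms its own cluster.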