### Article

## Frequent Itemset Mining for Clustering Near Duplicate Web Documents

A vast amount of documents in the Web have duplicates, which is a challenge for developing efficient methods that would compute clusters of similar documents. In this paper we use an approach based on computing (closed) sets of attributes having large support (large extent) as clusters of similar documents. The method is tested in a series of computer experiments on large public collections of web documents and compared to other established methods and software, such as biclustering, on same datasets. Practical efficiency of different algorithms for computing frequent closed sets of attributes is compared.

This state-of-the-art survey is dedicated to the memory of Emmanuil Markovich Braverman (1931-1977), a pioneer in developing the machine learning theory. The 12 revised full papers and 4 short papers included in this volume were presented at the conference "Braverman Readings in Machine Learning: Key Ideas from Inception to Current State" held in Boston, MA, USA, in April 2017, commemorating the 40th anniversary of Emmanuil Braverman's decease. The papers present an overview of some of Braverman's ideas and approaches. The collection is divided in three parts. The first part bridges the past and the present. Its main contents relate to the concept of kernel function and its application to signal and image analysis as well as clustering. The second part presents a set of extensions of Braverman's work to issues of current interest both in theory and applications of machine learning. The third part includes short essays by a friend, a student, and a colleague.

The existing findings on the relationship between optimism and academic performance are rather contradictory. Two studies were undertaken to investigate thе relationship between attributional style, well-being, and academic performance. A new Russian-language measure of attributional style for positive and negative events (Gordeeva, Osin, Shevyakhova, 2009) with stability, globality, and controllability subscales was used. In the first study, optimistic attributional style for good events was associated with higher academic achievement in high school students (N=225) and mediated the effect of academic performance on self-esteem. In the second study, pessimistic attributional style for negative events predicted success in passing three difficult written entrance examinations in university entrants (N=108), and optimistic attributional style for good events predicted success with success expectations as a mediator. The results indicate that attributional styles for positive and negative events are not uniform in their relationship to performance in different academic settings and to well-being variables.

The paper describes the results of an experimental study of topic models applied to the task of single-word term extraction. The experiments encompass several probabilistic and non-probabilistic topic models and demonstrate that topic information improves the quality of term extraction, as well as NMF with KL-divergence minimization is the best among the models under study.

This book constitutes the proceedings of the 23rd International Symposium on Foundations of Intelligent Systems, ISMIS 2017, held in Warsaw, Poland, in June 2017. The 56 regular and 15 short papers presented in this volume were carefully reviewed and selected from 118 submissions. The papers include both theoretical and practical aspects of machine learning, data mining methods, deep learning, bioinformatics and health informatics, intelligent information systems, knowledge-based systems, mining temporal, spatial and spatio-temporal data, text and Web mining. In addition, four special sessions were organized; namely, Special Session on Big Data Analytics and Stream Data Mining, Special Session on Granular and Soft Clustering for Data Science, Special Session on Knowledge Discovery with Formal Concept Analysis and Related Formalisms, and Special Session devoted to ISMIS 2017 Data Mining Competition on Trading Based on Recommendations, which was launched as a part of the conference.

Imaging mass spectrometry (imaging MS) has emerged in the past decade as a label-free, spatially resolved, and multipurpose bioanalytical technique for direct analysis of biological samples from animal tissue, plant tissue, biofilms, and polymer films. Imaging MS has been successfully incorporated into many biomedical pipelines where it is usually applied in the so-called untargeted mode-capturing spatial localization of a multitude of ions from a wide mass range. An imaging MS data set usually comprises thousands of spectra and tens to hundreds of thousands of mass-to-charge (m/z) images and can be as large as several gigabytes. Unsupervised analysis of an imaging MS data set aims at finding hidden structures in the data with no a priori information used and is often exploited as the first step of imaging MS data analysis. We propose a novel, easy-to-use and easy-to-implement approach to answer one of the key questions of unsupervised analysis of imaging MS data: what do all m/z images look like? The key idea of the approach is to cluster all m/z images according to their spatial similarity so that each cluster contains spatially similar m/z images. We propose a visualization of both spatial and spectral information obtained using clustering that provides an easy way to understand what all m/z images look like. We evaluated the proposed approach on matrix-assisted laser desorption ionization imaging MS data sets of a rat brain coronal section and human larynx carcinoma and discussed several scenarios of data analysis.

This proceedings publication is a compilation of selected contributions from the “Third International Conference on the Dynamics of Information Systems” which took place at the University of Florida, Gainesville, February 16–18, 2011. The purpose of this conference was to bring together scientists and engineers from industry, government, and academia in order to exchange new discoveries and results in a broad range of topics relevant to the theory and practice of dynamics of information systems. Dynamics of Information Systems: Mathematical Foundation presents state-of-the art research and is intended for graduate students and researchers interested in some of the most recent discoveries in information theory and dynamical systems. Scientists in other disciplines may also benefit from the applications of new developments to their own area of study.

This article represents a new technique for collaborative filtering based on pre-clustering of website usage data. The key idea involves using clustering methods to define groups of different users.

This is a textbook in data analysis. Its contents are heavily influenced by the idea that data analysis should help in enhancing and augmenting knowledge of the domain as represented by the concepts and statements of relation between them. According to this view, two main pathways for data analysis are summarization, for developing and augmenting concepts, and correlation, for enhancing and establishing relations. Visualization, in this context, is a way of presenting results in a cognitively comfortable way. The term *summarization* is understood quite broadly here to embrace not only simple summaries like totals and means, but also more complex summaries such as the principal components of a set of features or cluster structures in a set of entities.

The material presented in this perspective makes a unique mix of subjects from the fields of statistical data analysis, data mining, and computational intelligence, which follow different systems of presentation.

A vast amount of documents in the Web have duplicates, which is a challenge for developing efficient methods that would compute clusters of similar documents. In this paper we use an approach based on computing (closed) sets of attributes having large support (large extent) as clusters of similar documents. The method is tested in a series of computer experiments on large public collections of web documents and compared to other established methods and software, such as biclustering, on same datasets. Practical efficiency of different algorithms for computing frequent closed sets of attributes is compared.

Let k be a field of characteristic zero, let G be a connected reductive algebraic group over k and let g be its Lie algebra. Let k(G), respectively, k(g), be the field of k- rational functions on G, respectively, g. The conjugation action of G on itself induces the adjoint action of G on g. We investigate the question whether or not the field extensions k(G)/k(G)^G and k(g)/k(g)^G are purely transcendental. We show that the answer is the same for k(G)/k(G)^G and k(g)/k(g)^G, and reduce the problem to the case where G is simple. For simple groups we show that the answer is positive if G is split of type A_n or C_n, and negative for groups of other types, except possibly G_2. A key ingredient in the proof of the negative result is a recent formula for the unramified Brauer group of a homogeneous space with connected stabilizers. As a byproduct of our investigation we give an affirmative answer to a question of Grothendieck about the existence of a rational section of the categorical quotient morphism for the conjugating action of G on itself.

Let G be a connected semisimple algebraic group over an algebraically closed field k. In 1965 Steinberg proved that if G is simply connected, then in G there exists a closed irreducible cross-section of the set of closures of regular conjugacy classes. We prove that in arbitrary G such a cross-section exists if and only if the universal covering isogeny Ĝ → G is bijective; this answers Grothendieck's question cited in the epigraph. In particular, for char k = 0, the converse to Steinberg's theorem holds. The existence of a cross-section in G implies, at least for char k = 0, that the algebra k[G]G of class functions on G is generated by rk G elements. We describe, for arbitrary G, a minimal generating set of k[G]G and that of the representation ring of G and answer two Grothendieck's questions on constructing generating sets of k[G]G. We prove the existence of a rational (i.e., local) section of the quotient morphism for arbitrary G and the existence of a rational cross-section in G (for char k = 0, this has been proved earlier); this answers the other question cited in the epigraph. We also prove that the existence of a rational section is equivalent to the existence of a rational W-equivariant map T- - - >G/T where T is a maximal torus of G and W the Weyl group.