Using Publicly Available Text Information in Modeling the Commercial Success of Films
This volume contains the proceedings of the fourth conference on Analysis of Images, Social Networks and Texts (AIST’2015). The first three conferences in 2012–2014 attracted a significant number of students, researchers, academics and engineers working on interdisciplinary data analysis of images, texts, and social networks. The broad scope of AIST makes it an event where researchers from different domains, such as image and text processing, exploiting various data analysis techniques, can meet and exchange ideas. We strongly believe that this may lead to cross-fertilisation of ideas between researchers relying on modern data analysis machinery. Therefore, AIST brings together all kinds of applications of data mining and machine learning techniques. The conference allows specialists from different fields to meet each other, present their work, and discuss both theoretical and practical aspects of their data analysis problems. Another important aim of the conference is to stimulate scientists and people from industry to benefit from the knowledge exchange and identify possible grounds for fruitful collaboration. The conference was held on April 9–11, 2015. Following an already established tradition, the conference was organised in Yekaterinburg, a crossroads between the European and Asian parts of Russia and the capital of the Urals region. The key topics of AIST are analysis of images and videos; natural language processing and computational linguistics; social network analysis; pattern recognition, machine learning and data mining; recommender systems and collaborative technologies; semantic web, ontologies and their applications.
The Program Committee and the reviewers of the conference included well-known experts in data mining and machine learning, natural language processing, image processing, social network analysis, and related areas from leading institutions of 22 countries: Australia, Bangladesh, Belgium, Brazil, Cyprus, Egypt, Finland, France, Germany, Greece, India, Ireland, Italy, Luxembourg, Poland, Qatar, Russia, Spain, The Netherlands, UK, USA and Ukraine.
Concept Relation Discovery and Innovation Enabling Technology (CORDIET) is a toolbox for gaining new knowledge from unstructured text data. At the core of CORDIET is C-K theory, which captures the essential elements of innovation. The tool uses Formal Concept Analysis (FCA), Emergent Self Organizing Maps (ESOM) and Hidden Markov Models (HMM) as the main artifacts in the analysis process. The user can define temporal, text mining and compound attributes. The text mining attributes are used to analyze the unstructured text in documents, while the temporal attributes use the documents' timestamps for analysis. The compound attributes are XML rules based on text mining and temporal attributes. The user can cluster objects with object-cluster rules and split the data into segments with segmentation rules. The artifacts are optimized for efficient data analysis; object labels in the FCA lattice and ESOM map contain a URL that the user can click to open the selected document.
The paper explores the suitability of measuring higher education quality from the students' point of view and analyses the results of a survey of engineering students at Perm universities. Nonlinear Principal Components Analysis (NLPCA), as interpreted in the Gifi system, was used as the tool for data processing. It takes into account the dissimilar statistical nature of the questionnaire indicators. The method appears promising for a range of socio-economic studies.
Formal Concept Analysis (FCA) is an unsupervised clustering technique, and many scientific papers are devoted to applying FCA in Information Retrieval (IR) research. We collected 103 papers published between 2003 and 2009 that mention FCA and information retrieval in the abstract, title or keywords. Using a prototype of our FCA-based toolset CORDIET, we converted the PDF files containing the papers to plain text, indexed them with Lucene using a thesaurus containing terms related to FCA research, and then created the concept lattice shown in this paper. We visualized, analyzed and explored the literature with concept lattices and discovered multiple interesting research streams in IR, of which we give an extensive overview. The core contributions of this paper are the innovative application of FCA to the text mining of scientific papers and the survey of FCA-based IR research.
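To illustrate how a concept lattice can arise from indexed papers, the following is a minimal sketch in Python and not the CORDIET implementation. It assumes that indexing has already produced, for each paper, the set of thesaurus terms found in it. The paper identifiers, the terms, and the function names (common_terms, docs_having, formal_concepts) are hypothetical and chosen only for this example. The sketch enumerates formal concepts by brute force, which is feasible only for very small contexts.

```python
from itertools import combinations

# Hypothetical toy context: paper id -> thesaurus terms found in that paper.
context = {
    "paper_a": {"fca", "information retrieval", "ranking"},
    "paper_b": {"fca", "information retrieval"},
    "paper_c": {"fca", "ontology"},
}


def common_terms(docs, context, all_terms):
    """Return the terms shared by every paper in docs (all terms if docs is empty)."""
    shared = set(all_terms)
    for doc in docs:
        shared &= context[doc]
    return shared


def docs_having(terms, context):
    """Return the papers that contain every term in terms."""
    return {doc for doc, doc_terms in context.items() if terms <= doc_terms}


def formal_concepts(context):
    """Enumerate all (extent, intent) pairs by closing every subset of papers."""
    all_terms = set().union(*context.values())
    docs = list(context)
    concepts = set()
    for size in range(len(docs) + 1):
        for subset in combinations(docs, size):
            intent = common_terms(subset, context, all_terms)
            extent = docs_having(intent, context)
            concepts.add((frozenset(extent), frozenset(intent)))
    return concepts


if __name__ == "__main__":
    # Print concepts from the most specific (fewest papers) to the most general.
    for extent, intent in sorted(formal_concepts(context), key=lambda c: len(c[0])):
        print(sorted(extent), sorted(intent))
```

Each resulting pair is a set of papers together with exactly the terms they all share. Ordering these pairs by inclusion of their paper sets gives the lattice structure. For a corpus of realistic size, a dedicated algorithm would replace the exponential subset enumeration used here.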