Mapping the Radical Innovations in Food Industry: A Text Mining Study
An important text mining problem is to find, in a large collection of texts, documents related to specific topics and then discern further structure among the found texts. This problem is especially important for social sciences, where the purpose is to find the most representative documents for subsequent qualitative interpretation. To solve this problem, we propose an interval semi-supervised LDA approach, in which certain predefined sets of keywords (that define the topics researchers are interested in) are restricted to specific intervals of topic assignments.
Concept discovery is a Knowledge Discovery in Databases (KDD) research field that uses human-centered techniques such as Formal Concept Analysis (FCA), Biclustering, Triclustering, Conceptual Graphs etc. for gaining insight into the underlying conceptual structure of the data. Traditional machine learning techniques are mainly focusing on structured data whereas most data available resides in unstructured, often textual, form. Compared to traditional data mining techniques, human-centered instruments actively engage the domain expert in the discovery process. This volume contains the contributions to CDUD 2011, the International Workshop on Concept Discovery in Unstructured Data (CDUD) held in Moscow. The main goal of this workshop was to provide a forum for researchers and developers of data mining instruments working on issues with analyzing unstructured data. We are proud that we could welcome 13 valuable contributions to this volume. The majority of the accepted papers described innovative research on data discovery in unstructured texts. Authors worked on issues such as transforming unstructured into structured information by amongst others extracting keywords and opinion words from texts with Natural Language Processing methods. Multiple authors who participated in the workshop used methods from the conceptual structures field including Formal Concept Analysis and Conceptual Graphs. Applications include but are not limited to text mining police reports, sociological definitions, movie reviews, etc.
Future-oriented technology analysis methods can play a significant role in enabling early warning signal detection and pro-active policy action which will help to better prepare policy- and decision-makers in today’s complex and inter-dependent environments. This paper analyses the use of different horizon scanning approaches and methods as applied in the Scanning for Emerging Science and Technology Issues project. A comparative analysis is provided as well as a brief evaluation the needs of policy-makers if they are to identify areas in which policy needs to be formulated. This paper suggests that the selection of the best scanning approaches and methods is subject to contextual and content issues. At the same time, there are certain issues which characterise horizon scanning processes, methods and results that should be kept in mind by both practitioners and policy-makers.
The practical relevance of process mining is increasing as more and more event data become available. Process mining techniques aim to discover, monitor and improve real processes by extracting knowledge from event logs. The two most prominent process mining tasks are: (i) process discovery: learning a process model from example behavior recorded in an event log, and (ii) conformance checking: diagnosing and quantifying discrepancies between observed behavior and modeled behavior. The increasing volume of event data provides both opportunities and challenges for process mining. Existing process mining techniques have problems dealing with large event logs referring to many different activities. Therefore, we propose a generic approach to decompose process mining problems. The decomposition approach is generic and can be combined with different existing process discovery and conformance checking techniques. It is possible to split computationally challenging process mining problems into many smaller problems that can be analyzed easily and whose results can be combined into solutions for the original problems.
Concept Relation Discovery and Innovation Enabling Technology (CORDIET), is a toolbox for gaining new knowledge from unstructured text data. At the core of CORDIET is the C-K theory which captures the essential elements of innovation. The tool uses Formal Concept Analysis (FCA), Emergent Self Organizing Maps (ESOM) and Hidden Markov Models (HMM) as main artifacts in the analysis process. The user can define temporal, text mining and compound attributes. The text mining attributes are used to analyze the unstructured text in documents, the temporal attributes use these document’s timestamps for analysis. The compound attributes are XML rules based on text mining and temporal attributes. The user can cluster objects with object-cluster rules and can chop the data in pieces with segmentation rules. The artifacts are optimized for efficient data analysis; object labels in the FCA lattice and ESOM map contain an URL on which the user can click to open the selected document.
In 2015-2016 the Department of Communication, Media and Design of the National Research University “Higher School of Economics” in collaboration with non-profit organization ROCIT conducted research aimed to construct the Index of Digital Literacy in Russian Regions. This research was the priority and remain unmatched for the momentIn 2015-2016 the Department of Communication, Media and Design of the National Research University “Higher School of Economics” in collaboration with non-profit organization ROCIT conducted research aimed to construct the Index of Digital Literacy in Russian Regions. This research was the priority and remain unmatched for the moment
Formal Concept Analysis (FCA) is an unsupervised clustering technique and many scientific papers are devoted to applying FCA in Information Retrieval (IR) research. We collected 103 papers published between 2003-2009 which mention FCA and information retrieval in the abstract, title or keywords. Using a prototype of our FCA-based toolset CORDIET, we converted the pdf-files containing the papers to plain text, indexed them with Lucene using a thesaurus containing terms related to FCA research and then created the concept lattice shown in this paper. We visualized, analyzed and explored the literature with concept lattices and discovered multiple interesting research streams in IR of which we give an extensive overview. The core contributions of this paper are the innovative application of FCA to the text mining of scientific papers and the survey of the FCA-based IR research.
The article is dedicated to the analysis of Big Data perspective in jurisprudence. It is proved that Big Data have to be used as the explanatory and predictable tool. The author describes issues concerning Big Data application in legal research. The problems are technical (data access, technical imperfections, data verification) and informative (interpretation of data and correlations). It is concluded that there is the necessity to enhance Big Data investigations taking into account the abovementioned limits.
The paper examines the structure, governance, and balance sheets of state-controlled banks in Russia, which accounted for over 55 percent of the total assets in the country's banking system in early 2012. The author offers a credible estimate of the size of the country's state banking sector by including banks that are indirectly owned by public organizations. Contrary to some predictions based on the theoretical literature on economic transition, he explains the relatively high profitability and efficiency of Russian state-controlled banks by pointing to their competitive position in such functions as acquisition and disposal of assets on behalf of the government. Also suggested in the paper is a different way of looking at market concentration in Russia (by consolidating the market shares of core state-controlled banks), which produces a picture of a more concentrated market than officially reported. Lastly, one of the author's interesting conclusions is that China provides a better benchmark than the formerly centrally planned economies of Central and Eastern Europe by which to assess the viability of state ownership of banks in Russia and to evaluate the country's banking sector.
The results of cross-cultural research of implicit theories of innovativeness among students and teachers, representatives of three ethnocultural groups: Russians, the people of the North Caucasus (Chechens and Ingushs) and Tuvinians (N=804) are presented. Intergroup differences in implicit theories of innovativeness are revealed: the ‘individual’ theories of innovativeness prevail among Russians and among the students, the ‘social’ theories of innovativeness are more expressed among respondents from the North Caucasus, Tuva and among the teachers. Using the structural equations modeling the universal model of values impact on implicit theories of innovativeness and attitudes towards innovations is constructed. Values of the Openness to changes and individual theories of innovativeness promote the positive relation to innovations. Results of research have shown that implicit theories of innovativeness differ in different cultures, and values make different impact on the attitudes towards innovations and innovative experience in different cultures.
The paper examines the principles for the supervision of financial conglomerates proposed by BCBS in the consultative document published in December 2011. Moreover, the article proposes a number of suggestions worked out by the authors within the HSE research team.