Learning Prefix-Based Patterns from Demographic Sequences
There are many different methods for computing relevant patterns in sequential data and interpreting the results. In this paper, we compute emerging patterns (EPs) in demographic sequences using sequence-based pattern structures, along with different algorithmic solutions. The purpose of this method is to meet the following domain requirement: the obtained patterns must be (closed) frequent contiguous prefixes of the input sequences. This is required so that demographers can fully understand and interpret the results.
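The requirement that patterns be closed frequent contiguous prefixes can be illustrated with a small brute-force sketch. It is not the pattern-structure algorithm from the paper: it simply counts how many sequences start with each prefix, keeps a prefix if it is frequent and every one-event extension has strictly lower support (closedness), and reports it as emerging when its relative support in one class exceeds that in the other by a growth-rate threshold. The function names, the event labels, and the growth-rate definition are assumptions made for illustration.

```python
import math
from collections import Counter

# Hypothetical sketch: a sequence is a tuple of life events, and the support
# of a prefix is the number of sequences that START with it.

def prefix_supports(sequences):
    """Count, for every contiguous prefix, how many sequences begin with it."""
    counts = Counter()
    for seq in sequences:
        for k in range(1, len(seq) + 1):
            counts[tuple(seq[:k])] += 1
    return counts

def closed_frequent_prefixes(sequences, min_support):
    """Frequent prefixes whose support drops for every one-event extension."""
    counts = prefix_supports(sequences)
    best_extension = Counter()  # max support among one-event extensions
    for prefix, c in counts.items():
        if len(prefix) > 1:
            parent = prefix[:-1]
            best_extension[parent] = max(best_extension[parent], c)
    return {p: c for p, c in counts.items()
            if c >= min_support and best_extension[p] < c}

def emerging_prefixes(pos, neg, min_support, min_growth):
    """Closed frequent prefixes of `pos` whose relative support is at least
    `min_growth` times that in `neg` (infinite growth if absent from `neg`)."""
    neg_counts = prefix_supports(neg)
    result = {}
    for prefix, c in closed_frequent_prefixes(pos, min_support).items():
        c_neg = neg_counts[prefix]
        # Ratio of relative supports, computed with integers to avoid rounding.
        growth = math.inf if c_neg == 0 else (c * len(neg)) / (c_neg * len(pos))
        if growth >= min_growth:
            result[prefix] = growth
    return result

pos = [("work", "marriage", "child"), ("work", "marriage"), ("work", "separation")]
neg = [("marriage", "work"), ("work", "child"), ("education", "work")]
print(emerging_prefixes(pos, neg, min_support=2, min_growth=2.0))
# {('work',): 3.0, ('work', 'marriage'): inf}
```

Checking only one-event extensions is enough for closedness here, because any longer extension of a prefix passes through a one-event extension and prefix support can only decrease along the way.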
The paper gives a brief introduction to multiple classifier systems and describes a particular algorithm that improves classification accuracy by recommending a classifier for each object. The recommendation rests on the hypothesis that a classifier is likely to predict the label of an object correctly if it has correctly classified the object's neighbors. Assigning a classifier to each object relies on the apparatus of Formal Concept Analysis. We explain the principle of the algorithm on a toy example and describe experiments with real-world datasets.
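The neighborhood hypothesis can be sketched in a few lines. To keep the example self-contained, the FCA-based neighborhood from the paper is replaced here by plain Euclidean k-nearest neighbors, which is a different technique and an assumption of this sketch. For each query object, the classifier that is correct on the most neighbors is selected and then used for prediction. The helper names `nearest_neighbors`, `recommend_classifier`, and `predict` are hypothetical.

```python
import math

# Hypothetical sketch: Euclidean k-nearest neighbors stand in for the
# FCA-based neighborhood used in the paper.

def nearest_neighbors(x, train_X, k):
    """Indices of the k training objects closest to x."""
    order = sorted(range(len(train_X)), key=lambda i: math.dist(x, train_X[i]))
    return order[:k]

def recommend_classifier(x, train_X, train_y, classifiers, k=3):
    """Pick the classifier that is correct on the most neighbors of x."""
    neighbors = nearest_neighbors(x, train_X, k)

    def local_hits(clf):
        return sum(clf(train_X[i]) == train_y[i] for i in neighbors)

    # max() keeps the first classifier listed in case of a tie.
    return max(classifiers, key=local_hits)

def predict(x, train_X, train_y, classifiers, k=3):
    """Classify x with the classifier recommended for it."""
    return recommend_classifier(x, train_X, train_y, classifiers, k)(x)

# Toy data: two trivial classifiers, each reliable on one half of the line.
train_X = [(0.0,), (1.0,), (2.0,), (8.0,), (9.0,), (10.0,)]
train_y = [0, 0, 0, 1, 1, 1]

def always_zero(p):
    return 0

def always_one(p):
    return 1

print(predict((0.5,), train_X, train_y, [always_zero, always_one]))  # 0
print(predict((9.5,), train_X, train_y, [always_zero, always_one]))  # 1
```

Neither classifier is good globally, but each is locally perfect, so selecting per object recovers every label in this toy case.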
The volume contains the abstracts of the 12th International Conference "Intelligent Data Processing: Theory and Applications". The conference is organized by the Russian Academy of Sciences, the Federal Research Center "Informatics and Control" of the Russian Academy of Sciences, and the Scientific and Coordination Center "Digital Methods of Data Mining". The conference has been held biennially since 1989. It is one of the most recognizable scientific forums on data mining, machine learning, pattern recognition, image analysis, signal processing, and discrete analysis. The Organizing Committee of IDP-2018 is grateful to Forecsys Co. and CFRS Co. for their assistance in preparing and running the conference. The conference is funded by RFBR, grant 18-07-20075. The conference website is http://mmro.ru/en/.
Pattern structures, an extension of FCA to data with complex descriptions, offer an alternative to conceptual scaling (binarization) by providing a direct way to discover knowledge in complex data such as logical formulas, graphs, strings, and tuples of numerical intervals. Whereas classification with pattern structures based on generating classifiers in advance can lead to doubly exponential complexity, combining lazy evaluation with projection approximations of the initial data, randomization, and parallelization reduces the algorithmic complexity to a low-degree polynomial and thus makes the approach feasible for big data.
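Lazy evaluation means that no classifiers are generated in advance: patterns are computed only for the object being classified. The sketch below shows this idea on an interval pattern structure, where the similarity (meet) of two descriptions is the componentwise convex hull of their intervals. For a test object, each positive example contributes a vote if the similarity of the two descriptions covers no negative example, and vice versa. Projections, randomization, and parallelization are not shown, and all function names are hypothetical.

```python
# Hypothetical sketch of lazy classification with an interval pattern
# structure: objects are numeric tuples, descriptions are tuples of
# closed intervals (lo, hi).

def to_intervals(values):
    """Turn a numeric vector into degenerate intervals [v, v]."""
    return tuple((v, v) for v in values)

def meet(d1, d2):
    """Similarity of two descriptions: componentwise convex hull."""
    return tuple((min(a1, a2), max(b1, b2)) for (a1, b1), (a2, b2) in zip(d1, d2))

def covers(pattern, description):
    """True if each interval of `pattern` contains the matching one of `description`."""
    return all(lo <= a and b <= hi for (lo, hi), (a, b) in zip(pattern, description))

def lazy_votes(test, own, other):
    """Count objects of `own` whose similarity with `test` covers no object of `other`."""
    t = to_intervals(test)
    others = [to_intervals(o) for o in other]
    votes = 0
    for obj in own:
        d = meet(t, to_intervals(obj))  # computed only for this query (lazy)
        if not any(covers(d, o) for o in others):
            votes += 1
    return votes

def classify(test, positives, negatives):
    pos = lazy_votes(test, positives, negatives)
    neg = lazy_votes(test, negatives, positives)
    if pos > neg:
        return "positive"
    if neg > pos:
        return "negative"
    return "undecided"

positives = [(1.0, 1.0), (2.0, 2.0)]
negatives = [(8.0, 8.0), (9.0, 9.0)]
print(classify((1.5, 1.5), positives, negatives))  # positive
```

Each query costs a number of meet and cover checks proportional to the product of the class sizes and the number of attributes, which is polynomial, in contrast with enumerating all concepts up front.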
A scalable method for mining graph patterns that are stable under subsampling is proposed. The existing subsample stability and robustness measures are not antimonotonic under the definitions known so far. We study a broader notion of antimonotonicity for graph patterns under which measures of subsample stability become antimonotonic. We then propose gSOFIA for mining the most subsample-stable graph patterns. Experiments on numerous graph datasets show that gSOFIA discovers subsample-stable graph patterns very efficiently.
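To convey what "stable under subsampling" means, the sketch below estimates, by Monte Carlo sampling, how often a pattern remains frequent in random subsamples of the data. It is not gSOFIA and not the stability measure defined in the paper: it uses itemsets instead of graphs, and the definition (fraction of fixed-size subsamples in which support reaches a threshold) together with the parameters `rate`, `n_samples`, and `seed` are assumptions chosen for illustration.

```python
import random

# Hypothetical sketch: naive subsample stability of an itemset, estimated
# as the fraction of random subsamples in which it stays frequent.

def support(itemset, transactions):
    """Number of transactions that contain every item of `itemset`."""
    return sum(1 for t in transactions if itemset <= t)

def subsample_stability(itemset, transactions, min_support,
                        n_samples=200, rate=0.5, seed=0):
    rng = random.Random(seed)  # fixed seed for reproducible estimates
    size = max(1, int(len(transactions) * rate))
    hits = 0
    for _ in range(n_samples):
        sample = rng.sample(transactions, size)  # without replacement
        if support(itemset, sample) >= min_support:
            hits += 1
    return hits / n_samples

T = [frozenset("abc"), frozenset("ab"), frozenset("ac"), frozenset("a")]
print(subsample_stability(frozenset("a"), T, min_support=2))  # 1.0
print(subsample_stability(frozenset("b"), T, min_support=1))
```

An item present in every transaction is perfectly stable, while a rarer item such as `b` stays frequent only in those subsamples that happen to include one of its transactions.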
To make reading more accessible, an automated readability formula can help students retrieve material appropriate to their language level. This study attempts to identify and analyze a set of features that can be used to predict the readability of single sentences in Russian. We test the influence of syntactic features on the predictability of structural complexity. The readability of sentences from the SynTagRus corpus was annotated manually and used for evaluation.
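Syntactic features of a sentence can be derived from its dependency tree. The sketch below assumes a sentence is given as a list of head indices in the style of the CoNLL-U `HEAD` column (token positions are 1-based, and 0 marks the root) and computes three illustrative features: sentence length, maximum tree depth, and mean dependency distance. This feature set is an assumption for illustration, not the one used in the study, and the function names are hypothetical.

```python
# Hypothetical sketch: simple syntactic features from a dependency tree.
# heads[k] is the head of token k+1 (1-based positions, 0 = root).
# Assumes a well-formed tree (no cycles).

def token_depth(i, heads):
    """Number of arcs from token i up to the root token."""
    depth = 0
    while heads[i - 1] != 0:
        i = heads[i - 1]
        depth += 1
    return depth

def syntactic_features(heads):
    n = len(heads)
    depths = [token_depth(i, heads) for i in range(1, n + 1)]
    # Linear distance between each non-root token and its head.
    distances = [abs(i - h) for i, h in enumerate(heads, start=1) if h != 0]
    return {
        "n_tokens": n,
        "max_depth": max(depths),
        "mean_dep_distance": sum(distances) / len(distances) if distances else 0.0,
    }

# "Мама мыла раму": both nouns depend on the verb "мыла" (token 2).
print(syntactic_features([2, 0, 2]))
# {'n_tokens': 3, 'max_depth': 1, 'mean_dep_distance': 1.0}
```

Deeper trees and longer dependency distances are common proxies for structural complexity, which is why features of this kind are natural candidates for sentence-level readability prediction.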
This paper is an overview of current issues and trends in computational linguistics, based on the materials of the computational linguistics conference COLING 2012. Modern approaches to traditional NLP tasks such as POS tagging, syntactic parsing, and machine translation are discussed. Highlights of automated information extraction, such as fact extraction and opinion mining, are also in focus. The main trend in modern computational linguistics technologies is to incorporate higher levels of linguistic analysis (discourse analysis, cognitive modeling) into the models and to combine machine learning with algorithmic methods grounded in deep expert linguistic knowledge.
In 2006, Russia amended its competition law and added the concepts of ‘collective dominance’ and its abuse. This was seen as an attempt to address the common problem of ‘conscious parallelism’ among firms in concentrated industries. Critics feared that enforcing this provision would become tantamount to government regulation of prices. In this paper we examine the enforcement experience to date, looking especially closely at sanctions imposed on firms in the oil industry. Some difficulties and complications encountered in enforcement are analysed, and some alternative strategies for addressing anticompetitive behaviour in concentrated industries are discussed.
This article discusses state management and cultural policy, their nature and content, in terms of a new trend: the development of postindustrial society. It argues that cultural policy is currently the basis of regional political activity and that regions can gain a strong competitive advantage if they implement cultural policy successfully. All these trends can produce elements of new economic development.