Quantitative Assessment of the Grammatical Ambiguity of Some European Languages
The paper presents work on automatic Arabic dialect classification and proposes a machine learning classification method whose training dataset consists of two corpora. The first is a small corpus of manually dialect-annotated instances. The second contains a large number of instances grabbed from the Web automatically using word-marks: the most distinctive and frequent dialectal words, used as dialect identifiers. In the paper we consider four dialects widely used by Arabic speakers: Levantine, Egyptian, Saudi and Iraqi. The most important benefit of this approach is that it reduces the time spent on manual annotation of social media data, because the emphasis is placed on the automatically created corpus. Our best results were achieved with a Naïve Bayes classifier trained on character-based bigrams, trigrams and the word-mark vocabulary: classification precision reaches 0.92 with an F1-measure of 0.91 on a test set of instances taken from the manually annotated corpus.
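A minimal sketch of the classification setup described in the abstract, assuming a scikit-learn pipeline and tiny invented stand-in sentences (the paper's actual corpora are Arabic social-media texts and are not reproduced here):

```python
# Hypothetical illustration: Naive Bayes over character bigrams/trigrams,
# as in the described setup. The training sentences below are toy stand-ins.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

train_texts = ["shlonak ya zalame", "shu hada ya zalame",
               "ezayak ya basha", "eh dah ya basha"]
train_labels = ["Levantine", "Levantine", "Egyptian", "Egyptian"]

clf = make_pipeline(
    # char_wb extracts character n-grams within word boundaries
    CountVectorizer(analyzer="char_wb", ngram_range=(2, 3)),
    MultinomialNB(),
)
clf.fit(train_texts, train_labels)
print(clf.predict(["shlonak habibi"]))
```

In the paper's setting the word-mark vocabulary would be added as extra features alongside the character n-grams; here only the n-gram part is sketched.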
The paper presents a short summary of the applications of quantum logic categorical constructions to natural language processing. We give a brief overview of quantum logic in general and of its use in natural language processing in particular. As a result, we discuss the comparison of sentences and their representation in the quantum logic formalism. Examples of using quantum diagrams are considered in order to understand text analysis in terms of quantum logic techniques.
As the number of digital texts increases rapidly, there is a pressing need for more advanced and diverse natural language processing tools. While purely statistical approaches have proved powerful and efficient for many NLP tasks, many applications would benefit from the formal models and approaches that traditional language science has to offer. Hoping to facilitate this interaction between theory and practical implementation, we are pleased to announce the workshop on Computational Linguistics and Language Science to be held in Moscow, Russia on April 25, 2016 (11 AM to 6 PM).
Concept discovery is a subdomain of Knowledge Discovery in Databases (KDD) that uses human-centered techniques such as Formal Concept Analysis (FCA), Topic Modeling, Visual Text Representations, Conceptual Graphs, etc., for gaining insight into the underlying conceptual structure of the data. Traditional machine learning techniques mainly focus on structured data, whereas most available data resides in unstructured, often textual, form. Compared to traditional data mining techniques, human-centered instruments actively engage the domain expert in the discovery process.
This volume contains the papers presented at the 3rd International Workshop on Concept Discovery in Unstructured Data (CDUD 2016) held on July 18, 2016 at the National Research University Higher School of Economics, Moscow, Russia. The workshop welcomed papers describing innovative research on concept discovery in complex data. In particular, it provides a forum for researchers and developers of text mining instruments whose research is related to the analysis of linguistic and text data.
This year, 15 papers were submitted. Each submission was reviewed by at least two program committee members. Seven papers were accepted for regular publication in the proceedings, and three more submissions for publication as project proposals or abstracts.
Papers included in this volume cover a wide range of topics related to text
mining and structures for text representation: text navigation, statistical learning
models, automatic author or field identification in texts, among others.
An invited talk given by Natalia Loukachevitch from Moscow State University opened the workshop program. She surveyed modern tasks and approaches in sentiment analysis of Twitter messages.
Our deep gratitude goes to all the authors of submitted papers, as well as
to the Program Committee members for their commitment. We also would like
to thank our invited speaker and our sponsors: National Research University
Higher School of Economics (Moscow, Russia), Russian Foundation for Basic
Research, and ExactPro. Finally, we would like to acknowledge the EasyChair system, which helped us manage the reviewing process.
Nowadays, the field of dialogue systems and conversational agents is one of the most rapidly growing research areas in artificial intelligence applications. Business and industry are showing increasing interest in implementing intelligent conversational agents into their products. Many recent studies have focused on the possibility of developing task-oriented systems that are also able to hold the long, free-form social chats that occur naturally in human interaction. Natural language understanding plays an extremely important role in correctly interpreting the user's expression and returning the right information. Despite the progress made on NLP problems in general, natural language understanding remains very challenging in the field of dialogue systems. In this paper, we review recent progress in developing dialogue systems, their current architectural features and further prospects. We focus on the natural language understanding tasks that are key to building a good conversational agent, and then summarize NLP methods and frameworks so that researchers can study potential improvements to state-of-the-art dialogue systems. Additionally, we consider the dialogue concept in the context of human-machine interaction and briefly describe dialogue evaluation metrics.
This paper concerns discourse-new mention detection in Russian. Detecting the mention of an entity newly introduced into discourse can be helpful for various NLP applications such as coreference resolution, protagonist identification, summarization and information extraction tasks. We deal with Russian, which has no grammatical devices, like articles in English, for overtly marking a newly introduced referent. Our aim is to check the impact of various features on this task. The focus is on the specific devices for introducing a new discourse-prominent referent in Russian described in theoretical studies. We conduct a pilot study of feature impact and provide a series of experiments on detecting the first mention of a referent in a non-singleton coreference chain, drawing on linguistic insights about how a prominent entity introduced into discourse is affected by structural, morphological and lexical features.
In this paper we consider choice problems under the assumption that the preferences of the decision maker are expressed as a parametric partial weak order, without assuming the existence of any value function. We investigate both the sensitivity (stability) of each non-dominated solution with respect to changes in the parameters of this order, and the sensitivity of the set of non-dominated solutions as a whole to similar changes. We show that this type of sensitivity analysis can be performed using linear programming techniques.
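As an illustration of the linear-programming angle, the following sketch (a toy construction of ours, not the paper's model) parameterizes a partial order by a cone of improving directions; whether one alternative dominates another then reduces to an LP feasibility problem, and varying the parameter shows how the non-dominated set changes:

```python
import numpy as np
from scipy.optimize import linprog

def dominates(y, x, G):
    # y dominates x iff y - x is a nonzero element of the cone
    # generated by the columns of G: an LP feasibility check.
    d = np.asarray(y, float) - np.asarray(x, float)
    if np.allclose(d, 0):
        return False
    k = G.shape[1]
    res = linprog(c=np.zeros(k), A_eq=G, b_eq=d,
                  bounds=[(0, None)] * k, method="highs")
    return res.status == 0  # feasible => d lies in the cone

def non_dominated(points, G):
    return [p for p in points
            if not any(dominates(q, p, G) for q in points if q != p)]

points = [(1.0, 3.0), (2.0, 2.0), (2.5, 2.0), (3.0, 1.0)]
# Parametric preference order: improving directions (1, t) and (t, 1).
# t = 0 gives the usual Pareto (componentwise) order; larger t narrows it.
for t in (0.0, 0.5):
    G = np.array([[1.0, t], [t, 1.0]])
    print(t, non_dominated(points, G))
```

Here the point (2.0, 2.0) is dominated under the Pareto order (t = 0) but becomes non-dominated at t = 0.5, illustrating the kind of parameter sensitivity of the non-dominated set that the paper studies.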
I give an explicit formula for the (set-theoretical) system of resultants of m+1 homogeneous polynomials in n+1 variables.