Decomposing Textual Information For Style Transfer
This paper focuses on latent representations that could effectively decompose different aspects of textual information. Using a framework of style transfer for texts, we propose several empirical methods to assess information decomposition quality. We validate these methods with several state-of-the-art textual style transfer methods. Higher quality of information decomposition corresponds to higher performance in terms of bilingual evaluation understudy (BLEU) between output and human-written reformulations.
Artificial General Intelligence (AGI) is showing growing performance in numerous applications: beating human performance in chess and Go, using knowledge bases and text sources to answer questions (SQuAD), and even passing human examinations (Aristo project). In this paper, we describe the results of AI Journey, a competition of AI systems aimed at improving AI performance on knowledge bases, reasoning, and text generation. Competing systems pass the final native language exam (in Russian), including versatile grammar tasks (test and open questions) and an essay, achieving a high score of 69%, with 68% being an average human result. During the competition, a baseline for the task and essay parts was proposed, and 80+ systems were submitted, showing different approaches to task understanding and reasoning. All the data and solutions can be found on GitHub: https://github.com/sberbank-ai/combined_solution_aij2019
Welcome to the 12th edition of LREC . . . that should have been in Marseille, first time in France! Unfortunately not now, in May 2020. Now my welcome is completely virtual, to all of you authors of these Proceedings papers and to the colleagues who will look at these. Virtual but not less sincere. This LREC would have also been an occasion to celebrate the 25th anniversary of ELRA. We are proud that ELRA is becoming a mature association. And LREC too. LREC started in 1998, 22 years ago. We hope to welcome you in a non-virtual way next year in Marseille. We will enjoy together not only the conference but also the special “light” of Marseille and the wonderful view of the Mediterranean and the city from the Palais du Pharo.
The paper presents the results of GramEval 2020, a shared task on Russian morphological and syntactic processing. The objective is to process Russian texts, starting from provided tokens, into parts of speech (POS), grammatical features, lemmas, and labeled dependency trees. To encourage multi-domain processing, five genres of Modern Russian are selected as test data: news, social media and electronic communication, wiki-texts, fiction, and poetry; Middle Russian texts are used as the sixth test set. The data annotation follows the Universal Dependencies scheme. Unlike in many similar tasks, the training data is a collection of existing resources whose annotation is not perfectly harmonized, so the variability in annotations is a further source of difficulty. The main metric is the average of POS, features, and lemma tagging accuracy and the labeled attachment score (LAS). In this report, the organizers of GramEval 2020 give an overview of the task, training and test data, evaluation methodology, submission routine, and participating systems. The approaches proposed by the participating systems and their results are reported and analyzed.
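The main metric described above can be illustrated with a minimal sketch. It assumes that "average" means an unweighted mean of the four per-token scores and that gold and predicted tokens are aligned one-to-one, since tokens are provided. The `Token` class and the `grameval_style_score` function are hypothetical names introduced only for illustration; the official evaluation script may normalize lemmas or weight the components differently.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Token:
    """One annotated token (hypothetical container, not the official format)."""
    pos: str     # part-of-speech tag
    feats: str   # grammatical features, e.g. "Case=Nom|Number=Sing"
    lemma: str   # lemma
    head: int    # index of the syntactic head (0 for the root)
    deprel: str  # dependency relation label


def grameval_style_score(gold: List[Token], pred: List[Token]) -> Dict[str, float]:
    """Compute POS, features, and lemma accuracy, LAS, and their mean.

    Assumption: the overall score is the unweighted mean of the four values.
    """
    if not gold or len(gold) != len(pred):
        raise ValueError("gold and pred must be non-empty and the same length")

    n = len(gold)
    pairs = list(zip(gold, pred))

    pos_acc = sum(g.pos == p.pos for g, p in pairs) / n
    feats_acc = sum(g.feats == p.feats for g, p in pairs) / n
    lemma_acc = sum(g.lemma == p.lemma for g, p in pairs) / n
    # LAS: a token counts only if both its head and its relation label match.
    las = sum(g.head == p.head and g.deprel == p.deprel for g, p in pairs) / n

    overall = (pos_acc + feats_acc + lemma_acc + las) / 4
    return {
        "pos": pos_acc,
        "feats": feats_acc,
        "lemma": lemma_acc,
        "las": las,
        "overall": overall,
    }
```

Under this reading, a system that tags every lemma and feature correctly but makes attachment errors is penalized only through the LAS component, which contributes one quarter of the overall score.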
This book constitutes the refereed proceedings of the 11th International Conference on Intelligent Data Processing, IDP 2016, held in Barcelona, Spain, in October 2016.
The 11 revised full papers were carefully reviewed and selected from 52 submissions. The papers of this volume are organized in topical sections on machine learning theory with applications; intelligent data processing in life and social sciences; morphological and technological approaches to image analysis.
This paper studies the case when semiotic signs can be represented as language constructs in the same language as the text to be interpreted. The goal is to estimate the depth of interpretability with respect to each sign by finding the projections of the narrative onto these language constructs; that is, we find the probability of the presence of each sign. To do so, we use an embedding of linguistic constructions into a multidimensional numerical vector space together with a grid layout, which is the image of the system of signs. At the inference stage, our model computes the nodes of the grid that are most adequate to the input text, using a newly proposed asymmetric similarity measure. The constructed model was studied using Russian criminal, civil, and labour legislation. We suggest using models called constellations both for the grid of normative acts and for the narrative, together with the corresponding divergence. In this paper, we examine the pre-trained models known in NLP as doc2vec and fastText. The quality of the models was evaluated using open databases of court decisions.
This paper addresses the problem of stylized text generation in a multilingual setup. A version of a language model based on a long short-term memory (LSTM) artificial neural network with extended phonetic and semantic embeddings is used for stylized poetry generation. The quality of the poems generated by the network is estimated through bilingual evaluation understudy (BLEU), a survey, and a new cross-entropy-based metric suggested for problems of this type. The experiments show that the proposed model consistently outperforms random sample and vanilla-LSTM baselines; humans also tend to associate machine-generated texts with the target author.