Universal Dependencies for Russian: A New Syntactic Dependencies Tagset

Lyashevskaya O., Droganova K., Zeman D., Alexeeva M., Mustafina N. S.

National Research University Higher School of Economics, 2016. No. 44.

Lyashevskaya O., Droganova K., Zeman D., Alexeeva M., Gavrilova T. S., Mustafina N. S., Shakurova E. I.

This paper presents the Universal Dependencies tagset (UD v1) as a new annotation scheme for Russian treebanks. The universal list of dependency relations was adopted and extended to cover certain language-specific syntactic constructions. The tagset was validated by converting two Russian treebanks, UD-Russian-SynTagRus and UD-Russian-Google, into the UD format.

Research target: Computer Science; Philology and Linguistics

Priority areas: humanitarian; IT and mathematics

Language: English

Keywords: Russian language; syntactic analysis; natural language processing; dependency grammar; dependency parsing; universal dependencies

Computational Linguistics and Intellectual Technologies Papers from the Annual International Conference “Dialogue” (2019)

M.: Russian State University for the Humanities, 2019

The book includes 64 papers submitted to the International conference in computer linguistics and intellectual technologies Dialogue 2019 and presents a broad spectrum of theoretical and applied research of natural language description, language simulation, and creation of applied computer technologies. ...

Added: October 16, 2019

The "Tugan Tel" corpus of the Tatar language

Arkhangelskiy T., Gilmullin R. A., Nevzorova O. A. et al., Научно-техническая информация. Серия 2: Информационные процессы и системы 2013

The article describes an electronic corpus of the Tatar language created within the "Corpus Linguistics" fundamental research programme of the Presidium of the Russian Academy of Sciences, together with the methods the authors used to build it. In particular, it describes the text composition and genre structure of the corpus, the authors' decisions on which morphological features to distinguish, the automatic morphological annotation of texts using a two-level morphology model and the PC-KIMMO analyzer, and the placement ...

Added: October 25, 2013

Computational Linguistics and Intellectual Technologies: Papers from the Annual International Conference "Dialogue" (Moscow, May 29 to June 1, 2019)

Moscow: Publishing Center of the Russian State University for the Humanities, 2019

Added: October 16, 2019

Applying statistical tagging to Russian poetry

Starchenko A., Kazakevich L., Lyashevskaya O. / National Research University Higher School of Economics. Series WP BRP "Linguistics". 2018. No. 76.

Poetic texts pose a challenge to full morphological tagging and lemmatization, since the authors seek to extend the vocabulary, employ morphologically and semantically deficient forms, go beyond standard syntactic templates, and use non-projective constructions and non-standard word order, among other techniques of the creative language game. In this paper we evaluate a number of probabilistic ...

Added: December 12, 2018

Quantitative methods in diachronic corpus studies: constructions with predicatives and dative subjects

Bonch-Osmolovskaya A. A., Computational Linguistics and Intellectual Technologies 2015 Vol. 1 No. 14(21) P. 80–95

The paper proposes new approaches to the problem of Russian dative subjects in predicative and adjective constructions. The core idea of the research is to study the distribution of dative subject constructions with predicative and adjective forms that potentially can be used in such constructions. The methodological novelty of the approach is manifested in the ...

Added: April 15, 2015

Inducing verb classes from frames in Russian: morpho-syntax and semantic roles

Kashkin E. V., Computational Linguistics and Intellectual Technologies 2015 Vol. 21 P. 427–440

The paper presents clustering experiments on Russian verbs based on the statistical data drawn from the Russian FrameBank (framebank.ru). While lexicology has essentially abandoned the idea of syntactic transformations as the primary basis for grouping verbs into semantic classes (Apresjan 1967, Levin 1993), the hypothesis of the same lexical and syntactic distributional profiles underlying lexical ...

Added: September 30, 2015

Extracting script information from texts. Part 1. Problem statement and overview of methods

Suvorova M. I., Kobozeva M. V., Toldova S. et al., Искусственный интеллект и принятие решений 2020 No. 1 P. 17–26

The article discusses the importance of automatic script analysis for understanding natural language texts. It gives a broad overview of methods and approaches to describing and extracting scripts, and considers theoretical approaches to formalizing scripts. It lists the tasks that rely on information about the script structure of a text. Popular approaches to the automatic extraction of scripts from texts are presented, along with methods for evaluating their ...

Added: April 22, 2020

Welcome to the club: Designing the inventory of semantic roles for adjectives

Lyashevskaya O., Kashkin E., Computational Linguistics and Intellectual Technologies 2016 No. 15 P. 440–454

The argument constructions of adjectives have largely been outside the scope of research on semantic roles in both theoretical and IT fields. Before adding the roles of adjectival arguments to the network of semantic roles it is important to determine whether the adjectival roles form a separate list or whether they can be seen ...

Added: December 14, 2016

Proceedings of the Workshop on Language Technology Resources and Tools for Digital Humanities (LT4DH)

Osaka: [s.n.], 2016

Language resources are increasingly used not only in Language Technology (LT), but also in other subject fields, such as the digital humanities (DH) and in the field of education. Applying LT tools and data for such fields implies new perspectives on these resources regarding domain adaptation, interoperability, technical requirements, documentation, and usability of user interfaces. ...

Added: November 12, 2016

Text collections for evaluation of Russian morphological taggers

Lyashevskaya O., Bocharov V., Sorokin A. et al., Jazykovedny Casopis 2017 Vol. 68 No. 2 P. 258–267

The paper describes the preparation and development of the text collections within the framework of MorphoRuEval-2017 shared task, an evaluation campaign designed to stimulate development of the automatic morphological processing technologies for Russian. The main challenge for the organizers was to standardize all available Russian corpora with the manually verified high-quality tagging to a single ...

Added: January 30, 2018

8th Russian Summer School in Information Retrieval (RuSSIR 2014)

Braslavski P., Karpov N., Worring M. et al., ACM SIGIR Forum 2014 Vol. 48 No. 2 P. 105–110

The 8th Russian Summer School in Information Retrieval (RuSSIR 2014) was held on August 18-22, 2014 in Nizhniy Novgorod, Russia. The school was co-organized by the National Research University Higher School of Economics and the Russian Information Retrieval Evaluation Seminar (ROMIP) ...

Added: August 22, 2015

Detecting ethnicity-targeted hate speech in Russian social media texts

Pronoza E., Panicheva P., Koltsova O. et al., Information Processing and Management 2021 Vol. 58 No. 6 Article 102674

Ethnicity-targeted hate speech has been widely shown to influence on-the-ground inter-ethnic conflict and violence, especially in such multi-ethnic societies as Russia. Therefore, ethnicity-targeted hate speech detection in user texts is becoming an important task. However, it faces a number of unresolved problems: difficulties of reliable mark-up, informal and indirect ways of expressing negativity in user ...

Added: September 2, 2021

Towards to Automatic Text Adaptation in Russian Language

Karpov N., Sibirtseva V. / National Research University Higher School of Economics. Series WP BRP "Linguistics". 2014.

This article describes ways to use original texts in the National Russian Corpus, as well as news texts, for teaching Russian as a foreign language. It analyzes two years of work by CorpLings, a research group at the Higher School of Economics (Nizhny Novgorod and Moscow). Special attention is paid to the basic principles of the research part of the project ...

Added: December 10, 2014

Proceedings of the 4th workshop on NLP for Computer Assisted Language Learning at NODALIDA 2015, Vilnius, 11th May, 2015

Linköping University Electronic Press, 2015

The workshop series on Natural Language Processing (NLP) for Computer-Assisted Language Learning (CALL) – NLP4CALL – is a meeting place for researchers working on the integration of Natural Language Processing and Speech Technologies in CALL systems and exploring the theoretical and methodological issues arising in this connection. ...

Added: May 31, 2015

Quantitative approaches to the Russian language

Abingdon: Routledge, 2018

This edited collection presents a range of methods that can be used to analyse linguistic data quantitatively. A series of case studies of Russian data spanning different aspects of modern linguistics serve as the basis for a discussion of methodological and theoretical issues in linguistic data analysis. The book presents current trends in quantitative linguistics, ...

Added: October 11, 2016

Language Exercise Generation: Emulating Cambridge Open Cloze

Malafeev A., International Journal of Conceptual Structures and Smart Applications (IJCSSA) 2014 Vol. 2 No. 2 P. 20–35

This article presents an approach to the automatic generation of open cloze exercises based on arbitrary English text. The exercise format is similar to the open cloze test used in Cambridge English certificate exams (FCE, CAE, CPE). The presented method also makes it possible to adjust the difficulty of the resulting exercises to better suit ...

Added: November 29, 2014

Problems of natural language processing in dialogue systems

Klyshinskiy E., Zherebtsova Yu., Chizhik A., Системный администратор 2019 No. 10 P. 82–91

Nowadays, the field of dialogue systems and conversational agents is one of the most rapidly growing research areas in artificial intelligence applications. Business and industry are showing increasing interest in implementing intelligent conversational agents in their products. Many recent studies have tended to focus on the possibility of developing task-oriented systems which are able to have long ...

Added: October 26, 2019

Proceedings of The 20th SIGNLL Conference on Computational Natural Language Learning

Berlin: Association for Computational Linguistics, 2016

The 2016 Conference on Computational Natural Language Learning is the twentieth in the series of annual meetings organized by SIGNLL, the ACL special interest group on natural language learning. CoNLL 2016 will be held on August 11-12, 2016, and is co-located with the 54th annual meeting of the Association for Computational Linguistics (ACL) in Berlin, ...

Added: November 12, 2016

Development of modern electronic textbook of Russian as a foreign language: content and technology

Sibirtseva V., Karpov N. / Publishing House of the National Research University Higher School of Economics. Series WP "Working Papers of Humanities". 2012. No. 2012-6.

The paper considers the features of selecting the teaching illustrative material for the theoretical part of a multimedia textbook on Russian as a foreign language, and describes the peculiarities of compiling a set of exercises on the basis of the National Corpus of the Russian Language. The author(s) analysed in detail the difficulties caused by ...

Added: November 8, 2012

CLLS 2016. Computational Linguistics and Language Science. Proceedings of the Workshop on Computational Linguistics and Language Science. Moscow, Russia, April 26, 2016

Aachen: CEUR Workshop Proceedings, 2017

As the number of digital texts increases rapidly, there is a pressing need for more advanced and diverse tools of natural language processing. While purely statistical approaches proved powerful and efficient for many NLP tasks, there are many applications that would benefit from the formal models and approaches traditional language science has to offer. With ...

Added: June 25, 2017

Computational Linguistics and Intellectual Technologies

M.: Russian State University for the Humanities, 2019

The book includes 61 reports from the International Conference on Computational Linguistics and Intellectual Technologies "Dialogue-2019", representing a wide range of theoretical and applied research in natural language description, the modeling of language processes, and the creation of practically applicable computational linguistic technologies. It is intended for specialists in theoretical and applied linguistics and intellectual technologies. ...

Added: June 12, 2019

Quantitative evaluation of the grammatical ambiguity of some European languages

Klyshinskiy E., Logacheva V. K., Karpik O. V. et al., Вестник Новосибирского государственного университета. Серия: Лингвистика и межкультурная коммуникация 2020 Vol. 18 No. 1 P. 5–21

Grammatical ambiguity (multiple sets of grammatical features for one word form, or coinciding surface forms of different words) can be of different types. We describe six classes of grammatical ambiguity: unambiguous, ambiguous by grammatical features, by part of speech, by lemma, by lemma and part of speech, and out-of-vocabulary words. These classes are presented ...

Added: December 11, 2019

Computational Linguistics and Intelligent Text Processing, Lecture Notes in Computer Science

Springer, 2015

16th International Conference, CICLing 2015, Cairo, Egypt, April 14-20, 2015, Proceedings, Part I. ISBN: 978-3-319-18110-3 (Print), 978-3-319-18111-0 (Online) ...

Added: April 23, 2015

TALN-RECITAL 2014 Workshop TALAf 2014 : Traitement Automatique des Langues Africaines (TALAf 2014: African Language Processing)

Marseille: Association pour le Traitement Automatique des Langues, 2014

Following the first TALAf workshop, held on June 8, 2012 in Grenoble during the JEP-TALN-RECITAL 2012 conference (see the proceedings: http://aclweb.org/anthology//W/W12/#1300), we propose a new edition of this workshop at the TALN 2014 conference on July 1 in Marseille. This second edition shows the value of a French-language workshop on the processing ...

Added: March 26, 2015