?
Applying statistical tagging to Russian poetry
НИУ ВШЭ
,
2018.
No. 76.
The poetic texts pose a challenge to full morphological tagging and lemmatization since the authors seek to extend the vocabulary, employ morphologically and semantically deficient forms, go beyond standard syntactic templates, use non-projective constructions and non-standard word order, among other techniques of the creative language game. In this paper we evaluate a number of probabilistic taggers based on decision trees, CRF and neural network algorithms as well as one state-of-the-art dictionary-based tagger. The taggers were trained on prosaic texts and tested on three poetic samples of different complexity. Firstly, we discuss the method to compile the gold standard datasets for the Russian poetry. Secondly, we evaluate the taggers’ performance in the identification of the part of speech tags and lemmas. Finally, we analyze different types of errors in the taggers’ output. We analyse the confusion matrix of the parts of speech and mismatches in lemma annotation.
Language:
English
Keywords: natural language processingRussian languageRussian poetryNLP evaluationfull morphology tagging
Publication based on the results of:
Lyashevskaya O., Droganova K., Zeman D. et al., / НИУ ВШЭ. Series WP BRP "Linguistics". 2016. No. 44.
This paper presents the Universal Dependencies tagset (UD v1) as a new annotation scheme for Russian treebanks. The universal list of dependency relations was adopted and extended to comply with certain language-specific syntactic constructions. The tagset was validated, converting two Russian treebanks into the UD format, UD-Russian-SynTagRus and UD-Russian-Google. ...
Added: December 14, 2016
M. : Russian State University for the Humanitie, 2019
The book includes 64 papers submitted to the International conference in computer linguistics and intellectual technologies Dialogue 2019 and presents a broad spectrum of theoretical and applied research of natural language description, language simulation, and creation of applied computer technologies. ...
Added: October 16, 2019
Linköping University Electronic Press, 2015
The workshop series on Natural Language Processing (NLP) for Computer-Assisted Language Learning (CALL) – NLP4CALL – is a meeting place for researchers working on the integration of Natural Language Processing and Speech Technologies in CALL systems and exploring the theoretical and methodological issues arising in this connection. ...
Added: May 31, 2015
Stroudsburg, PA : Association for Computational Linguistics, 2017
This volume contains the papers presented at BSNLP-2017: the Sixth Workshop on Balto-Slavic Natural Language Processing. The Workshop is organized by SIGSLAV—Special Interest Group on NLP in Slavic Languages of the Association for Computational Linguistics.
The Workshops have been convening for over a decade, with a clear vision and purpose. On one hand, the languages from ...
Added: June 13, 2017
Суворова М. И., Кобозева М. В., Toldova S. et al., Искусственный интеллект и принятие решений 2020 № 1 С. 17-26
В статье обсуждается важность автоматического сценарного анализа для понимания текстов на естественном языке. Дан широкий обзор методов и подходов к описанию и извлечению сценариев. Рассмотрены теоретические подходы к формализации сценариев. Приведен список задач, для решения которых используется информация о сценарной структуре текста. Представлены популярные подходы к автоматическому извлечению сценариев из текстов и методы оценки их ...
Added: April 22, 2020
Arkhangelskiy T., Гильмуллин Р. А., Невзорова О. А. et al., Научно-техническая информация. Серия 2: Информационные процессы и системы 2013
В статье описывается электронный корпус татарского языка, созданный в рамках программы фундаментальных исследований Президиума РАН "Корпусная лингвистика", и методы, использованные авторами для создания этого корпуса. В частности, описываются текстовый состав и жанровая структура корпуса, принятые авторами решения о выделении морфологических характеристик, автоматическая морфологическая разметка текстов с помощью двухуровневой модели морфологии и анализатора PC-KIMMO и размещение ...
Added: October 25, 2013
Osaka : [б.и.], 2016
Language resources are increasingly used not only in Language Technology (LT), but also in other subject fields, such as the digital humanities (DH) and in the field of education. Applying LT tools and data for such fields implies new perspectives on these resources regarding domain adaptation, interoperability, technical requirements, documentation, and usability of user interfaces. ...
Added: November 12, 2016
М. : Издательский центр «Российский государственный гуманитарный университет», 2019
The book includes 64 papers submitted to the International conference in computer linguistics and intellectual technologies Dialogue 2019 and presents a broad spectrum of theoretical and applied research of natural language description, language simulation, and creation of applied computer technologies. ...
Added: October 16, 2019
Кашкин Е. В., Компьютерная лингвистика и интеллектуальные технологии 2015 Vol. 21 P. 427-440
The paper presents clustering experiments on Russian verbs based on the statistical data drawn from the Russian FrameBank (framebank.ru). While lexicology has essentially abandoned the idea of syntactic transformations as the primary basis for grouping verbs into semantic classes (Apresjan 1967, Levin 1993), the hypothesis of the same lexical and syntactic distributional profiles underlying lexical ...
Added: September 30, 2015
Skorinkin D.A., Budnikov E. A., Stepanova M. E. et al., Компьютерная лингвистика и интеллектуальные технологии 2016 No. 15 P. 721-733
This paper presents a rule-based approach to Information Extraction (IE) task within FactRuEval-2016 competition. Our system is based on ABBYY Compreno Technology. The technology uses the results of deep syntactic-semantic analysis, which leads to significant reduction of the number of necessary rules and makes them laconic. The evaluation was conducted on FactRuEval dataset. FactRuEval is ...
Added: August 28, 2016
Springer, 2014
This book constitutes the refereed proceedings of the 17th International Conference on Text, Speech and Dialogue, TSD 2013, held in Brno, Czech Republic, in September 2014. The 70 papers presented together with 3 invited papers were carefully reviewed and selected from 143 submissions. They focus on topics such as corpora and language resources; speech recognition; ...
Added: September 15, 2014
Klyshinskiy E., Логачёва В. К., Карпик О. В. et al., Вестник Новосибирского государственного университета. Серия: Лингвистика и межкультурная коммуникация 2020 Т. 18 № 1 С. 5-21
The grammatical ambiguity (multiple sets of grammatical features for one word form or coinciding surface forms of different words) can be of different types. We describe six classes of grammatical ambiguity: unambiguous, ambiguous by grammatical features, by part of speech, by lemma, by lemma and part of speech, and out-of-vocabulary words. These classes are presented ...
Added: December 11, 2019
Springer, 2015
16th International Conference, CICLing 2015, Cairo, Egypt, April 14-20, 2015, Proceedings, Part I
ISBN: 978-3-319-18110-3 (Print) 978-3-319-18111-0 (Online) ...
Added: April 23, 2015
M. : Russian State University for the Humanitie, 2019
The book includes 61 reports of the International conference on computer and intellectual technology "Dialogue-2019", representing a wide range of theoretical and applied research in the field of natural language description, modeling of language processes, creating practically applicable computer linguistic technologies. For specialists in the field of theoretical and applied linguistics and intellectual technologies. ...
Added: June 12, 2019
Aachen : CEUR Workshop Proceedings, 2017
As the number of digital texts increases rapidly, there is a pressing need for more advanced and diverse tools of natural language processing. While purely statistical approaches proved powerful and efficient for many NLP tasks, there are many applications that would benefit from the formal models and approaches traditional language science has to offer. With ...
Added: June 25, 2017
Klyshinskiy E., Жеребцова Ю., Чижик А., Системный администратор 2019 № 10 С. 82-91
Nowadays, a field of dialogue systems and conversational agents is one of the rapidly growing research areas in artificial intelligence applications. Business and industry are showing increasing interest in implementing intelligent conversational agents into their products. Many recent studies has tended to focus on possibility of developing task-oriented systems which are able to have long ...
Added: October 26, 2019
Bonch-Osmolovskaya A. A., Компьютерная лингвистика и интеллектуальные технологии 2015 Т. 1 № 14(21) С. 80-95
The paper proposes new approaches to the problem of Russian dative subjects in predicative and adjective constructions. The core idea of the research is to study the distribution of dative subject constructions with predicative and adjective forms that potentially can be used in such constructions. The methodological novelty of the approach is manifested in the ...
Added: April 15, 2015
Lyashevskaya O., Kashkin E., Компьютерная лингвистика и интеллектуальные технологии 2016 No. 15 P. 440-454
The argument constructions of adjectives has largely been out of the scope of research on semantic roles both in theoretical and IT fields. Before adding the roles of adjectival arguments to the network of semantic roles it is important to determine whether the adjectival roles form a separate list or whether they can be seen ...
Added: December 14, 2016
Pronoza E., Panicheva P., Koltsova O. et al., Information Processing and Management 2021 Vol. 58 No. 6 Article 102674
Ethnicity-targeted hate speech has been widely shown to influence on-the-ground inter-ethnic conflict and violence, especially in such multi-ethnic societies as Russia. Therefore, ethnicity-targeted hate speech detection in user texts is becoming an important task. However, it faces a number of unresolved problems: difficulties of reliable mark-up, informal and indirect ways of expressing negativity in user ...
Added: September 2, 2021
СПб. : Издательство Санкт-Петербургского университета, 2019
Сборние содержит материалы докладов, представленных на Международной научной конференции "Корпусная лингвистика-2019" 24-28 июня 2019 г. в Санкт-Петербурге. ...
Added: July 8, 2019
Frolova N., Frolov E. S., The Kazakh-American Free University Academic Journal 2017 P. 179-184
The article reflects the practical experience of enhancing the process of Academic English Writing teaching to undergraduate students by means of web tools. Along with theoretical analysis of the integration scheme of blended learning into the curriculum the article features empirical survey to confirm the efficiency of the project. The article contains a detailed description ...
Added: June 5, 2018
Холенштайн Э., Логос 2007 Т. 63 № 6 С. 176-196
В статье предпринимается попытка применить феноменологическуюй психологию Эдмунда Гуссерля к разрешению проблемы соотношения естественного и искусственного интеллекта. ...
Added: July 7, 2015
Switzerland : Springer, 2019
This volume contains a collection of submitted papers presented at the conference,
which were thoroughly reviewed by members of the Program Committee consisting of
more than 100 top specialists, as well as an invited paper by Prof. Scharenborg. Each
paper was reviewed, single blind, by two to four committee members (three reviewers
on the average) and then discussed by ...
Added: October 29, 2019
CEUR-WS.org, 2020
Added: October 7, 2020