Automatic part-of-speech tagging for Russian using transformation-based learning.
Scientific Works of the Free Economic Society of Russia. 2014. Vol. 186. P. 228-235.
Kitov V. V.
In press
This paper describes how the well-known «transformation-based learning» algorithm for automatic rule generation is applied to part-of-speech tagging. The algorithm is applied to corpora of annotated Russian texts, and both its accuracy and the most significant learned rules are reported.
Priority areas:
IT and mathematics
Language:
Russian
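As an illustration only, here is a minimal Python sketch of transformation-based (Brill-style) learning for POS tagging, the technique named in the abstract above. The abstract does not give the paper's rule templates, tagset, or data, so every detail here is an assumption: two hypothetical templates ("change tag A to B when the previous/next tag is C"), UD-style tags, NOUN as the default tag for unknown words, and an invented toy corpus built around the ambiguous form "стали" ("became" vs. "of steel").

```python
from collections import Counter, defaultdict

BOUNDARY = "<S>"  # hypothetical marker for positions outside the sentence


def train_baseline(corpus):
    """Map each word form to its most frequent tag in the training corpus."""
    counts = defaultdict(Counter)
    for sent in corpus:
        for word, t in sent:
            counts[word.lower()][t] += 1
    return {w: c.most_common(1)[0][0] for w, c in counts.items()}


def initial_tags(words, lexicon, default="NOUN"):
    """Baseline tagging; unknown words get a default tag (an assumption)."""
    return [lexicon.get(w.lower(), default) for w in words]


def context(tags, i, template):
    """Tag of the neighbour inspected by the template ("prev" or "next")."""
    j = i - 1 if template == "prev" else i + 1
    return tags[j] if 0 <= j < len(tags) else BOUNDARY


def apply_rule(tags, rule):
    """Rule = (from_tag, to_tag, template, ctx). Matches are found on the
    input tags first, then all rewrites are applied at once."""
    frm, to, template, ctx = rule
    hits = [i for i, t in enumerate(tags)
            if t == frm and context(tags, i, template) == ctx]
    new = list(tags)
    for i in hits:
        new[i] = to
    return new


def learn_rules(corpus, lexicon, max_rules=10, min_gain=1):
    gold = [[t for _, t in s] for s in corpus]
    current = [initial_tags([w for w, _ in s], lexicon) for s in corpus]
    rules = []
    for _ in range(max_rules):
        # Instantiate the templates at every position that is currently wrong.
        candidates = set()
        for tags, g in zip(current, gold):
            for i, (t, gt) in enumerate(zip(tags, g)):
                if t != gt:
                    for template in ("prev", "next"):
                        candidates.add((t, gt, template, context(tags, i, template)))
        # Net gain = errors fixed minus errors introduced; sorted() fixes tie order.
        best, best_gain = None, 0
        for rule in sorted(candidates):
            gain = 0
            for tags, g in zip(current, gold):
                new = apply_rule(tags, rule)
                gain += sum((n == y) - (o == y) for o, n, y in zip(tags, new, g))
            if gain > best_gain:
                best, best_gain = rule, gain
        if best is None or best_gain < min_gain:
            break
        rules.append(best)
        current = [apply_rule(tags, best) for tags in current]
    return rules


def tag(words, lexicon, rules):
    """Baseline tags followed by the learned transformations, in order."""
    tags = initial_tags(words, lexicon)
    for rule in rules:
        tags = apply_rule(tags, rule)
    return list(zip(words, tags))


# Invented toy data: "стали" is VERB ("became") or NOUN ("of steel").
CORPUS = [
    [("они", "PRON"), ("стали", "VERB"), ("сильнее", "ADV")],
    [("мы", "PRON"), ("стали", "VERB"), ("лучше", "ADV")],
    [("вы", "PRON"), ("стали", "VERB"), ("старше", "ADV")],
    [("дни", "NOUN"), ("стали", "VERB"), ("короче", "ADV")],
    [("нож", "NOUN"), ("из", "ADP"), ("стали", "NOUN")],
    [("балка", "NOUN"), ("из", "ADP"), ("стали", "NOUN"),
     ("и", "CCONJ"), ("бетона", "NOUN")],
]

lexicon = train_baseline(CORPUS)
rules = learn_rules(CORPUS, lexicon)
print(rules)  # [('VERB', 'NOUN', 'prev', 'ADP')]
print(tag(["ложка", "из", "стали"], lexicon, rules))
```

Each iteration greedily keeps the candidate with the largest net gain, which is the core of the method. Practical taggers use richer templates (word identity, suffixes, wider windows) and index candidate positions instead of rescanning the whole corpus for every rule.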
M. : Russian State University for the Humanities, 2019
The book includes 64 papers submitted to Dialogue 2019, the International Conference on Computational Linguistics and Intellectual Technologies, and presents a broad spectrum of theoretical and applied research on natural language description, language modeling, and the creation of applied computer technologies. ...
Added: October 16, 2019
Bonch-Osmolovskaya A. A., Computational Linguistics and Intellectual Technologies 2015 Vol. 1 No. 14(21) P. 80-95
The paper proposes new approaches to the problem of Russian dative subjects in predicative and adjective constructions. The core idea of the research is to study the distribution of dative subject constructions with predicative and adjective forms that potentially can be used in such constructions. The methodological novelty of the approach is manifested in the ...
Added: April 15, 2015
Karpov N., Sibirtseva V., / HSE University. Series WP BRP "Linguistics". 2014.
This article describes ways to use original texts from the Russian National Corpus, as well as news texts, for teaching Russian as a foreign language. It analyzes two years of work by CorpLings, a research group at the Higher School of Economics (Nizhny Novgorod and Moscow). Special attention is paid to the basic principles of the research part of the project ...
Added: December 10, 2014
Logvinova N., Russian Linguistics 2024 Vol. 48 No. 1 P. 0
The paper discusses case concord in Russian appositional constructions, which manifests itself in optional case concord of the proper name (v rek-e(LOC) Don-e(LOC) / v rek-e(LOC) Don(NOM) ‘in the river Don’). The study provides an in-depth corpus analysis of more than 15,000 examples, using a logistic regression statistical model to predict the choice between presence and ...
Added: March 17, 2024
Piperski A., Grabovskaya M., Gridneva E. et al., / HSE University. Series WP BRP "Linguistics". 2019. No. 92.
In Russian, there are many ways to address a person by name. For instance, a man called Aleksandr may be addressed as Aleksandr, Aleksandr Ivanovič, Saša, Sašen′ka, Saška, Sanja, etc. This study aims at analyzing the use of various strategies of naming the listener throughout the last two centuries. It uses the data from the ...
Added: December 15, 2019
Lyashevskaya O., M. : Languages of Slavic Culture, 2016
Corpus linguistics can be broadly defined in terms of two partially overlapping research dimensions. On the one hand, corpus linguistics is knowledge of how to compile and annotate linguistic corpora. On the other hand, corpus linguistics is a family of qualitative and quantitative methods of language study based on corpus data. The book presents ...
Added: March 26, 2015
Sherstinova T., Петрова И. А., Socio- and Psycholinguistic Studies 2023
To effectively model contemporary speech processes in daily communication, comprehensive linguistic resources such as the ORD corpus are indispensable. This paper introduces a new resource, ESC (Everyday Student Conversations), a corpus of young people's spoken speech developed with a continuous audio-recording methodology that captures informants' verbal behavior. The primary objective behind this corpus ...
Added: December 10, 2023
St. Petersburg : Nestor-Istoriya, 2018
This volume is the third issue of a corpus-based grammar of Russian. It deals with parts of speech and, more generally, with formal classes of the lexicon. It comprises descriptive papers on individual parts of speech and minor word classes. ...
Added: November 4, 2018
Кашкин Е. В., Computational Linguistics and Intellectual Technologies 2015 Vol. 21 P. 427-440
The paper presents clustering experiments on Russian verbs based on the statistical data drawn from the Russian FrameBank (framebank.ru). While lexicology has essentially abandoned the idea of syntactic transformations as the primary basis for grouping verbs into semantic classes (Apresjan 1967, Levin 1993), the hypothesis of the same lexical and syntactic distributional profiles underlying lexical ...
Added: September 30, 2015
Lyashevskaya O., Kashkin E., Computational Linguistics and Intellectual Technologies 2016 No. 15 P. 440-454
The argument constructions of adjectives have largely been out of the scope of research on semantic roles in both theoretical and IT fields. Before adding the roles of adjectival arguments to the network of semantic roles, it is important to determine whether the adjectival roles form a separate list or whether they can be seen ...
Added: December 14, 2016
St. Petersburg University Press, 2019
The collection contains papers presented at the International Scientific Conference «Corpus Linguistics-2019», held on June 24–28, 2019, in St. Petersburg. Building text corpora is one of the priority areas of modern linguistics. A conference on this topic acquaints researchers with current developments and new technological solutions in the field and helps consolidate the experience of research in corpus linguistics. ...
Added: November 1, 2020
Bogdanova-Beglarian N., Sherstinova T., Blinova O. et al., Lecture Notes in Computer Science 2016 Vol. 9811 P. 100-107
The research presented in this paper has been conducted within a large sociolinguistic project aimed at describing everyday spoken Russian and analyzing the special characteristics of its usage by different social groups of speakers. The research is based on the material of the ORD corpus, which contains long-term audio recordings of everyday communication. ...
Added: December 31, 2017
Arkhangelskiy T., Гильмуллин Р. А., Невзорова О. А. et al., Scientific and Technical Information. Series 2: Information Processes and Systems 2013
The article describes the electronic corpus of the Tatar language, created within the fundamental research program "Corpus Linguistics" of the Presidium of the Russian Academy of Sciences, and the methods the authors used to build it. In particular, it describes the textual composition and genre structure of the corpus, the authors' decisions on which morphological features to annotate, automatic morphological annotation of the texts using a two-level model of morphology and the PC-KIMMO analyzer, and the placement ...
Added: October 25, 2013
Abingdon : Routledge, 2018
This edited collection presents a range of methods that can be used to analyse linguistic data quantitatively. A series of case studies of Russian data spanning different aspects of modern linguistics serve as the basis for a discussion of methodological and theoretical issues in linguistic data analysis. The book presents current trends in quantitative linguistics, ...
Added: October 11, 2016
Kuzmenko E., Mustakimova E., in : Computational Linguistics and Intellectual Technologies. Papers from the Annual International Conference "Dialogue" (2015). M. : RSUH Publishing, 2015. P. 388-398.
The problem of morphological ambiguity is widely addressed in modern NLP. Ambiguity is mostly resolved with large manually annotated corpora and machine learning. However, such methods are not always available, as good training data is not accessible for all languages. In this paper we present a method of disambiguation without a gold standard ...
Added: July 30, 2015
Лаврентьев А. М., Соловьев Ф. Н., Суворова М. И. et al., Vestnik NSU. Series: Linguistics and Intercultural Communication 2018 Vol. 16 No. 3 P. 19-31
The TXM platform offers extensive corpus-analysis capabilities, such as correspondence analysis, clustering, building lexical tables, searching for complex lexical constructions, and extracting subcorpora by various parameters. By default, the platform uses word tokens as the structural units of analysis. It is integrated with a single extension, TreeTagger, which supports only morphological analysis and lemmatization of word tokens. However, the user can supply each word token with a set of additional features, ...
Added: September 8, 2018
Marseille : Association pour le Traitement Automatique des Langues, 2014
Following the first TALAf workshop, held on June 8, 2012, in Grenoble at the JEP-TALN-RECITAL 2012 conference (see the proceedings: http://aclweb.org/anthology//W/W12/#1300), we propose a new edition of this workshop at the TALN 2014 conference on July 1 in Marseille.
This second edition shows the value of a French-language workshop on the processing ...
Added: March 26, 2015
Герасимов Д. В., Acta Linguistica Petropolitana. Transactions of the Institute for Linguistic Studies 2016 Vol. XII No. 1 P. 336-363
The paper presents a corpus-driven study of the Russian PP-based degree modifier do uzhasa (lit. ‘to horror’), suggesting a two-stage grammaticalization path. The first stage (presumably the 18th–19th centuries) involves subjectification, while during the second stage, subjective readings give rise to intensifier readings through conceptual metonymy. Both stages see a host-class expansion. This process is ...
Added: November 27, 2017
Kopotev M., Lyashevskaya O., Mustajoki A., in : Quantitative approaches to the Russian language. Abingdon : Routledge, 2018. P. 3-29.
The Russian language, despite being one of the most studied in the world, until recently has been little explored quantitatively. After a burst of research activity in the years 1960–1980, quantitative studies of Russian vanished. They are now reappearing in an entirely different context. Today, we have large and deeply annotated corpora available for extended ...
Added: October 24, 2017
Lavrentiev A. M., Sherstinova T., Chepovskiy A. et al., Vestnik Tomskogo Gosudarstvennogo Universiteta, Filologiya 2021 Vol. 70 P. 69-89
The purpose of this paper is to test the methodological tools provided by the TXM platform for research on the dynamics of vocabulary and punctuation marks in diachronic corpora. TXM is a powerful text-analysis tool that provides both quantitative and qualitative features in a transparent open-source implementation. In this paper, we demonstrate how it can be ...
Added: June 24, 2021
Lyashevskaya O., Droganova K., Zeman D. et al., / HSE University. Series WP BRP "Linguistics". 2016. No. 44.
This paper presents the Universal Dependencies tagset (UD v1) as a new annotation scheme for Russian treebanks. The universal list of dependency relations was adopted and extended to cover certain language-specific syntactic constructions. The tagset was validated by converting two Russian treebanks, UD-Russian-SynTagRus and UD-Russian-Google, into the UD format. ...
Added: December 14, 2016
Савчук С. О., Архангельский Т. А., Bonch-Osmolovskaya A. A. et al., Voprosy Jazykoznanija 2024
The paper provides an overview of the results of the fundamental reconstruction and modernization project of the National Corpus of the Russian Language platform, carried out from 2020 to 2023. The focus of the paper is on the new opportunities that are opening up for linguists and a wider audience. This includes improving the representativeness ...
Added: March 21, 2024
Lyashevskaya O., Ovsjannikova M., Szymor N. et al., in : Quantitative approaches to the Russian language. Abingdon : Routledge, 2018. P. 51-78.
The domain of modality is structurally diverse and may be described in multiple ways (for example, see Perkins, 1983; Wierzbicka, 1987; Hengeveld, 1988/2004; Sweetser, 1990; Bondarko, 1990; Bybee et al., 1994; van der Auwera and Plungian, 1998; Palmer, 2001; Hansen, 2004; Nuyts, 2006; Khrakovsky, 2007). The article reports on the Russian part of a larger survey ...
Added: October 24, 2017
M. : Russian State University for the Humanities, 2015
Added: April 28, 2015