A New Suite of Automatic Text Processing Tools for the TXM Platform and Its Evaluation on a Corpus for the Analysis of Extremist Texts
Вестник Новосибирского государственного университета. Серия: Лингвистика и межкультурная коммуникация. 2018. Vol. 16. No. 3. P. 19-31.
Language:
Russian
Keywords: corpus linguistics, correspondence analysis, automated morphological analysis, specificity, automated syntactic parsing, TXM platform, detecting extremist texts
Лаврентьев А. М., Смирнов И. В., Соловьев Ф. Н. et al., Системы высокой доступности 2018 Vol. 14 No. 3 P. 76-81
The paper considers an extension of the TXM platform for case analysis. It proposes extracting pseudo-words from the words of a text using the method of structural schemes, and identifying nominal groups in the structure of the text, in order to select subcorpora by parameter. The results of the ...
Added: September 20, 2018
St. Petersburg : St. Petersburg University Press, 2019
The collection contains the papers presented at the International Scientific Conference "Corpus Linguistics-2019", held 24-28 June 2019 in St. Petersburg. ...
Added: July 8, 2019
M. : Russian State University for the Humanities, 2019
The book includes 64 papers submitted to the International conference in computer linguistics and intellectual technologies Dialogue 2019 and presents a broad spectrum of theoretical and applied research of natural language description, language simulation, and creation of applied computer technologies. ...
Added: October 16, 2019
Lavrentiev A. M., Sherstinova T., Chepovskiy A. et al., Vestnik Tomskogo Gosudarstvennogo Universiteta, Filologiya 2021 Vol. 70 P. 69-89
The purpose of this paper is to test the methodological tools provided by the TXM platform for research on the dynamics of vocabulary and punctuation marks in diachronic corpora. TXM is a powerful text analysis software package that provides both quantitative and qualitative features in a transparent open-source implementation. In this paper, we demonstrate how it can be ...
Added: June 24, 2021
Arkhangelskiy T., Гильмуллин Р. А., Невзорова О. А. et al., Научно-техническая информация. Серия 2: Информационные процессы и системы 2013
The article describes an electronic corpus of the Tatar language created within the "Corpus Linguistics" Program of Fundamental Research of the Presidium of the RAS, and the methods the authors used to build it. In particular, it describes the textual composition and genre structure of the corpus, the authors' decisions on which morphological features to distinguish, the automatic morphological annotation of the texts using a two-level model of morphology and the PC-KIMMO analyzer, and the placement ...
Added: October 25, 2013
Bonch-Osmolovskaya A. A., Компьютерная лингвистика и интеллектуальные технологии 2015 Vol. 1 No. 14(21) P. 80-95
The paper proposes new approaches to the problem of Russian dative subjects in predicative and adjective constructions. The core idea of the research is to study the distribution of dative subject constructions with predicative and adjective forms that potentially can be used in such constructions. The methodological novelty of the approach is manifested in the ...
Added: April 15, 2015
St. Petersburg State University Press, 2019
The collection contains the papers presented at the International Scientific Conference "Corpus Linguistics-2019", held 24-28 June 2019 in St. Petersburg. Building text corpora is one of the priority areas of modern linguistics. A conference on this topic acquaints researchers with current developments and new technological solutions in the field, and helps consolidate research experience in corpus linguistics. ...
Added: November 1, 2020
M. : Russian State University for the Humanities, 2015
Added: April 28, 2015
Arkhangelskiy T., Научно-техническая информация. Серия 2: Информационные процессы и системы 2012 No. 4 P. 24-29
Four electronic corpora created in 2011 within the framework of the “Corpus Linguistics: the Albanian, Kalmyk, Lezgian, and Ossetic Languages” Program of Fundamental Research of the RAS are presented. The interface and functionalities of these corpora are described, engineering problems to be solved in their creation are elucidated, and the promises of their development are ...
Added: October 31, 2012
Marseille : Association pour le Traitement Automatique des Langues, 2014
Following the first TALAf workshop, held on 8 June 2012 in Grenoble during the JEP-TALN-RECITAL 2012 conference (see the proceedings: http://aclweb.org/anthology//W/W12/#1300), we are offering a new edition of this workshop at the TALN 2014 conference on 1 July in Marseille.
This second edition shows the value of a French-language workshop on the processing ...
Added: March 26, 2015
St. Petersburg : St. Petersburg State University Press, 2017
Proceedings of the international conference. ...
Added: December 31, 2017
Sibirtseva V., Khomenko A., Baranova J., Образовательные технологии и общество 2013 Vol. 16 No. 3 P. 508-521
The article reports on the development of "Corplingui (Nizhny Novgorod-Moscow)", a student and faculty research group at the National Research University Higher School of Economics. The group conducts research in computer and corpus linguistics. Development primarily focuses on the creation of interactive resources based on the materials of the Russian National Corpus. The ...
Added: October 4, 2013
M. : Publishing House of the Russian State University for the Humanities, 2017
The 16th issue of the annual report “Computational Linguistics and Intellectual Technologies” contains the selected materials of the 23rd international conference “Dialogue”. The presented works reflect the areas of research in computational modelling and analysis of natural language that are traditionally represented at the conference. ...
Added: March 15, 2017
Association for Computational Linguistics, 2017
The volume includes papers presented at the 16th International Workshop on Treebanks and Linguistic Theories (TLT), which brings together developers and users of linguistically annotated natural language corpora. As ‘treebanks’ we consider any pairing of natural language data (spoken or written) with annotations of linguistic structure at various levels of analysis, ranging from e.g. morpho-phonology ...
Added: December 11, 2018
Лаврентьев А. М., Смирнов И. В., Соловьев Ф. Н. et al., Вопросы кибербезопасности 2019 No. 4(32) P. 54-60
Purpose: to develop a methodology for building and automatically analyzing special-purpose text corpora, so that they can later serve as training sets and as a basis for identifying discriminating features in text classification tasks.
Method: the analysis tools of the TXM corpus platform were used, extended with purpose-built procedures for computing additional text characteristics such as letter combinations, pseudo-stems, noun phrases, and verb phrases.
Results: it is shown that the developed extension tools ...
Added: August 10, 2019
CEUR Workshop Proceedings, 2020
The International Conference “Internet and Modern Society” (IMS-2020) was initially planned to take place in St. Petersburg, Russia. Due to the spread of COVID-19 and the ban on public events, the conference was held during 17-20 June 2020 in the format of online sessions with a discussion of papers and presentations uploaded in advance. The ...
Added: November 1, 2020
Пономарева М. А., Дроганова К. А., Smurov I. et al., Florence : Association for Computational Linguistics, 2019
This paper provides a comprehensive overview of the gapping dataset for Russian that consists of 7.5k sentences with gapping (as well as 15k relevant negative sentences) and comprises data from various genres: news, fiction, social media and technical texts. The dataset was prepared for the Automatic Gapping Resolution Shared Task for Russian (AGRR-2019) - a ...
Added: September 5, 2019
Alexeeva S. V., Protopopova E. V., Bodrova A. A. et al., Компьютерная лингвистика и интеллектуальные технологии 2014 P. 562-571
The paper describes the noun phrase and anaphora annotation in OpenCorpora and compares it to that in other corpora. We discuss the choice of representative texts for anaphoric annotation and the basic principles of syntactic annotation. In the case of noun phrase annotation we followed the scheme introduced earlier for morphological annotation: it was carried out ...
Added: October 8, 2014
Matkin N. A., Культура и технологии 2021 Vol. 6 No. 1 P. 26-32
Perm saw many changes in 2019 and 2020, including a transport reform, the construction of a zoo, and a change of governor and mayor. All of these changes affect the image of the city that is constructed in residents' minds. On the one hand, the image of the city is formed by the media; on the other hand ...
Added: October 23, 2021
Popkova E., Социосфера 2010 No. 4 P. 74-81
The article discusses the most recent trends in the development of the English progressive. A corpus-based approach to linguistic research is seen as an effective means of determining reliability of the data retrieved and helps track the major diachronic dynamic in the increasing frequency of the progressive aspect that has taken place since the beginning ...
Added: November 6, 2012
Kibrik A. A., Khudyakova M., Dobrov G. B. et al., Frontiers in Psychology 2016 Vol. 7 No. 1429 P. 1-21
We report a study of referential choice in discourse production, understood as the choice between various types of referential devices, such as pronouns and full noun phrases. Our goal is to predict referential choice, and to explore to what extent such prediction is possible. Our approach to referential choice includes a cognitively informed theoretical component, ...
Added: September 28, 2016
M. : Publishing Center of the Russian State University for the Humanities, 2019
The book includes 64 papers submitted to the International conference in computer linguistics and intellectual technologies Dialogue 2019 and presents a broad spectrum of theoretical and applied research of natural language description, language simulation, and creation of applied computer technologies. ...
Added: October 16, 2019
Arkhangelskiy T., Panova T., International Journal of the Sociology of Language 2014
The purpose of our study is to investigate the lexicalization of so-called adverbial phrases, such as fun a mol, in modern Hasidic Yiddish in comparison with written literary Yiddish of the 20th century. The phenomenon in question is a historical process in which several lexemes forming a frequent collocation (including nouns, adjectives, adverbs, prepositions and ...
Added: December 11, 2014
Bogolyubova O., Panicheva P., Tikhonov R. et al., Computers in Human Behavior 2018 Vol. 78 P. 151-159
*The operation of the Facebook social network is prohibited in Russia on the grounds of extremist activity.
The goal of this paper was to assess the connection between dark personality traits and engagement in harmful online behaviors in a sample of Russian Facebook users, and to describe the language they use in online communication. A total of 6724 individuals participated ...
Added: February 18, 2019