• A
  • A
  • A
  • АБВ
  • АБВ
  • АБВ
  • A
  • A
  • A
  • A
  • A
Обычная версия сайта
  • RU
  • EN
  • HSE University
  • Publications
  • Book chapter
  • Genre Classification Problem: in Pursuit of Systematics on a Big Webcorpus
  • RU
  • EN
Расширенный поиск
Высшая школа экономики
Национальный исследовательский университет
Priority areas
  • business informatics
  • economics
  • engineering science
  • humanitarian
  • IT and mathematics
  • law
  • management
  • mathematics
  • sociology
  • state and public administration
by year
  • 2027
  • 2026
  • 2025
  • 2024
  • 2023
  • 2022
  • 2021
  • 2020
  • 2019
  • 2018
  • 2017
  • 2016
  • 2015
  • 2014
  • 2013
  • 2012
  • 2011
  • 2010
  • 2009
  • 2008
  • 2007
  • 2006
  • 2005
  • 2004
  • 2003
  • 2002
  • 2001
  • 2000
  • 1999
  • 1998
  • 1997
  • 1996
  • 1995
  • 1994
  • 1993
  • 1992
  • 1991
  • 1990
  • 1989
  • 1988
  • 1987
  • 1986
  • 1985
  • 1984
  • 1983
  • 1982
  • 1981
  • 1980
  • 1979
  • 1978
  • 1977
  • 1976
  • 1975
  • 1974
  • 1973
  • 1972
  • 1971
  • 1970
  • 1969
  • 1968
  • 1967
  • 1966
  • 1965
  • 1964
  • 1963
  • 1958
  • More
Subject
News
May 25, 2026
HSE Scientists Train Neural Network to 'Hear' Faults in Electric Motors
Researchers at the AI and Digital Science Institute of the HSE Faculty of Computer Science have developed a new method—the Signature-Guided Data Augmentation (SGDA) framework—that achieves 99% accuracy in motor fault detection and 86% accuracy in fault classification. The application of this approach can reduce industrial equipment repair costs, minimise downtime, and improve production safety. The study results have been published in Engineering Applications of Artificial Intelligence.
May 25, 2026
'The Humanities Serve as a Conscience'
Maria Mizernaia studies Soviet literature and the history of book publishing. In this interview for the HSE Young Scientists project, she discusses plans to publish a novel about besieged Leningrad, AI-provoked reflections on what it means to be human, and how novels can help satisfy our dopamine hunger.
May 25, 2026
Is It Possible to Predict a Citys Life Based on the Shape of Its Neighbourhoods?
Is it possible to predict, based on the configuration of streets and buildings, where a café will open or where traffic congestion will occur? Participants in the Spatial Analysis and Modelling of Urban Processes research and study group use open data and machine learning to identify universal patterns. Alexander Sheludkov and Eduard Somov discuss the purpose of comparing cities, the need for new forms of urban statistics, and how open data is transforming approaches to urban studies.

 

Have you spotted a typo?
Highlight it, click Ctrl+Enter and send us a message. Thank you for your help!

Publications
  • Books
  • Articles
  • Chapters of books
  • Working papers
  • Report a publication
  • Research at HSE

?

Genre Classification Problem: in Pursuit of Systematics on a Big Webcorpus

P. 70–83.
Shavrina T.

This article is devoted to the problem of defining a genre in computer linguistics and searching for parameters that could formalize the concept of a genre. All kinds of existing typologies of genres rely on different types of features, whereas in the practice of NLP, any modern applications are adapted to learning on big data, and therefore - on text features that do not require additional non-automatic markup. Based on such text-internal features, in this article, we focus on the differentiation of various genres and their grouping on the basis of a similar distribution of features. The description of the contribution of various types of features to the final result and their interpretation are given, and also an analysis of how such features can be used to further adaptation of NLP models is provided. The materials of the "Taiga" corpus with genre annotation are used as experimental data.

Language: English
Full text
DOI
Text on another site
Keywords: corpus linguisticscorpus analysistext classification

In book

Proceedings of Third Workshop "Computational linguistics and language science"
Wohlgenannt G., von Waldenfels R., Toldova S., Rakhilina E. V., Lyashevskaya O., Loukachevitch N. V., Artemova E. Issue 4. , Manchester: EasyChair, 2019.
Similar publications
Российская социология в условиях цифровизации общества: результаты анализа корпуса научных текстов
Smirnov A., Социологические исследования 2023 № 4 С. 39–50
Using the analysis of a corpus of texts from eight leading Russian sociological journals, the article examines the impact of the digitalization of society on sociology in 2000–2021. Frequency analysis of 13.8 thousand scientific texts tracked the introduction of concepts related to digitalization into academic circulation. The article reveals the differences between the journals, due ...
Added: March 18, 2026
Дискриминативная лемматизация сокращений в эпоху LLM
Глазкова А. В., Смаль И. В., Lyashevskaya O. et al., Доклады Российской академии наук. Математика, информатика, процессы управления (ранее - Доклады Академии Наук. Математика) 2025 Т. 527 С. 146–155
This paper presents a study on the effectiveness of discriminative methods for abbreviation lemmatization in Russian texts. Unlike generative approaches, discriminative models select the optimal lemma from a fixed set of candidates, eliminating the risk of generating grammatically incorrect word forms. For the first time in Russian language processing, we conduct a comprehensive analysis of ...
Added: March 10, 2026
Transformer-based approaches for lemmatizing abbreviations in Russian texts
Glazkova A., Lyashevskaya O., Morozov D. et al., Journal of Mathematical Sciences 2025 Vol. 546 P. 32–47
This paper addresses the task of lemmatizing abbreviations in the Russian language. Abbreviation lemmatization is particularly challenging, as it involves not only transforming a word into its normal form but also correctly expanding the abbreviation. We explore two approaches to this task, both leveraging large pretrained language models. The first approach is generative, where the ...
Added: March 10, 2026
Promotional adjectives in grant proposal abstracts: a corpus study
Dmitriy S. Tulyakov, Tatiana M. Permyakova, Ekaterina A. Balezina, Вестник Волгоградского государственного университета. Серия 2: Языкознание 2025 Vol. 24 No. 6 P. 58–67
By effectively integrating promotional discourse into grant proposal abstracts, researchers can more compellingly present their ideas and increase their chances of securing funding. Implications of promotional adjectives in grant writing might differ across various research fields. This study aims to explore the use of promotional adjectives in abstracts of research grant proposals in six research ...
Added: March 2, 2026
A genre-based model of rhetorical structure in scoping review introductions
Elena V. Tikhonova, Kosycheva M. A., Training, Language and Culture 2025 Vol. 9 No. 4 P. 35–55
As genre modelling advances, describing research articles rhetorical structures becomes crucial. Though secondary to empirical studies, scoping reviews shape scholarly communication by framing analysis and setting epistemological benchmarks. Their introductions act as conceptual lenses, defining interpretive frameworks. However, most rhetorical models, designed for empirical articles, appear to be inadequate for scoping reviews. We propose a ...
Added: February 9, 2026
Динамика восприятия площадей в пространстве города носителями русского языка (сравнительный анализ по данным НКРЯ)
Belova P., В кн.: Актуальные вопросы лингвистики и литературоведения: сборник научных статей по материалам международной научной конференции памяти доктора филологических наук, профессора Л.А. Араевой (6–8 февраля 2025).: Кемеровский государственный университет, 2025. С. 155–160.
This article contains research results on the dynamics of squares’ perception in the city space in the Russian language picture of the world over time, starting from the second half of the XXth century to the present. Turning to the subcorpus of literary texts of the second half of the XXth century and the XXIst ...
Added: February 4, 2026
Preposition drop in Russian spoken by Mari and Beserman bilinguals
Yakovleva A., Kosheliuk N., Moroz G., International Journal of Bilingualism 2025 P. 1–19
Aims and Research Questions: In this paper, we present a corpus-based study of preposition drop (p-drop) in the speech of Mari-Russian and Beserman-Russian bilinguals compared to the speech of Russian monolinguals. Based on data from spoken corpora, we demonstrate that the prepositions v ‘in’, k ‘to’, s ‘with’ are omitted in the speech of bilinguals ...
Added: November 26, 2025
Вариативность годов vs. лет в русских говорах: корпусное исследование
Zemicheva S., Moroz G., Naccarato C., Вопросы языкознания 2025 № 6 С. 7–34
Наличие супплетивной формы лет в парадигме существительного год отличает русский язык от других восточнославянских. При этом в русских говорах вместо лет может использоваться вариант годов. Данные панхронического подкорпуса НКРЯ показывают, что форма годов, зафиксированная впервые в XV в., на всем протяжении истории русского языка была периферийной, в XVII–XVIII вв. использовалась преимущественно в нехудожественных текстах, а в ...
Added: November 12, 2025
Репрезентация фрейма ГОРОД в текстах почтовой переписки: корпусное исследование
Куликова В. А., Человек: образ и сущность. Гуманитарные аспекты 2026 № 1 С. 64–81
The verbal representation of a city is studied using the material of 993 contexts containing a description of a city in postcards of the pre-revolutionary, Soviet and post-Soviet periods. The object of the analysis is the frame-structure CITY, whilst the subject of the analysis is the features of its verbalization in the corpus of postal ...
Added: November 2, 2025
Automatic Annotation of Discourse and Speech Formulas in Internet Communication: A Telegram Comment Corpus
Maslenikova A., Tatiana I. Popova, , in: 27th International Conference, SPECOM 2025, Szeged, Hungary, October 13–15, 2025, Proceedings, Part I. Speech and Computer. Lecture Notes in Artificial Intelligence 16187Vol. 16187: Lecture Notes in Artificial Intelligence.: Springer, 2025. P. 278–292.
This article presents a system for the automatic processing of user comments aimed at annotating speech and discourse formulas that actively function in everyday interaction, including digital communication. A Python-based program using the Telegram API was developed to automate the collection, filtering, and annotation of empirical data. In addition to building a user corpus, the ...
Added: October 19, 2025
27th International Conference, SPECOM 2025, Szeged, Hungary, October 13–15, 2025, Proceedings, Part II. Speech and Computer. Lecture Notes in Artificial Intelligence 16188
Springer, 2025.
This work is subject to copyright. All rights are solely and exclusively licensed by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or ...
Added: October 19, 2025
Variation in a Narrative Corpus of Mano and Kpelle: Contact-Induced or Not?.
Khachaturyan M., Konoshenko M., Moroz G. et al., , in: N’yng-dyuumgu, n’yng-ngafq: Festschrift for Ekaterina GruzdevaVol. 126.: Helsinki: Studia Orientalia, 2025. P. 35–59.
This paper explores a corpus of spontaneous narratives and narrative retellings told by children and adults in Mano and Kpelle, two contacting Mande languages. It focuses on quotative constructions as a key point of grammatical dissimilarity between Mano and Kpelle. In the Mano speech of some bilingual children, however, these constructions are found to manifest ...
Added: September 5, 2025
Переписка Н. С. Хрущева и Ф. Кастро периода Карибского кризиса: опыт компьютеризованного анализа
Герцен А. С., В кн.: Четвёртая зимняя школа по гуманитарной информатике.: Балтийский федеральный университет им. Иммануила Канта, 2020. С. 92–97.
The article analyzes the 1st Secretary of the Central Committee of the CPSU and Chairman of the Council of Ministers of the USSR N. S. Khrushchev and the leader of the Cuban revolution F. Castro Ruz’s letters written in the period from October 26 to 31, 1962 on the topic of the Caribbean crisis and ...
Added: July 15, 2025
An overview of morphosyntactic variation in the speech of Russian-Chuvash bilinguals: number, gender, case assignment and preposition drop
Grishanova A., Russian linguistics 2025 Vol. 49 Article 10
The purpose of this study is to present a summary of morphosyntactic variation and a detailed analysis of the phenomenon of preposition drop in the Russian speech of Chuvash bilinguals. Specifically, I investigate what underlying factors might condition the variation. I conduct a qualitative analysis of the data extracted from the corpus of Russian spoken ...
Added: July 10, 2025
Do Formal Stance Strategies Reveal Disciplinary Variation in Professional Scientific Writing?
Smirnova E. A., Pérez-Guerra J., International Journal of Applied Linguistics 2025 Vol. 35 No. 3 P. 1242–1261
Stance in academic discourse has been extensively studied, with numerous investigations indicating that its expression varies across disciplines, depending on the authors’ intention to either enhance or diminish their voice or presence (e.g. It seems fairly certain versus This is based on the belief that...). This paper hypothesises that stance can be viewed as a ...
Added: April 10, 2025
Русский язык в условиях контактирования: тюркско-русское языковое взаимодействие. Часть 1. Социолингвистическое и корпусное исследование
Резанова З. И., Artemenko E., Диброва В. С. et al., Томск: Издательство Томского государственного университета, 2024.
В монографии представлены собственно лингвистические, социолингвистические и психолингвистические аспекты взаимодействия русского и трех тюркских языков – шорского, хакасского, татарского (сибирского варианта). Охарактеризованы варианты влияния тюркских языков на речевую практику и когнитивные процессы порождения и восприятия речи русскоязычными билингвами. Представлены методики сбора данных, их обработки при формировании социолингвистической базы данных и морфологически размеченного бимодального корпуса русской устной речи билингвов, ...
Added: April 7, 2025
The ‘adverb-ly adjective’ construction in English: meanings, distribution and discourse functions
Taboada M., Goddard C., Trnavac R., English Language and Linguistics 2025 Vol. 29 No. 1 P. 102–131
We investigate a class of adjective phrases composed of a deadjectival adverb ending in -ly and an adjective head (e.g. staggeringly incompetent, absolutely terrific, fiscally responsible), a compact construction whereby two adjectives may jointly contribute to evaluative meaning. Using corpus methodologies on more than 1 million examples and relying on semantic analyses of about 1,000 instances, we propose that the ...
Added: April 4, 2025
Creation and Analysis of the Multimedia Russian Corpus for Gesture Research
Rakhilina E. V., Cienki A., , in: The Cambridge Handbook of Gesture Studies.: Cambridge University Press, 2024. P. 249–272.
The chapter considers gesture studies in relation to corpus linguistic work. The focus is on the Multimedia Russian Corpus (MURCO), part of the Russian National Corpus. The chapter includes a brief biography of the creator of this corpus, Elena Grishina. The compilation of the corpus out of a set of Russian classic feature films and ...
Added: February 13, 2025
  • About
  • About
  • Key Figures & Facts
  • Sustainability at HSE University
  • Faculties & Departments
  • International Partnerships
  • Faculty & Staff
  • HSE Buildings
  • HSE University for Persons with Disabilities
  • Public Enquiries
  • Studies
  • Admissions
  • Programme Catalogue
  • Undergraduate
  • Graduate
  • Exchange Programmes
  • Summer University
  • Summer Schools
  • Semester in Moscow
  • Business Internship
  • Research
  • International Laboratories
  • Research Centres
  • Research Projects
  • Monitoring Studies
  • Conferences & Seminars
  • Academic Jobs
  • Yasin (April) International Academic Conference on Economic and Social Development
  • Media & Resources
  • Publications by staff
  • HSE Journals
  • Publishing House
  • iq.hse.ru: commentary by HSE experts
  • Library
  • Economic & Social Data Archive
  • Video
  • HSE Repository of Socio-Economic Information
  • HSE1993–2026
  • Contacts
  • Copyright
  • Privacy Policy
  • Site Map
Edit