
 


Authorship Attribution of Russian Forum Posts with Different Types of N-gram Features

Ch. 3. P. 9–14.
Litvinova T., Litvinova O., Panicheva P.

Authorship attribution is an important field in online security. Recently there have been numerous successful works on authorship attribution in various European languages. Character n-grams are reported to be the best choice for authorship attribution, as they encode both style and content information. We evaluate different types of character n-gram features in an authorship attribution task on a real-world noisy dataset of Russian forum posts. We also supplement them with a number of new simple n-gram features capturing syntactic and discourse patterns. We perform authorship attribution in a single-topic and a cross-topic setting, as the research question is whether character n-grams capture both style and content information. Our results show that character n-grams are indeed very successful in authorship attribution of Russian forum posts. However, there is no clear distinction between style and content n-grams, as the same types of n-grams work well in both single-topic and cross-topic settings. In our experiments, the generalized simple n-gram features, which reveal syntactic and discourse patterns, also proved very important for authorship attribution of short informal Russian texts. They represent a different kind of authorship information and are a successful addition to character n-grams in authorship attribution of forum texts in the Russian language.
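
To illustrate what a character n-gram feature looks like, here is a minimal Python sketch. It is not the authors' pipeline: the trigram size, the cosine-similarity matching, the function names (char_ngrams, cosine, attribute) and the two toy "authors" are all assumptions made for illustration only.

```python
from collections import Counter
from math import sqrt
from typing import Dict


def char_ngrams(text: str, n: int = 3) -> Counter:
    """Count overlapping character n-grams, keeping spaces and punctuation."""
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two n-gram count vectors."""
    dot = sum(a[g] * b[g] for g in a.keys() & b.keys())
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def attribute(post: str, profiles: Dict[str, Counter], n: int = 3) -> str:
    """Return the author whose n-gram profile is closest to the post."""
    grams = char_ngrams(post, n)
    return max(profiles, key=lambda author: cosine(grams, profiles[author]))


# Toy data (hypothetical, not taken from the paper's forum dataset).
known = {
    "author_a": "ну вот, опять то же самое... ну сколько можно, а?",
    "author_b": "Считаю, что данный вопрос требует отдельного обсуждения.",
}
profiles = {name: char_ngrams(text) for name, text in known.items()}
guess = attribute("ну сколько можно повторять то же самое...", profiles)
```

Because the n-grams are taken over raw characters, they pick up punctuation habits and word fragments alike, which is why such features can carry both style and content signals.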

Language: English
Keywords: Russian language, authorship attribution, n-gram, extremist forum

In book

NLPIR 2019: Proceedings of the 2019 3rd International Conference on Natural Language Processing and Information Retrieval
ACM, 2019.