• A
  • A
  • A
  • АБВ
  • АБВ
  • АБВ
  • A
  • A
  • A
  • A
  • A
Обычная версия сайта
  • RU
  • EN
  • HSE University
  • Publications
  • Book chapter
  • Baselines and Symbol N-Grams: Simple Part-Of-Speech Tagging of Russian?
  • RU
  • EN
Расширенный поиск
Высшая школа экономики
Национальный исследовательский университет
Priority areas
  • business informatics
  • economics
  • engineering science
  • humanitarian
  • IT and mathematics
  • law
  • management
  • mathematics
  • sociology
  • state and public administration
by year
  • 2027
  • 2026
  • 2025
  • 2024
  • 2023
  • 2022
  • 2021
  • 2020
  • 2019
  • 2018
  • 2017
  • 2016
  • 2015
  • 2014
  • 2013
  • 2012
  • 2011
  • 2010
  • 2009
  • 2008
  • 2007
  • 2006
  • 2005
  • 2004
  • 2003
  • 2002
  • 2001
  • 2000
  • 1999
  • 1998
  • 1997
  • 1996
  • 1995
  • 1994
  • 1993
  • 1992
  • 1991
  • 1990
  • 1989
  • 1988
  • 1987
  • 1986
  • 1985
  • 1984
  • 1983
  • 1982
  • 1981
  • 1980
  • 1979
  • 1978
  • 1977
  • 1976
  • 1975
  • 1974
  • 1973
  • 1972
  • 1971
  • 1970
  • 1969
  • 1968
  • 1967
  • 1966
  • 1965
  • 1964
  • 1963
  • 1958
  • More
Subject
News
June 5, 2026
Neural Network Maps as a Method for Constructing Mathematical Models
Scientists from HSE University–Nizhny Novgorod and the Institute of Physics Belgrade, Serbia, are jointly exploring the application of machine learning techniques and neural networks to the study of nonlinear dynamics. Natalya Stankevich, Leading Research Fellow at the Laboratory of Topological Methods in Dynamics of the Faculty of Informatics, Mathematics, and Computer Science at HSE University–Nizhny Novgorod, spoke to the HSE News Service about this international project.
June 5, 2026
‘In the Age of Technology, It Is Interesting to Look into the Past and Think about What We Can Take from It
Polina Tabakova decided to apply for a Philology degree at HSE in Nizhny Novgorod because she grew up in Mari El and did not want to move far away from the Russian forests. In an interview for the Young Scientists of HSE University project, she spoke about the genre of the campus novel, the existential drama of Kolobok, and a blackout version of Eugene Onegin.
June 5, 2026
HSE Scientists Develop Method to Compress Large Language Models Without Losing Quality
Researchers from the AI and Digital Science Institute at the HSE Faculty of Computer Science have developed a new compression method for large language models such as GPT and LLaMA that reduces their size by 25–36% without additional training or significant loss of accuracy. This is the first approach to use mathematical transformations—specifically, rotations of model weights—to make models more amenable to compression with structured matrices. The study results have been published in ACL Findings 2025. The code is available on GitHub.

 

Have you spotted a typo?
Highlight it, click Ctrl+Enter and send us a message. Thank you for your help!

Publications
  • Books
  • Articles
  • Chapters of books
  • Working papers
  • Report a publication
  • Research at HSE

?

Baselines and Symbol N-Grams: Simple Part-Of-Speech Tagging of Russian?

P. 9–19.
Arefyev, N.V., Ermolaev P.

We propose using NB-SVM over bag of character n-grams input representation for determining part-of-speech tags and grammatical categories like gender, number, etc. for words in Russian texts. Several methods are compared including CRF (Conditional Random Fields), SVM (Support Vector Machines) and NB-SVM (Naive Bayes SVM) and superiority of NB-SVM over other classifiers is shown. The proposed model is the 5th best among 12 other models in the first shared task of the MorphoRuEval-17 challenge. We also experimented with category grouping when a single classifier is used to determine several grammatical categories and showed that it improves the model per- formance even further.

Language: English
Text on another site
Keywords: POS-taggingmultilabel classificationSupport Vector Machines (SVM)

In book

Supplementary Proceedings of the Sixth International Conference on Analysis of Images, Social Networks and Texts (AIST-SUP 2017), Moscow, Russia, July 27-29, 2017
Vol. 1975. , Aachen: CEUR-WS.org, 2017.
Similar publications
Hybrid Fault Detection in Three-Phase Induction Motors
Ali S., Khizhik A., Ryzhikov A. et al., , in: 2025 IEEE Ural-Siberian Conference on Biomedical Engineering, Radioelectronics and Information Technology (USBEREIT), 12-13 May 2025.: IEEE, 2025. P. 357–360.
Three-phase induction motors play a crucial role in industrial applications due to their efficiency, durability, and reliability. However, effective fault detection remains challenging, primarily due to the scarcity of labeled failure data, which limits the performance of traditional machine learning (ML)-based diagnostic models and increases the risk of overfitting and poor generalization. Conventional methods, such ...
Added: July 3, 2025
Сборка, хранение и предобработка коллекции документов для обучения multi-label классификатора текстов на естественном русском языке
Krayushkin O., Смирнов М., Чернобай Ю., В кн.: 1st conference on Software Engineering and Information Management (SEIM-2016).: СПб.: [б.и.], 2016.
в работе были выявлены основные особенности организации сборки, хранения и предобработки датасета для формирования обучающей выборки multi-label классификатора текстов на естественном русском языке ...
Added: November 4, 2021
Native Language Identification for Russian
Remnev N., , in: 2019 International Conference on Data Mining Workshops (ICDMW).: IEEE, 2019. P. 1–7.
The task of recognizing the author’s native language based on a text (Native Language Identification - NLI) is the task of automatically recognizing native language (L1) based on texts written in a language that is not native to the author. The NLI task was studied in detail for the English language, and two shared tasks ...
Added: October 18, 2021
Native Language Identification For Russian Using Errors Types
Remnev N., , in: Компьютерная лингвистика и интеллектуальные технологии: по материалам ежегодной международной конференции «Диалог» (Москва, 17–20 июня 2020 г.)Issue 19(26): дополнительный том.: -, 2020. P. 1123–1133.
The task of recognizing the author’s native (Native Language Identification—NLI) language based on a texts, written in a language that is non-native to the author—is the task of automatically recognizing native language (L1). The NLI task was studied in detail for the English language, and two shared tasks were conducted in 2013 and 2017, where ...
Added: October 18, 2021
Intelligent Data Processing 11th International Conference, IDP 2016, Barcelona, Spain, October 10–14, 2016, Revised Selected Papers
Switzerland: Springer, 2019.
This book constitutes the refereed proceedings of the 11th International Conference on Intelligent Data Processing, IDP 2016, held in Barcelona, Spain, in October 2016.     The 11 revised full papers were carefully reviewed and selected from 52 submissions. The papers of this volume are organized in topical sections on machine learning theory with applications; intelligent data processing in life ...
Added: February 8, 2020
Multilabel Classification for Inflow Profile Monitoring
Ignatov D. I., Spesivtsev P., Kurgansky D. et al., , in: Proceedings of the MACSPro Workshop 2019Vol. 2478: CEUR Workshop Proceedings.: CEUR-WS.org, 2019. P. 177–184.
The purpose of this study is to identify the position of non- performing inflow zones (sources) in a wellbore by means of machine learning techniques. The training data are obtained using the transient multiphase simulators and represented as the following time-series: bottom- hole pressure, well-head pressure, flowrates of gas, oil, and water along with a ...
Added: November 1, 2019
A cross-genre morphological tagging and lemmatization of the Russian poetry: distinctive test sets and evaluation
Starchenko A., Lyashevskaya O., , in: Digital Transformation and Global Society. Fourth International Conference, DTGS 2019, St. Petersburg, Russia, June 19–21, 2019, Revised Selected Papers.: Springer, 2019. P. 732–743.
The poetic texts pose a challenge to full morphological tagging and lemmatization since the authors seek to extend the vocabulary, employ morphologically and semantically deficient forms, go beyond standard syntactic templates, use non-projective constructions and non-standard word order, among other techniques of the creative language game. In this paper we evaluate a number of probabilistic ...
Added: June 12, 2019
Morphological analysis for Russian: Integration and comparison of taggers
Kuzmenko E., , in: Analysis of Images, Social Networks and Texts. 5th International Conference, AIST 2016, Yekaterinburg, Russia, April 7-9, 2016, Revised Selected Papers. Communications in Computer and Information ScienceVol. 661.: Switzerland: Springer, 2017. P. 162–161.
In this paper we present a comparison of three morphological taggers for Russian with regard to the quality of morphological disambiguation performed by these taggers. We test the quality of the analysis in three different ways: lemmatization, POS-tagging and assigning full morphological tags. We analyze the mistakes made by the taggers, outline their strengths and ...
Added: April 22, 2017
  • About
  • About
  • Key Figures & Facts
  • Sustainability at HSE University
  • Faculties & Departments
  • International Partnerships
  • Faculty & Staff
  • HSE Buildings
  • HSE University for Persons with Disabilities
  • Public Enquiries
  • Studies
  • Admissions
  • Programme Catalogue
  • Undergraduate
  • Graduate
  • Exchange Programmes
  • Summer University
  • Summer Schools
  • Semester in Moscow
  • Business Internship
  • Research
  • International Laboratories
  • Research Centres
  • Research Projects
  • Monitoring Studies
  • Conferences & Seminars
  • Academic Jobs
  • Yasin (April) International Academic Conference on Economic and Social Development
  • Media & Resources
  • Publications by staff
  • HSE Journals
  • Publishing House
  • iq.hse.ru: commentary by HSE experts
  • Library
  • Economic & Social Data Archive
  • Video
  • HSE Repository of Socio-Economic Information
  • HSE1993–2026
  • Contacts
  • Copyright
  • Privacy Policy
  • Site Map
Edit