Machine Learning and Philology: An Overview of Methods and Applications

Ch. 6. P. 69–84.
Gryaznova E., Kirina M., Mikhailova P., Zarembo V., Moskvina A.

The paper provides an overview of tasks and methods associated with artificial intelligence, specifically with the closely related field of machine learning, whose algorithms are growing in popularity among digital humanities scholars and are applicable to philological studies. It also reviews the most insightful and successful cases of such work. Although most of the tasks discussed belong to natural language processing because the material is textual, we focus on questions that are purely philological and on works that explore the phenomena of literary texts. The reviewed papers show how techniques such as automatic text classification and clustering, named entity recognition, and sentiment analysis not only help to explore large collections of texts but also offer a new way of looking at fiction and of redefining some literary concepts, such as genre and style. The review concludes that applying computational models to fictional texts can enrich the understanding of literature and provide insights for further qualitative analysis. We are currently testing some of the discussed methods on the Corpus of Russian short stories of the first third of the 20th century.
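Automatic text classification, one of the techniques named in the abstract, can be illustrated with a deliberately small sketch. It assumes a nearest-centroid classifier over bag-of-words vectors compared by cosine similarity; this is one simple choice made for illustration, not a method prescribed by the chapter. The genre labels, sample sentences, and function names (`bag_of_words`, `cosine`, `train_centroids`, `classify`) are hypothetical.

```python
import math
import re
from collections import Counter, defaultdict
from typing import Mapping

# Hypothetical toy data: (text, genre label) pairs, not taken from the chapter.
training = [
    ("the detective examined the body and questioned the suspect", "detective"),
    ("the inspector found a clue and the murderer confessed", "detective"),
    ("the knight rode through the enchanted forest to the castle", "fairy tale"),
    ("the princess and the dragon lived in an enchanted castle", "fairy tale"),
]


def bag_of_words(text: str) -> Counter:
    """Lowercase the text and count letter-only tokens (Cyrillic included)."""
    return Counter(re.findall(r"[^\W\d_]+", text.lower()))


def cosine(a: Mapping[str, float], b: Mapping[str, float]) -> float:
    """Cosine similarity of two sparse word vectors; 0.0 if either is empty."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def train_centroids(labelled: list[tuple[str, str]]) -> dict[str, dict[str, float]]:
    """Average the length-normalised word vectors of each label's texts."""
    sums = defaultdict(lambda: defaultdict(float))
    counts = Counter()
    for text, label in labelled:
        vec = bag_of_words(text)
        norm = math.sqrt(sum(v * v for v in vec.values())) or 1.0
        for word, freq in vec.items():
            sums[label][word] += freq / norm
        counts[label] += 1
    return {
        label: {w: s / counts[label] for w, s in acc.items()}
        for label, acc in sums.items()
    }


def classify(text: str, centroids: Mapping[str, Mapping[str, float]]) -> str:
    """Return the label whose centroid is most similar to the text."""
    vec = bag_of_words(text)
    return max(centroids, key=lambda label: cosine(vec, centroids[label]))


centroids = train_centroids(training)
print(classify("the suspect denied the murder but the detective found a clue", centroids))  # detective
```

With raw counts, frequent function words such as "the" weigh heavily in the similarity; studies of genre and style usually reweight or select features (for example, TF-IDF weighting or function-word frequencies), which this sketch omits for brevity.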

Language: English
Keywords: computational linguistics, machine learning and data analysis, text mining
Publication based on the results of:
Artificial Intelligence Methods for Philological Research (2021)

In book

Digital Geography: Proceedings of the International Conference on Internet and Modern Society (IMS 2022)
Springer, 2024.
Similar publications
Stereotypes on Trial: Exploring the Role of Victim Alcohol Abuse in Femicide Sentencing in Russia
Zhuchkova S., Smirnov N., Sociology of Power 2025 Vol. 37 No. 4 P. 19–50
This study examines how victims’ alcohol abuse affects sentencing in cases where a woman is killed by her intimate partner in Russia, focusing on gender differences among judges. The research uses a dataset of 1,478 court verdicts (2013–2019), obtained via web scraping from official sources and processed through text mining techniques. Using regression analysis, the ...
Added: December 21, 2025
Automatic Annotation of Discourse and Speech Formulas in Internet Communication: A Telegram Comment Corpus
Maslenikova A., Popova T. I., in: 27th International Conference, SPECOM 2025, Szeged, Hungary, October 13–15, 2025, Proceedings, Part I. Speech and Computer. Lecture Notes in Artificial Intelligence, Vol. 16187. Springer, 2025. P. 278–292.
This article presents a system for the automatic processing of user comments aimed at annotating speech and discourse formulas that actively function in everyday interaction, including digital communication. A Python-based program using the Telegram API was developed to automate the collection, filtering, and annotation of empirical data. In addition to building a user corpus, the ...
Added: October 19, 2025
27th International Conference, SPECOM 2025, Szeged, Hungary, October 13–15, 2025, Proceedings, Part II. Speech and Computer. Lecture Notes in Artificial Intelligence 16188
Springer, 2025.
Added: October 19, 2025
Employing computational linguistic technologies and oculography to develop diagnostic tool for detecting autoaggressive tendencies in young people: a riveted gaze into “get rid of the shackles of this world”
Khomenko A., Kasimova L., Sychugov E. et al., Psychiatria Danubina 2025 Vol. 37 No. Suppl. 1 P. 213–223
Background: Early recognition of autoaggressive tendencies in young people is essential for diagnostic screening and reducing suicidality risks. This can be achieved through psycholinguistic approaches such as corpus analysis and eye-tracking studies. Corpus research helps to develop generalized speech patterns of those at risk of suicide, while oculographic methods examine perceptual cues linked to suicidal ...
Added: October 19, 2025
Computational linguistics and intellectual technologies. Papers from the Annual International Conference "Dialogue" (2025)
[s.n.], 2025.
This collection includes 39 papers from the Dialogue 2025 International Conference on Computational Linguistics and Intelligent Technologies, representing a wide range of theoretical and applied research in the fields of natural language description, modeling language processes, and the development of practical computational linguistic technologies. This publication is intended for specialists in theoretical and applied linguistics and ...
Added: October 19, 2025
27th International Conference, SPECOM 2025, Szeged, Hungary, October 13–15, 2025, Proceedings, Part I. Speech and Computer. Lecture Notes in Artificial Intelligence 16187
Springer, 2025.
Added: October 13, 2025
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics
Wien: Association for Computational Linguistics, 2025.
Added: August 26, 2025
Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Association for Computational Linguistics, 2025.
Originally named the Association for Machine Translation and Computational Linguistics (AMTCL), the Association for Computational Linguistics was founded in 1962 and renamed the ACL in 1968. The ACL is run by some 20 volunteers overseeing the administration of the Association (organising elections, deciding on new actions, adapting to the fast-changing trends of our fields), ...
Added: July 17, 2025
Texts of Court Sentences as a Data Source for Empirical Legal Studies in Russia
Zhuchkova S., Devyatnikov V. Yu., Kazun A. et al., Monitoring of Public Opinion: Economic and Social Changes 2025 No. 2 P. 170–192
The development of empirical legal studies in Russia is restricted by the lack of sources of disaggregated data on law enforcement available to social researchers. One of the potential sources of such data, which is still insufficiently used in Russian research, is the publicly available texts of court verdicts, in particular court sentences. This article ...
Added: May 8, 2025
Computational Linguistics and Intellectual Technologies. Papers from the Annual International Conference “Dialogue” (2025)
2025.
The volume includes 9 papers from the international conference on computational linguistics and intelligent technologies “Dialogue 2025,” representing a wide range of theoretical and applied research in the fields of natural language description, modeling of linguistic processes, and the development of practically applicable computational linguistic technologies. Intended for specialists in theoretical and applied linguistics and intelligent ...
Added: April 28, 2025
Computational Linguistics and Intellectual Technologies: Proceedings of the International Conference “Dialogue 2020”
2020.
Added: April 10, 2025
Findings of the Association for Computational Linguistics: EACL 2024
Association for Computational Linguistics, 2024.
The 18th Conference of the European Chapter of the Association for Computational Linguistics. EACL is the flagship European conference dedicated to European and international researchers, covering a wide spectrum of research in Computational Linguistics and Natural Language Processing. ...
Added: February 17, 2025
Thematic Annotation of an Anthropological Corpus: A Method for Classifying Miners’ Narratives
Mazitova L. L., Panteleeva L., Vestnik of Samara University. History, Pedagogics, Philology 2024 Vol. 30 No. 4 P. 156–164
The article describes the methodology for creating an anthropological corpus of texts united by their connection to the mining profession. The work addresses three research tasks: 1) developing a thematic classification, 2) introducing conventions for marking narratives in the text, and 3) determining principles for organizing the corpus according to the themes of ...
Added: January 18, 2025
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing
Association for Computational Linguistics, 2024.
Added: January 2, 2025
Findings of the Association for Computational Linguistics: ACL 2024
Association for Computational Linguistics, 2024.
ACL 2024 invites the submission of long and short papers featuring substantial, original, and unpublished research in all aspects of Computational Linguistics and Natural Language Processing. As in recent years, some of the presentations at the conference will be of papers accepted by the Transactions of the ACL (TACL) and by the Computational Linguistics (CL) ...
Added: December 24, 2024
27th International Conference, IMS 2024, St. Petersburg, Russia, June 24–26, 2024, Selected Papers. Internet and Modern Society. Human-Computer Communication. CCIS, volume 2534
Springer, 2025.
International conference “Internet and Modern Society” (IMS-2024) is mainly organized by ITMO University, held in St. Petersburg, during the Information Society Week. Important tasks of the IMS-2024 are contribution to the formation of specialists’ international community and promotion of research and development in the field of information society technologies. ...
Added: November 29, 2024
Linguistic Complexity of Texts in the “Virtual Museum Tour” Genre (Based on the Virtual Visit to the State Hermitage Museum)
Kolmogorova A., Kulikova E. R., Kolmogorova P. A., Text. Book. Publishing 2025 No. 38 P. 29–54
The article is devoted to the linguistic features of the texts of the Virtual Visit to the State Hermitage Museum, available on the museum’s official website. The purpose of the study is to analyze a set of lexical, morphological, syntactic and discursive metrics of the linguistic complexity of these texts in comparison with the same ...
Added: November 8, 2024
Anti-vaccination Movement on VK: Information Exchange and Public Concern
Petrov I., in: Digital Transformation and Global Society. 6th International Conference, DTGS 2021, St. Petersburg, Russia, June 23–25, 2021, Revised Selected Papers. Springer, 2022. P. 108–121.
Vaccination is a simple yet effective method for controlling the spread of communicable diseases. However, an increasing number of individuals are expressing distrust in the vaccination process and are choosing not to vaccinate themselves or their children. One explanation suggests that such doubtfulness is maintained through widespread misinformation available on social media. This research takes ...
Added: May 16, 2024
Comprehensibility of the Language of Justice: An Empirical Study of the Content and Syntax of Court Decisions
Chaplinskiy A., Knutov A., Alimpeev D., Zakon 2024 No. 2 P. 159–177
For many years, the primary language-related challenges in legal proceedings have been the use of the national languages of the republics and foreigners’ access to justice. However, the authors of this paper hypothesize that Russian citizens and organizations themselves often need help translating “legal Russian” into “everyday Russian”. This ...
Added: February 21, 2024
Linguistic mechanisms of colour term evolution: A diachronic investigation of “Russian browns” buryj and koričnevyj
Bochkarev V. V., Shevlyakova A., Solovyev V. et al., Diachronica 2023 Vol. 40 No. 4 P. 492–531
We investigated diachrony of distributional semantics of two competing Russian colour terms (CTs) for ‘brown’, buryj (11th century) and koričnevyj (17th century), using the Russian subcorpus of Google Books Ngram (2020). Time-series analysis (1800–2019) of bigrams gauged each term’s frequencies of occurrence and changes in combinability with nouns for natural objects, artefacts, abstract concepts and figurative expressions. In frequency, koričnevyj overtook buryj in the ...
Added: February 19, 2024
Strength and Weakness: Dynamics of the Representation of Hegemonic Masculinity in Russian-Language Rap
Zhuchkova S., Boychenko A. E., Smirnov N., The Journal of Sociology and Social Anthropology 2024 Vol. 27 No. 1 P. 103–138
In public and academic debate, rap is often presented as one of the most aggressive music genres, depicting violence and cruelty in various ways. One reason for this is rap’s social background: the genre emerged in high-crime areas of New York and was first created by the deprived Black population. Using the notion of hegemonic ...
Added: February 11, 2024
Perception of AI-generated art: text analysis of online discussions
Bosonogov S., Suvorova A., Journal of Mathematical Sciences 2024 Vol. 285 P. 1–13
In this work we analyze comments on three subreddits related to AI-generated art to understand how people perceive the ability of AI to create art and the topics and moods of discussions in the context of widespread usage of pre-trained models. We used computational text analysis techniques such as LDA topic modeling and sentiment analysis ...
Added: February 4, 2024