Machine Learning and Philology: An Overview of Methods and Applications

Gryaznova E., Kirina M., Mikhailova P., Zarembo V., Moskvina A.

doi:10.1007/978-3-031-50609-3_6

Ch. 6. P. 69–84.

The paper provides an overview of tasks and methods associated with artificial intelligence, specifically the related field of machine learning, whose algorithms are growing in popularity among digital humanities scholars and are applicable to philological studies, together with the most insightful and successful cases of such work. Because the material is textual, the tasks discussed mostly belong to natural language processing; nevertheless, we focus on questions that are purely philological and on works that explore the phenomena of literary texts. The reviewed papers show how techniques such as automatic text classification and clustering, named entity recognition, or sentiment analysis not only help to explore large collections of texts but also provide a new way to look at fiction and to redefine literary concepts such as genre and style. The review concludes that applying computational models to fictional texts enriches the understanding of literature and provides insights for further qualitative analysis. We are currently testing some of the discussed methods on the Corpus of Russian short stories of the first third of the 20th century.

Language: English

Keywords: computational linguistics, machine learning and data analysis, text mining

Publication based on the results of:

Artificial Intelligence Methods for Philological Research (2021)

In book

Digital Geography: Proceedings of the International Conference on Internet and Modern Society (IMS 2022)

Springer, 2024.

Stereotypes on Trial: Exploring the Role of Victim Alcohol Abuse in Femicide Sentencing in Russia

Zhuchkova S., Smirnov N., Социология власти 2025 Vol. 37 No. 4 P. 19–50

This study examines how victims’ alcohol abuse affects sentencing in cases where a woman is killed by her intimate partner in Russia, focusing on gender differences among judges. The research uses a dataset of 1,478 court verdicts (2013–2019), obtained via web scraping from official sources and processed through text mining techniques. Using regression analysis, the ...

Added: December 21, 2025

Automatic Annotation of Discourse and Speech Formulas in Internet Communication: A Telegram Comment Corpus

Maslenikova A., Popova T. I., in: 27th International Conference, SPECOM 2025, Szeged, Hungary, October 13–15, 2025, Proceedings, Part I. Speech and Computer. Lecture Notes in Artificial Intelligence, Vol. 16187. Springer, 2025. P. 278–292.

This article presents a system for the automatic processing of user comments aimed at annotating speech and discourse formulas that actively function in everyday interaction, including digital communication. A Python-based program using the Telegram API was developed to automate the collection, filtering, and annotation of empirical data. In addition to building a user corpus, the ...

Added: October 19, 2025

27th International Conference, SPECOM 2025, Szeged, Hungary, October 13–15, 2025, Proceedings, Part II. Speech and Computer. Lecture Notes in Artificial Intelligence 16188

Springer, 2025.

Added: October 19, 2025

Employing computational linguistic technologies and oculography to develop diagnostic tool for detecting autoaggressive tendencies in young people: a riveted gaze into “get rid of the shackles of this world”

Khomenko A., Kasimova L., Sychugov E. et al., Psychiatria Danubina 2025 Vol. 37 No. Suppl. 1 P. 213–223

Background: Early recognition of autoaggressive tendencies in young people is essential for diagnostic screening and reducing suicidality risks. This can be achieved through psycholinguistic approaches such as corpus analysis and eye-tracking studies. Corpus research helps to develop generalized speech patterns of those at risk of suicide, while oculographic methods examine perceptual cues linked to suicidal ...

Added: October 19, 2025

Computational linguistics and intellectual technologies. Papers from the Annual International Conference "Dialogue" (2025)

[s.n.], 2025.

This collection includes 39 papers from the Dialogue 2025 International Conference on Computational Linguistics and Intelligent Technologies, representing a wide range of theoretical and applied research in the fields of natural language description, modeling language processes, and the development of practical computational linguistic technologies. This publication is intended for specialists in theoretical and applied linguistics and ...

Added: October 19, 2025

27th International Conference, SPECOM 2025, Szeged, Hungary, October 13–15, 2025, Proceedings, Part I. Speech and Computer. Lecture Notes in Artificial Intelligence 16187

Springer, 2025.

Added: October 13, 2025

Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics

Wien: Association for Computational Linguistics, 2025.

Added: August 26, 2025

Texts of Court Verdicts as a Data Source for Empirical Legal Studies in Russia

Zhuchkova S., Девятников В. Ю., Kazun A. et al., Мониторинг общественного мнения: Экономические и социальные перемены 2025 No. 2 P. 170–192

The development of empirical legal studies in Russia is restricted by the lack of sources of disaggregated data on law enforcement available to social researchers. One of the potential sources of such data, which is still insufficiently used in Russian research, is the publicly available texts of court verdicts, in particular court sentences. This article ...

Added: May 8, 2025

Computational Linguistics and Intellectual Technologies: Proceedings of the International Conference “Dialogue 2020”

2020.

Added: April 10, 2025

Findings of the Association for Computational Linguistics: EACL 2024

Association for Computational Linguistics, 2024.

The 18th Conference of the European Chapter of the Association for Computational Linguistics. EACL is the flagship European conference dedicated to European and international researchers, covering a wide spectrum of research in Computational Linguistics and Natural Language Processing. ...

Added: February 17, 2025

Thematic Annotation of an Anthropological Corpus: A Method for Classifying Miners' Narratives

Мазитова Л. Л., Panteleeva L., Вестник Самарского университета. История, педагогика, филология 2024 Vol. 30 No. 4 P. 156–164

The article describes the methodology for creating an anthropological corpus of texts united by their connection to the mining profession. The work addresses three research tasks: 1) development of a thematic classification, 2) introduction of conventions for highlighting narratives in the text, 3) determination of principles for organizing the corpus according to the themes of ...

Added: January 18, 2025

Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing

Association for Computational Linguistics, 2024.

Added: January 2, 2025

Findings of the Association for Computational Linguistics: ACL 2024

Association for Computational Linguistics, 2024.

ACL 2024 invites the submission of long and short papers featuring substantial, original, and unpublished research in all aspects of Computational Linguistics and Natural Language Processing. As in recent years, some of the presentations at the conference will be of papers accepted by the Transactions of the ACL (TACL) and by the Computational Linguistics (CL) ...

Added: December 24, 2024

27th International Conference, IMS 2024, St. Petersburg, Russia, June 24–26, 2024, Selected Papers. Internet and Modern Society. Human-Computer Communication. CCIS, volume 2534

Springer, 2025.

The international conference “Internet and Modern Society” (IMS-2024), organized mainly by ITMO University, is held in St. Petersburg during the Information Society Week. Important tasks of IMS-2024 are to contribute to the formation of an international community of specialists and to promote research and development in the field of information society technologies. ...

Added: November 29, 2024

Linguistic Complexity of Texts in the “Virtual Museum Tour” Genre (Based on the Virtual Visit to the State Hermitage Museum)

Kolmogorova A., Куликова Е. Р., Колмогорова П. А., Текст. Книга. Книгоиздание 2025 No. 38 P. 29–54

The article is devoted to the linguistic features of the texts of the Virtual Visit to the State Hermitage Museum, available on its official website. The purpose of the study is to analyze the set of lexical, morphological, syntactic and discursive metrics of the linguistic complexity of these texts in comparison with the same ...

Added: November 8, 2024

Anti-vaccination Movement on VK: Information Exchange and Public Concern

Petrov I., in: Digital Transformation and Global Society. 6th International Conference, DTGS 2021, St. Petersburg, Russia, June 23–25, 2021, Revised Selected Papers. Springer, 2022. P. 108–121.

Vaccination is a simple yet effective method for controlling the spread of communicable diseases. However, an increasing number of individuals are expressing distrust in the vaccination process and are choosing not to vaccinate themselves or their children. One explanation suggests that such doubtfulness is maintained through widespread misinformation available on social media. This research takes ...

Added: May 16, 2024

Clarity of the Language of Justice: An Empirical Study of the Content and Syntax of Court Decisions

Chaplinskiy A., Knutov A., Alimpeev D., Закон 2024 No. 2 P. 159–177

For many years, the primary challenges in the legal sphere regarding language in legal proceedings have been the utilization of national languages of the republics and foreign access to justice. However, the authors of this paper hypothesize that citizens and organizations of Russian origin often require assistance in translating “legal Russian” to “everyday Russian”. This ...

Added: February 21, 2024

Linguistic mechanisms of colour term evolution: A diachronic investigation of “Russian browns” buryj and koričnevyj

Bochkarev V. V., Shevlyakova A., Solovyev V. et al., Diachronica 2023 Vol. 40 No. 4 P. 492–531

We investigated diachrony of distributional semantics of two competing Russian colour terms (CTs) for ‘brown’, buryj (11th century) and koričnevyj (17th century), using the Russian subcorpus of Google Books Ngram (2020). Time-series analysis (1800–2019) of bigrams gauged each term’s frequencies of occurrence and changes in combinability with nouns for natural objects, artefacts, abstract concepts and figurative expressions. In frequency, koričnevyj overtook buryj in the ...

Added: February 19, 2024

Strength and Weakness: The Dynamics of the Representation of Hegemonic Masculinity in Russian-Language Rap

Zhuchkova S., Бойченко А. Е., Smirnov N., Журнал социологии и социальной антропологии 2024 Vol. 27 No. 1 P. 103–138

In public and academic debate, rap is often presented as one of the most aggressive music genres, depicting violence and cruelty in various ways. One of the reasons for that is rap’s social background: it emerged in crime-ridden areas of New York, where it was first created by the deprived Black population. Using the notion of hegemonic ...

Added: February 11, 2024

Perception of AI-generated art: text analysis of online discussions

Bosonogov S., Suvorova A., Journal of Mathematical Sciences 2023 Vol. 529 P. 6–23

In this work we analyze comments on three subreddits related to AI-generated art to understand how people perceive the ability of AI to create art and the topics and moods of discussions in the context of widespread usage of pre-trained models. We used computational text analysis techniques such as LDA topic modeling and sentiment analysis ...

Added: February 4, 2024