Lemmatisation for under-resourced languages with sequence-to-sequence learning: A case of Early Irish

P. 113–124.
Dereza O.

Lemmatisation, one of the most important stages of text preprocessing, consists in grouping the inflected forms of a word together so that they can be analysed as a single item. The task is often considered solved for most modern languages regardless of their morphological type, but the situation is dramatically different for ancient languages. A rich inflectional system and a high level of orthographic variation, both common to these languages, together with a lack of resources make lemmatising historical data a challenging task. The task is becoming increasingly important as manuscripts are now being extensively digitised, yet it remains poorly covered in the literature. In this work, I compare a rule-based and a neural-network-based approach to lemmatisation for Early Irish data.

Language: English
Keywords: artificial neural networks, NLP, automatic morphological analysis, Early Irish, under-resourced languages, lemmatisation, sequence-to-sequence models

In book

Proceedings of Third Workshop "Computational linguistics and language science"
Wohlgenannt G., von Waldenfels R., Toldova S., Rakhilina E. V., Lyashevskaya O., Loukachevitch N. V., Artemova E. Issue 4. Manchester: EasyChair, 2019.
Similar publications
Hebb-Inspired Low Rank Adapters for Large Language Models Fine-Tuning
Alexander Demidovskij, Artyom Tugaryov, Igor Salnikov et al., in: PRICAI 2025: Trends in Artificial Intelligence: 22nd Pacific Rim International Conference on Artificial Intelligence, PRICAI 2025, Wellington, New Zealand, November 17–21, 2025, Proceedings, Part III. Vol. 16453. Springer, 2026. P. 603–612.
The backpropagation method is the predominant method for pre-training and fine-tuning of Large Language models. At the same time, it is considerably demanding in terms of memory and hardware. Therefore, it makes fine-tuning and pre-training very expensive, harmful for the environment due to the large carbon footprint, and raises the blocks for the development of ...
Added: April 21, 2026
PRICAI 2025: Trends in Artificial Intelligence: 22nd Pacific Rim International Conference on Artificial Intelligence, PRICAI 2025, Wellington, New Zealand, November 17–21, 2025, Proceedings, Part III
Springer, 2026.
These proceedings contain the papers presented at the 22nd Pacific Rim International Conference on Artificial Intelligence (PRICAI), held on November 17–21, 2025 in Wellington, New Zealand. PRICAI 2025 was co-hosted with the 40th International Conference on Image and Vision Computing New Zealand (IVCNZ 2025) and the annual conference of the New Zealand Artificial Intelligence Researchers ...
Added: April 21, 2026
Granular computing-based deep learning for text classification
Behzadidoost R., Mahan F., Izadkhah H., Information Sciences 2024 Vol. 652 Article 119746
Granular computing involves a comprehensive process that encompasses theories, methodologies, and techniques to solve complex problems, rather than being just an algorithm. As the volume of generated data continues to grow rapidly, data-driven problems have become increasingly complex. Although deep learning models have outperformed traditional machine learning models in solving complex problems, there is still room for enhancing their performance. ...
Added: March 12, 2026
Semi-automatic annotation of brain vessels in magnetic resonance angiography images
Bernadotte A., Elfimov N., Menshikov I., Scientific data 2025 Vol. 13 No. 41
Accurate segmentation of brain vessels in magnetic resonance angiography (MRA) is essential for surgical procedures. Neural networks are powerful tools for medical image segmentation, but their development requires well-annotated datasets. However, publicly available MRA datasets with detailed vessel annotations are scarce. We present a dataset of 100 manually annotated brain MRA images from the IXI ...
Added: February 25, 2026
30th International Conference on Applications of Natural Language to Information Systems, NLDB 2025, Kanazawa, Japan, July 4–6, 2025, Proceedings, Part I. Natural Language Processing and Information Systems. (LNCS, volume 15836)
Springer, 2025.
The two-volume set LNCS 15836 and 15837 constitutes the proceedings of the 30th International Conference on Applications of Natural Language to Information Systems, NLDB 2025, held in Kanazawa, Japan, during July 4–6, 2025. The 33 full papers, 19 short papers and 2 demo papers presented in this volume were carefully reviewed and selected from 120 submissions. ...
Added: February 3, 2026
Proceedings of the International Conference on Recent Advances in Natural Language Processing (RANLP 2021)
INCOMA Ltd, 2021.
Added: January 28, 2026
Tests as Assessment Tools in Universities: Challenges and Solutions
Antipkina I., Ivanushchenko A. V., Kalabina I. A. et al., Мир психологии. Научно-методический журнал 2025 No. 4(123) P. 295–316
Low-quality test items pose significant risks of biased and inaccurate assessment in higher education. In this study, multi-disciplinary test banks were examined, first, using classical test theory and then using a Large Language Model (Grok). Our findings reveal a number of problems in university test items due to methodological shortcomings rather than content inaccuracies. Based ...
Added: January 22, 2026
Proceedings of the 19th International Workshop on Semantic Evaluation (SemEval-2025)
Association for Computational Linguistics, 2025.
Added: November 17, 2025
LLM-Microscope: Uncovering the Hidden Role of Punctuation in Context Memory of Transformers
Anton R., Mikhalchuk M., Rahmatullaev T. et al., in: Findings of the Association for Computational Linguistics: NAACL 2025. Association for Computational Linguistics, 2025. P. 7757–7764.
We introduce methods to quantify how Large Language Models (LLMs) encode and store contextual information, revealing that tokens often seen as minor (e.g., determiners, punctuation) carry surprisingly high context. Notably, removing these tokens — especially stopwords, articles, and commas — consistently degrades performance on MMLU and BABILong-4k, even if removing only irrelevant tokens. Our analysis ...
Added: November 6, 2025
Well-Being Research Using Advanced Natural Language Processing (NLP) Methods: Prospects and Limitations
Voevodina E., Современная зарубежная психология 2025 Vol. 14 No. 3 P. 172–181
Context and relevance. Well-being research faces methodological limitations of conventional psychometric measures, criticized for poor ecological validity, limited information yield, and inadequate capture of multidimensional construct of well-being. Advanced natural language processing (NLP) technologies offer solutions to these constraints. Objective. To evaluate opportunities and challenges of transformer-based NLP for well-being research. Methods and materials. We conducted an analytical review of ...
Added: October 9, 2025
Proceedings of the Third Workshop on Resources and Representations for Under-Resourced Languages and Domains (RESOURCEFUL-2025)
Tartu: University of Tartu Library, 2025.
The third workshop on resources and representations for under-resourced languages and domains was held in Tallinn, Estonia, on March 2nd, 2025. The workshop was conducted in person but also provided an option for online participation. In alignment with the goals of the previous two workshops in 2020 and 2023, RESOURCEFUL-2025 explored the role of resource ...
Added: July 17, 2025
Defining Requirements for the Technological Parameters of Serial Production Using a Neural Network Approach
Yasnitsky L., Goldobin M. A., Прикладная информатика 2025 Vol. 20 No. 3(117) P. 85–100
Currently, artificial intelligence methods are widely used in the practice of serial production enterprises. They are used to detect defects, classify and eliminate them, identify the causes of defects, predict the quality and properties of the resulting product, select optimal parameters of the production process, and identify and study its patterns. However, outside the field ...
Added: July 10, 2025
Economic and Social Aspects of Nuclear Energy amid the Development of Artificial Intelligence Technologies
Podchufarov A., Galkina A. N., Vanina S. S. et al., Экономика и управление: проблемы, решения 2025 Vol. 5 No. 4 P. 61–74
Under modern conditions, the introduction of artificial intelligence technologies is becoming a significant factor in the development of high-tech industries. The article presents the results of a study of the prospects for the use of intelligent analytical systems in nuclear energy. The experience of foreign countries is analyzed and the features of successful projects using ...
Added: June 5, 2025
Where Do Large Learning Rates Lead Us?
Sadrtdinov I., Kodryan M., Pokonechny E. et al., in: 38th Conference on Neural Information Processing Systems (NeurIPS 2024). [s.n.], 2024. P. 58445–58479.
Added: February 19, 2025
Big Data Analytics Approach with Multiple Text Types: The Case of the Computer Gaming
Aleksandr Belov, Zakharov F., Litvinenko E. et al., in: International IoT, Electronics and Mechatronics Conference, Volume 2. Proceedings of IEMTRONICS 2024. LNEE, Vol. 1228. Springer Publishing Company, 2025. P. 275–287.
Added: January 26, 2025
String Similarity Measures for Evaluating the Lemmatisation in Old Church Slavonic
Afanasev I., Lyashevskaya O., in: Structuring Lexical Data and Digitising Dictionaries: Grammatical Theory, Language Processing and Databases in Historical Linguistics. Boston, Leiden: Brill, 2024. P. 13–35.
Added: January 7, 2025
HSE NLP Team at MEDIQA-CORR 2024 Task: In-Prompt Ensemble with Entities and Knowledge Graph for Medical Error Correction
Tutubalina E., Valiev A., Association for Computational Linguistics 2024 P. 470–482
This paper presents our LLM-based system designed for the MEDIQA-CORR @ NAACL-ClinicalNLP 2024 Shared Task 3, focusing on medical error detection and correction in medical records. Our approach consists of three key components: entity extraction, prompt engineering, and ensemble. First, we automatically extract biomedical entities such as therapies, diagnoses, and biological species. Next, we explore ...
Added: December 13, 2024
Data-driven approach to curriculum analysis
Iu. Nasu, M.S. Drobinin, M.S. Efanov et al., Proceedings of the Institute for System Programming of the RAS 2024 Vol. 36 No. 2 P. 83–90
The choice of an educational program is momentous in young people's lives. Given the shortage of time after exams, applicants usually do not have time to analyze possible educational tracks. Furthermore, it requires a thorough study of learning plans. This research addresses the problem proposing the algorithm to data-driven curriculum analysis based on natural language ...
Added: December 11, 2024