
Scalable Modified Kneser-Ney Language Model Estimation

P. 690–696.
Heafield K., Pouzyrevsky I., Clark J., Koehn P.

We present an efficient algorithm to estimate large modified Kneser-Ney models, including interpolation. Streaming and sorting enable the algorithm to scale to much larger models by using a fixed amount of RAM and a variable amount of disk. Using one machine with 140 GB of RAM for 2.8 days, we built an unpruned model on 126 billion tokens. Machine translation experiments with this model show an improvement of 0.8 BLEU points over constrained systems for the 2013 Workshop on Machine Translation task in three language pairs. Our algorithm is also faster for small models: we estimated a model on 302 million tokens using 7.7% of the RAM and 14.0% of the wall time taken by SRILM. The code is open source as part of KenLM.
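The abstract does not spell out the estimator itself. As a rough illustration of what is being estimated, here is a minimal in-memory Python sketch of interpolated modified Kneser-Ney: adjusted (continuation) counts for lower orders, three discounts per order computed from counts-of-counts with the usual Chen and Goodman closed form, and recursive interpolation down to a uniform distribution. It is not the paper's streaming and sorting algorithm and not KenLM's API. The names `ModifiedKneserNey`, `discounts` and `prob`, the 0.5 fallback, the clamping of discounts to [0, k], and the simplified handling of `<s>` are assumptions made for this example. Because everything lives in Python dictionaries, it only suits toy corpora; avoiding exactly that memory footprint is what the paper's fixed-RAM streaming approach is for.

```python
from collections import Counter

BOS, EOS = "<s>", "</s>"


def discounts(counts):
    """Closed-form modified Kneser-Ney discounts (D1, D2, D3+) from counts-of-counts."""
    t = Counter(counts.values())  # t[k] = number of n-grams whose count is exactly k
    y = t[1] / (t[1] + 2 * t[2]) if t[1] else 0.5  # 0.5: assumed fallback for tiny data
    ds = []
    for k in (1, 2, 3):
        # D_k = k - (k + 1) * Y * t_{k+1} / t_k, falling back to Y when t_k is zero
        d = k - (k + 1) * y * t[k + 1] / t[k] if t[k] else y
        ds.append(min(max(d, 0.0), float(k)))  # clamp to [0, k] so no probability goes negative
    return tuple(ds)


class ModifiedKneserNey:
    """Hypothetical toy model: interpolated modified Kneser-Ney held fully in memory."""

    def __init__(self, sentences, order=3):
        self.order = order
        raw = [Counter() for _ in range(order + 1)]  # raw[n]: n-gram tuple -> count
        for words in sentences:
            toks = [BOS] + list(words) + [EOS]
            for n in range(1, order + 1):
                for i in range(len(toks) - n + 1):
                    raw[n][tuple(toks[i:i + n])] += 1

        # Adjusted counts: raw counts at the highest order and for n-grams starting
        # with <s>; otherwise the number of distinct words seen to the left.
        adj = [Counter() for _ in range(order + 1)]
        adj[order] = raw[order].copy()
        for n in range(order - 1, 0, -1):
            for gram in raw[n + 1]:
                adj[n][gram[1:]] += 1  # each distinct (n+1)-gram is one distinct left word
            for gram, c in raw[n].items():
                if n > 1 and gram[0] == BOS:  # <s> is never predicted: no unigram entry
                    adj[n][gram] = c

        self.adj = adj
        self.vocab = [g[0] for g in adj[1]]
        self.disc = [None] + [discounts(adj[n]) for n in range(1, order + 1)]
        self.total = [Counter() for _ in range(order + 1)]  # context h -> sum_x a(h x)
        self.gamma = [Counter() for _ in range(order + 1)]  # context h -> discount mass taken
        for n in range(1, order + 1):
            for gram, a in adj[n].items():
                self.total[n][gram[:-1]] += a
                self.gamma[n][gram[:-1]] += self._d(n, a)

    def _d(self, n, a):
        d1, d2, d3 = self.disc[n]
        return d1 if a == 1 else d2 if a == 2 else d3

    def prob(self, word, history=()):
        """p(word | history), using at most order - 1 words of history."""
        h = tuple(history)[-(self.order - 1):] if self.order > 1 else ()
        return self._p(word, h)

    def _p(self, word, h):
        n = len(h) + 1
        lower = 1.0 / len(self.vocab) if n == 1 else self._p(word, h[1:])
        total = self.total[n][h]
        if total == 0:
            return lower  # context never seen: use the lower order unchanged
        a = self.adj[n][h + (word,)]
        u = (a - self._d(n, a)) / total if a else 0.0  # discounted estimate
        return u + self.gamma[n][h] / total * lower  # interpolate with the lower order
```

For example, after `lm = ModifiedKneserNey([["the", "cat", "sat"], ["the", "dog", "sat"]])`, calling `lm.prob("sat", ["<s>", "the"])` returns an interpolated trigram probability; pass `"<s>"` explicitly to condition on the sentence start. Because the discount mass removed at each order is exactly the weight given to the next lower order, the probabilities over `lm.vocab` sum to one for any history.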

Language: English
Keywords: language model estimation, Kneser-Ney language model, language models

In book

Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics
Vol. 2: Short Papers. Sofia: Springer, 2013.
Similar publications
Comparative Analysis of Methods for Aspect-Based Sentiment Analysis of Texts
Abregova Z. Kh., Dvoynikova A., In book: Almanac of Scientific Works of Young Scientists of ITMO University. ITMO University, 2025. P. 487–493.
The article examines various methods of aspect-based sentiment analysis of text data, including both traditional rule-based methods and modern machine learning and deep learning algorithms. The paper presents a comparative analysis of datasets and methods for aspect-based sentiment analysis of texts, an important task in natural language processing and data analysis. ...
Added: April 25, 2026
Grammar in Language Models: BERT Study
Chistyakova K., Kazakova T. / NRU HSE. Series WP BRP "Linguistics". 2023. No. 115.
The interpretation of language models has been studied extensively, but no universal answers have been found. Our study proposes combining widely accepted probing methods with a novel approach to the neural network under investigation. We propose breaking grammatical forms at the pre-training step in order to obtain two "sibling" models, as it casts ...
Added: November 29, 2023
Issues in the Distributional-Semantic Analysis of Skeletal Text Structures for Automated Language Data Processing
Mylnikova A., Mylnikov L., Scientific and Technical Information. Series 2: Information Processes and Systems 2023 No. 5 P. 21–30
We propose an approach to constructing skeletal structures of texts based on distributional analysis of sentences. The approach structures and formalizes language units and makes it possible to identify unique lexico-grammatical distributional patterns. A notation system and a method for formalizing data to train a text analysis model are presented. ...
Added: June 19, 2023
The voice of Twitter: observable subjective well-being inferred from tweets in Russian
Smetanin S., Komarov M., PeerJ Computer Science 2022 Vol. 8 Article e1181
As one of the major platforms of communication, social networks have become a valuable source of opinions and emotions. Considering that sharing of emotions offline and online is quite similar, historical posts from social networks seem to be a valuable source of data for measuring observable subjective well-being (OSWB). In this study, we calculated OSWB ...
Added: December 29, 2022
Compression of recurrent neural networks for efficient language modeling
Grachev A., Ignatov D. I., Savchenko A., Applied Soft Computing Journal 2019 Vol. 79 P. 354–362
Recurrent neural networks have proved to be an effective method for statistical language modeling. However, in practice their memory and run-time complexity are usually too large for real-time offline mobile applications. In this paper we consider several compression techniques for recurrent neural networks, including Long Short-Term Memory models. We pay particular attention ...
Added: June 12, 2019
On the Origins of the Rhythm of Russian Syllabo-Tonic Verse (The Problem of Cross-Language Communication)
Kazartsev (Evgenii Kazartcev) E., Krasnoperova M. A., Mukhin A. S., In book: Proceedings of the International Scientific Conference "Languages of Science – Languages of Art". Moscow, Suzdal: [s.n.], 2000. P. 353–358.
This article presents, for the first time, a detailed analysis of M. V. Lomonosov's early iambs against the background of language models of the rhythm of Russian and German verse. Preliminary data were obtained that allow conclusions about the connection between the rhythm of the German language and M. V. Lomonosov's ode "On the Taking of Khotin..." ...
Added: March 27, 2015