Parallel corpus approach for name matching in record linkage

P. 995–1000.
Zhukov L. E., Sukharev J., Popescul A.

Record linkage, or entity resolution, is an important area of data mining. Name matching is a key component of record linkage systems. Alternative spellings of the same name are common in many applications. We use the largest collection of genealogy person records in the world, together with user search query logs, to build name-matching models. We outline the procedure for building a crowd-sourced training set and present our method. We cast the problem of learning alternative spellings as a machine translation problem at the character level. Using information retrieval evaluation methodology, we show that on our data this method substantially outperforms a number of standard, well-known phonetic and string similarity methods in terms of precision and recall. Our result can have a significant practical impact on entity resolution applications.
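
The character-level framing and the precision/recall evaluation mentioned in the abstract can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the function names, the space-separated character format (a common way to feed character-level input to translation toolkits), and the `difflib` ratio used as a generic string-similarity baseline. None of this is taken from the paper, and the sketch does not reproduce the authors' models or data.

```python
from difflib import SequenceMatcher


def to_char_tokens(name):
    """Lowercase a name and separate its characters with spaces."""
    return " ".join(name.lower())


def build_parallel_corpus(pairs):
    """Format (source, target) spelling pairs as character-level 'sentences'.

    Each name becomes a sequence of character tokens, so a translation
    model would see one spelling as the source and the other as the target.
    """
    return [(to_char_tokens(src), to_char_tokens(tgt)) for src, tgt in pairs]


def similarity_baseline(query, candidates, threshold=0.8):
    """Hypothetical baseline: keep candidates whose difflib ratio to the
    query reaches the threshold. This is a generic string-similarity
    method, not one of the specific baselines evaluated in the paper."""
    q = query.lower()
    return [c for c in candidates
            if SequenceMatcher(None, q, c.lower()).ratio() >= threshold]


def precision_recall(retrieved, relevant):
    """Set-based precision and recall of retrieved names against the
    names judged to be true alternative spellings."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall
```

In this sketch, a pair such as ("Smith", "Smyth") would become the source and target lines `s m i t h` and `s m y t h`, and any matcher's output for a query can be scored with `precision_recall` against a list of known alternative spellings.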

Language: English
Keywords: Record Linkage, Crowd Sourcing, Machine Translation

In book

Proceedings of 14th International Conference on Data Mining (ICDM 2014)
NY: IEEE Computer Society, 2014.