Artie Bias Corpus: An Open Dataset for Detecting Demographic Bias in Speech Applications

P. 6462–6468.
Meyer J., Rauchenstein L., Eisenberg J.

We describe the creation of the Artie Bias Corpus, an English dataset of expert-validated <audio, transcript> pairs with demographic tags for age, gender, and accent. We also release open software which may be used with the Artie Bias Corpus to detect demographic bias in Automatic Speech Recognition systems, and can be extended to other speech technologies. The Artie Bias Corpus is a curated subset of the Mozilla Common Voice corpus, which we release under a Creative Commons CC0 license – the most open and permissive license for data. This article contains information on the criteria used to select and annotate the Artie Bias Corpus in addition to experiments in which we detect and attempt to mitigate bias in end-to-end speech recognition models. We observe a significant accent bias in our baseline DeepSpeech model, with more accurate transcriptions of US English compared to Indian English. We do not, however, find evidence for a significant gender bias. We then show significant improvements on individual demographic groups from fine-tuning.
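
As an illustration of the kind of comparison the abstract describes, the following minimal Python sketch computes word error rate (WER) separately for each demographic group. It is not the software released with the corpus: the function names `word_errors` and `wer_by_group`, the group labels, and the toy transcripts are all hypothetical, and the sketch omits any test of statistical significance.

```python
from collections import defaultdict


def word_errors(reference, hypothesis):
    """Return (edit distance in words, number of reference words)."""
    ref = reference.split()
    hyp = hypothesis.split()
    # Word-level Levenshtein distance, computed one row at a time.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1], len(ref)


def wer_by_group(samples):
    """Aggregate WER per group from (group, reference, hypothesis) triples."""
    errors = defaultdict(int)
    words = defaultdict(int)
    for group, reference, hypothesis in samples:
        e, n = word_errors(reference, hypothesis)
        errors[group] += e
        words[group] += n
    # Skip groups with no reference words to avoid division by zero.
    return {g: errors[g] / words[g] for g in errors if words[g] > 0}


# Toy data (invented for illustration, not taken from the corpus).
samples = [
    ("us", "turn the light on", "turn the light on"),
    ("indian", "turn the light on", "turn a light"),
]
print(wer_by_group(samples))
```

A gap in WER between groups on data like this would only hint at a bias; deciding whether the gap is significant requires a statistical test over many utterances.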

Language: English
Keywords: speech corpus, automatic speech recognition, demographic bias, bias detection

In book

Proceedings of The 12th Language Resources and Evaluation Conference
Vol. 12. European Language Resources Association (ELRA), 2020.
Similar publications
Bridging Gaps in Russian Language Processing: AI and Everyday Conversations
Tatiana Sherstinova, Nikolay Mikhaylovskiy, Evgenia Kolpashchikova et al., in: Proceedings of the 35th Conference of Open Innovations Association FRUCT, 24-26 April 2024, Tampere, Finland. Issue 1. FRUCT Oy, 2024. P. 253–258.
Contemporary advancements in NLP and neural network techniques are paving the way to enhance and harness traditional linguistic resources and corpora, as well as expand the methods of applying neural networks for complex language material. Thus, a weak point for both theoretical and applied linguistic tasks is the processing of spontaneous everyday speech. Two experiments ...
Added: November 29, 2024
Multiword Units in Russian Everyday Speech: Empirical Classification and Corpus-Based Studies
Natalia V. Bogdanova-Beglarian, Olga V. Blinova, Khokhlova M. et al., in: Speech and Computer: 26th International Conference, SPECOM 2024, Belgrade, Serbia, November 25–28, 2024, Proceedings, Part I. Springer, 2024. P. 187–200.
Added: November 9, 2024
Speech and Computer: 26th International Conference, SPECOM 2024, Belgrade, Serbia, November 25–28, 2024, Proceedings, Part I
Springer, 2024.
The article is dedicated to the results of a research project describing the classes and functioning of multiword units in contemporary Russian everyday speech. The concept of multiword units encompasses quite diverse linguistic phenomena, making the creation of a working typology one of the project's central tasks. This typology is necessary for annotating corpus material ...
Added: November 9, 2024
On the Impact of Word Error Rate on Acoustic-Linguistic Speech Emotion Recognition: An Update for the Deep Learning Era
Shuranov E. / Series Computer Science "arxiv.org". 2021.
Text encodings from automatic speech recognition (ASR) transcripts and audio representations have shown promise in speech emotion recognition (SER) ever since. Yet, it is challenging to explain the effect of each information stream on the SER systems. Further, more clarification is required for analysing the impact of ASR's word error rate (WER) on linguistic emotion ...
Added: February 14, 2023
Hypernym Information and Sentiment Bias Probing in Distributed Data Representation
Frank Lawrence Acquaye, Latypov I., Attila Kertész-Farkas, in: ICMLC '23: Proceedings of the 2023 15th International Conference on Machine Learning and Computing. NY: Association for Computing Machinery (ACM), 2023. P. 221–226.
Neural word embedding vectors have been exhaustively investigated by probing tasks, whether they contain semantic and syntactic information. Perhaps the most popular task is a test on gender relation “king - man + woman ≈ queen”, other probings include tests on singular/plural relation (apple∼apples), analogy (good:better∼rough: ), purity of the clusters of word embeddings ...
Added: December 2, 2022
Pragmatic Markers and Parts of Speech: on the Problems of Annotation of the Speech Corpus
Bogdanova-Beglarian Natalia, Zaides K., in: CEUR Workshop Proceedings (Proceedings of the International Conference "Internet and Modern Society" IMS-2020, 17-20 June 2020, ITMO University, St. Petersburg, Russia). CEUR Workshop Proceedings, 2020. P. 129–139.
Added: February 3, 2022
The Pragmatic Marker ИЛИ ТАМ ("or something"): At Home among Strangers, a Stranger among One's Own
Zaides K., Русская речь 2021 No. 1 P. 22–36
The article describes the functions and usage of one of the pragmatic markers found in spontaneous spoken speech, или там ("or something"). In its construction pattern this marker formally resembles the reflexive markers или как его/её/их, или как это, или что, and the like. However, unlike those markers, the unit или там, as the article shows, performs fundamentally different functions in spoken speech: approximative ...
Added: February 3, 2022
Uncertainty Estimation in Autoregressive Structured Prediction
Andrey Malinin, Gales M., in: Proceedings of the 9th International Conference on Learning Representations (ICLR 2021). ICLR, 2021. P. 1–31.
Added: November 1, 2021
Scaling Ensemble Distribution Distillation to Many Classes with Proxy Targets
Ryabinin M., Malinin A., Gales M., in: Advances in Neural Information Processing Systems 34 (NeurIPS 2021). Curran Associates, Inc., 2021. P. 6023–6035.
Added: October 31, 2021
Gender domain adaptation for automatic speech recognition
Sokolov A., Savchenko A., in: 2021 IEEE 19th World Symposium on Applied Machine Intelligence and Informatics (SAMI). IEEE, 2021. P. 413–418.
This paper is focused on the finetuning of acoustic models for speaker adaptation goals on a given gender. We pretrained the Transformer baseline model on Librispeech-960 and conducted experiments with finetuning on the gender-specific test subsets. The obtained word error rate (WER) relative to the baseline is up to 5% and 3% lower on male ...
Added: September 26, 2021
On the Impact of Word Error Rate on Acoustic-Linguistic Speech Emotion Recognition: An Update for the Deep Learning Era
Sokolov A. / Series Computer Science "arxiv.org". 2021.
Text encodings from automatic speech recognition (ASR) transcripts and audio representations have shown promise in speech emotion recognition (SER) ever since. Yet, it is challenging to explain the effect of each information stream on the SER systems. Further, more clarification is required for analysing the impact of ASR's word error rate (WER) on linguistic emotion ...
Added: November 17, 2020
Positional Properties of Russian Appellatives: A Description Format for a Speech Corpus
Blinova O. V., Computational Linguistics and Intellectual Technologies 2018 Vol. 2 No. 17(24) P. 96–109
The article suggests a way of modelling the linear position of appellatives in Russian. Under the name «appellatives» are combined the units with similar functions and syntactic properties, namely truncated vocative forms and discursive markers of the type «slushaj» (lit. ‘listen-Imp.2P’). The model assumes distinction between accented and non-accented uses in three positions (initial, middle, ...
Added: November 1, 2020
Russian Pragmatic Markers Database: Developing Speech Technologies for Everyday Spoken Discourse
Sherstinova T., Blinova O. V., Bogdanova-Beglarian N. V. et al., in: Proceedings of the 26th Conference of Open Innovations Association FRUCT. IEEE, 2020. P. 60–66.
The paper presents recent results obtained within the ongoing project dedicated to the study of Russian pragmatic markers. Pragmatic markers are obligatory elements of natural speech in any language; moreover, they are considered to be functionally important for speech production and overcoming inevitable speech difficulties. A correct understanding of use and functions of pragmatic markers ...
Added: November 1, 2020
Pragmatic Markers Distribution in Russian Everyday Speech: Frequency Lists and Other Statistics for Discourse Modeling
Bogdanova-Beglarian N. V., Sherstinova T., Blinova O. V. et al., in: Speech and Computer. 21st International Conference, SPECOM 2019, Istanbul, Turkey, August 20–25, 2019, Proceedings. Vol. 11658. Switzerland: Springer, 2019. P. 433–443.
Pragmatic markers (PMs) are discourse units (words and multiword expressions) with a weakened referential meaning, which perform a variety of pragmatic tasks. For example, in English the common PMs are “well”, “you know”, “I think”, and many others. PMs are integral elements of spoken discourse in every language. According to the results obtained from the ...
Added: October 29, 2019
Audible Paralinguistic Phenomena in Everyday Spoken Conversations: Evidence from the ORD Corpus Data
Sherstinova T., in: Language, Music and Computing. Second International Workshop, LMAC 2017, St. Petersburg, Russia, April 17–19, 2017, Revised Selected Papers. Vol. 943. Switzerland: Springer, 2019. P. 131–145.
Paralinguistic phenomena are non-verbal elements in conversation. Paralinguistic studies are usually based on audio or video recordings of spoken communication. In this article, we will show what kind of audible paralinguistic information may be obtained from the ORD speech corpus of everyday Russian discourse containing long-term audio recordings of conversations made in natural circumstances. This linguistic resource provides rich authentic ...
Added: October 29, 2019
Voice command recognition in intelligent systems using deep neural networks
Sokolov A., Savchenko A., in: 17th World Symposium on Applied Machine Intelligence and Informatics (SAMI). IEEE, 2019. Ch. 19. P. 113–116.
In this article, we focus on the isolated voice command recognition for autonomous man-machine and intelligent robotic systems. We propose to create a grammar model for a small testing command set with self-loops for each state to return blank symbols for noise and out-of-vocabulary words. In addition, we use single arc connected beginning and ending ...
Added: October 21, 2019
Fuzzy Phonetic Encoding of Speech Signals in Voice Processing Systems
Savchenko L.V., Savchenko A.V., Journal of Communications Technology and Electronics 2019 Vol. 64 No. 3 P. 238–244
In this paper, we studied the phonetic approach for voice processing. A method for automatic recognition of speech signals, in which each quasistationary segment is associated with a fuzzy set of phonemes, was developed. We proposed the operation of the probabilistic triangular norm for fuzzy sets corresponding to the input frame and the nearest reference phoneme. The developed ...
Added: June 7, 2019
Domain-independent Classification of automatic Speech Recognition Texts
Mescheryakova E.I., Nesterenko L.V., in: Computational Linguistics and Intellectual Technologies. International Conference "Dialogue 2017" Proceedings. Vol. 1. Issue 16 (23). M., 2017.
Added: January 4, 2019
Linguistic features and sociolinguistic variability in everyday spoken Russian
Bogdanova-Beglarian N., Sherstinova T., Blinova O. et al., in: Speech and Computer. Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics). Vol. 10458 LNAI: Speech and Computer. 19th International Conference, SPECOM 2017, Hatfield, UK, September 12-16, 2017, Proceedings. Springer Publishing Company, 2017. P. 503–511.
The paper reviews the results of the project aimed at describing everyday Russian language and analyzing the special characteristics of its usage by different social groups. The presented study is based on a 125,000-word annotated subcorpus of the ORD corpus, which contains speech fragments of 256 people representing different gender, age, professional ...
Added: October 6, 2018
Preparing audio recordings of everyday speech for prosody research: The case of the ORD corpus
Sherstinova T., in: Speech and Computer. Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics). Vol. 10458 LNAI: Speech and Computer. 19th International Conference, SPECOM 2017, Hatfield, UK, September 12-16, 2017, Proceedings. Springer Publishing Company, 2017. P. 623–631.
Studying prosody is important for understanding many linguistic, pragmatic, and discourse phenomena, as well as for solution of many applied tasks (in particular, in speech technologies). Prosody of everyday speech is extremely diverse, demonstrating high interpersonal and intrapersonal variations. Furthermore, natural everyday speech produces a multitude of effects which are hardly possible to obtain in ...
Added: October 5, 2018