Aschern at CheckThat! 2021: Lambda-Calculus of Fact-Checked Claims

P. 484–493.

Chernyavskiy A., Ilvovsky D., Nakov P.

We describe our system for the CLEF 2021 CheckThat! Lab Task 2 Subtask A on detecting previously fact-checked claims. We developed a pipeline using TF.IDF, sentence-BERT fine-tuned on the training data, and reranking with LambdaMART, which takes the predicted similarity scores and the positions in the ranked list as features. We examined the quality of each model on the validation set and analyzed its contribution to the final result using the trained LambdaMART. The official evaluation ranked our system 1st by a wide margin over other participants and the organizers' baseline.
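The feature construction described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the whitespace tokenizer, the smoothed IDF formula, the toy claims, and the helper names (`tfidf_scores`, `jaccard_scores`, `rerank_features`) are all assumptions made for illustration, and a simple token-overlap scorer stands in for the fine-tuned sentence-BERT model. The resulting per-candidate rows of (score, reciprocal rank) pairs are the kind of input a LambdaMART reranker would be trained on; the reranker itself is not shown.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase whitespace tokenizer (an assumption, not the paper's preprocessing)."""
    return text.lower().split()


def tfidf_vectors(docs):
    """Return sparse TF-IDF vectors (dicts) and the IDF table, using a smoothed IDF."""
    tokenized = [tokenize(d) for d in docs]
    n = len(tokenized)
    df = Counter(term for toks in tokenized for term in set(toks))
    idf = {t: math.log((1 + n) / (1 + c)) + 1.0 for t, c in df.items()}
    vecs = [{t: tf * idf[t] for t, tf in Counter(toks).items()} for toks in tokenized]
    return vecs, idf


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def tfidf_scores(query, claims):
    """Score every verified claim against the query by TF-IDF cosine similarity."""
    vecs, idf = tfidf_vectors(claims)
    q = {t: tf * idf.get(t, 0.0) for t, tf in Counter(tokenize(query)).items()}
    return [cosine(q, v) for v in vecs]


def jaccard_scores(query, claims):
    """Hypothetical stand-in for a second scorer such as sentence-BERT (NOT the real model)."""
    q = set(tokenize(query))
    scores = []
    for claim in claims:
        s = set(tokenize(claim))
        scores.append(len(q & s) / len(q | s) if q | s else 0.0)
    return scores


def rerank_features(score_lists):
    """For each candidate, emit [score, reciprocal rank] for every scorer."""
    n = len(score_lists[0])
    feats = [[] for _ in range(n)]
    for scores in score_lists:
        order = sorted(range(n), key=lambda i: -scores[i])
        rank = {idx: pos + 1 for pos, idx in enumerate(order)}
        for i in range(n):
            feats[i] += [scores[i], 1.0 / rank[i]]
    return feats


# Toy data, invented for illustration only.
claims = [
    "the vaccine contains a microchip",
    "the election results were changed by machines",
    "drinking hot water cures the virus",
]
tweet = "they say the vaccine has a microchip inside"
features = rerank_features([tfidf_scores(tweet, claims), jaccard_scores(tweet, claims)])
```

Each row of `features` belongs to one candidate claim and holds two numbers per scorer. Using the rank position alongside the raw score lets the reranker compare models whose scores are on different scales.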

Language: English

In book

CLEF 2021 Working Notes

CEUR Workshop Proceedings, 2021.

Tactics for Countering Fake Information and Factors of Fact-Checking in Russia

Kuzina L., Popov E., Monitoring of Public Opinion: Economic and Social Changes 2026 No. 2 P. 170–191

The article examines internet users' tactics for verifying false (fake) information and the factors associated with fact-checking. Working within the framework of the theory of prosumerism and everyday tactics (Michel de Certeau), the authors of the study aim to identify and describe the arsenal of fact-checking tactics used by the Russian internet audience, and to ...

Added: May 16, 2026

Truth-O-Meter: Handling Multiple Inconsistent Sources Repairing LLM Hallucinations

Galitsky B., Chernyavskiy A., Ilvovsky D., in: SIGIR '24: Proceedings of the 47th International ACM SIGIR Conference on Research and Development in Information Retrieval. Association for Computing Machinery (ACM), 2024. P. 2817–2821.

Large Language Models (LLMs) often produce text with incorrect facts and hallucinations. To address this issue, we developed a fact-checking system, Truth-O-Meter, which verifies LLM results against the Internet and other sources of information to detect wrong claims/facts and proposes corrections for them. NLP and reasoning techniques such as Abstract Meaning Representation and syntactic alignment are ...

Added: May 9, 2024

Semantic Recommendation System for Bilingual Corpus of Academic Papers

Safaryan A., Petr Filchenkov, Yan W. et al., in: Recent Trends in Analysis of Images, Social Networks and Texts. 9th International Conference, AIST 2020, Skolkovo, Moscow, Russia, October 15–16, 2020, Revised Supplementary Proceedings. Vol. 12602. Springer, 2021. Ch. 3. P. 22–36.

We tested four methods of making document representations cross-lingual for the task of semantic search for similar papers, based on a corpus of papers from three Russian conferences on NLP: Dialogue, AIST and AINL. The pipeline consisted of three stages: preprocessing, word-by-word vectorisation using models obtained with various methods to map vectors from two ...

Added: September 18, 2023

CrowdChecked: Detecting Previously Fact-Checked Claims in Social Media

Hardalov M., Chernyavskiy A., Koychev I. et al., in: Proceedings of the 2nd Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics and the 12th International Joint Conference on Natural Language Processing (Volume 1: Long Papers). Association for Computational Linguistics, 2022. P. 266–285.

While there has been substantial progress in developing systems to automate fact-checking, they still lack credibility in the eyes of the users. Thus, an interesting approach has emerged: to perform automatic fact-checking by verifying whether an input claim has been previously fact-checked by professional fact-checkers and to return an article that explains their decision. ...

Added: May 21, 2023

Moving Other Way: Exploring Word Mover Distance Extensions

Smirnov I., Yamshchikov I. P., in: COMPLEXIS 2022. Proceedings of the 7th International Conference on Complexity, Future Information Systems and Risk. April 23–24, 2022. Science and Technology Publications, Lda, 2022. P. 92–97.

Added: September 8, 2022

On the Study of Contested Truths in American Political Discourse

Kazakov I. V., in: April Theses: Proceedings of the Interdisciplinary Research Conference (Perm, April 2–3, 2021). Perm: Perm State National Research University, 2021. P. 129–135.

Because of growing public concerns about the quality of the information presented by various entities as factual, the practice of fact-checking has spread in the United States. Fact-checking has had a limited effect because its methodology ignores the performative functions of political text. The article proposes to apply a post-structuralist discursive historical approach to answer to ...

Added: May 17, 2022

Rethinking Crowd Sourcing for Semantic Similarity

Solomon S., Cohn A., Rosenblum H. et al. / Series Computer Science "arxiv.org". 2021.

Estimation of semantic similarity is crucial for a variety of natural language processing (NLP) tasks. In the absence of a general theory of semantic information, many papers rely on human annotators as the source of ground truth for semantic similarity estimation. This paper investigates the ambiguities inherent in crowd-sourced semantic labeling. It shows that annotators ...

Added: December 3, 2021

Lexicographic Study of Synonymy: Clarifying Semantic Similarity between Words

Solovyev V., Gimaletdinova G., Khalitova L. et al., Computacion y Sistemas 2021 Vol. 25 No. 3 P. 667–675

The problem of determining semantic similarity between words affects the understanding of synonymy and creates obstacles to the work of lexicographers. The study was carried out as a part of a larger research project on expert assessment of synonymic rows in RuWordNet thesaurus (a WordNet-like thesaurus for the Russian language). The aim ...

Added: December 1, 2021

Native Language Identification for Russian

Remnev N., in: 2019 International Conference on Data Mining Workshops (ICDMW). IEEE, 2019. P. 1–7.

Native Language Identification (NLI) is the task of automatically recognizing an author's native language (L1) from texts written in a language that is not native to the author. The NLI task was studied in detail for the English language, and two shared tasks ...

Added: October 18, 2021

Native Language Identification For Russian Using Errors Types

Remnev N., in: Computational Linguistics and Intellectual Technologies: Proceedings of the Annual International Conference "Dialogue" (Moscow, June 17–20, 2020). Issue 19(26): supplementary volume. 2020. P. 1123–1133.

Native Language Identification (NLI) is the task of automatically recognizing an author's native language (L1) from texts written in a language that is non-native to the author. The NLI task was studied in detail for the English language, and two shared tasks were conducted in 2013 and 2017, where ...

Added: October 18, 2021

Style-transfer and Paraphrase: Looking for a Sensible Semantic Similarity Metric

Yamshchikov I. P., Shibaev V., Khlebnikov N. et al., in: The Thirty-Fifth AAAI Conference on Artificial Intelligence. Technical Tracks 16. Vol. 35. Issue 16. AAAI Press, 2021. P. 14213–14220.

The rapid development of such natural language processing tasks as style transfer, paraphrase, and machine translation often calls for the use of semantic similarity metrics. In recent years, many methods to measure the semantic similarity of two short texts have been developed. This paper provides a comprehensive analysis for more than a dozen of ...

Added: July 22, 2021

Recursive Neural Text Classification Using Discourse Tree Structure for Argumentation Mining and Sentiment Analysis Tasks

Chernyavskiy A., Ilvovsky D., in: Foundations of Intelligent Systems. 25th International Symposium on Methodologies for Intelligent Systems: ISMIS 2020. Vol. 12117. Springer, 2020. P. 90–101.

This paper considers sentiment classification of movie reviews and two argument mining tasks: verification of political statements and categorization of quotes from an Internet forum corresponding to argumentation (factual or emotional). In the case of the fact-checking problem, justifications can be used additionally in one of its sub-tasks. A strong model for solving these and ...

Added: October 4, 2020

Semantic Proximity Establishment in the Tasks of Knowledge Extraction and Named Entities Recognition

Kozerenko E. B., Kuznetsov K. I., Morozova Y. I. et al., in: PROCEEDINGS OF THE 2017 INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE. American Council on Science & Education, 2017. P. 339–344.

The paper deals with the problem of identifying text segments containing similar semantic units for the tasks of analytical text processing within the semantic technology platform. The methods and instruments presented in the paper provide the discovery of relevant content based on users' focused interests within a certain domain. The hybrid approach comprising linguistic ...

Added: February 23, 2018

Automatization of Scientific Articles Classification According to Universal Decimal Classifier

Romanov A., Lomotin K.E., Kozlova E.S., in: Supplementary Proceedings of the Sixth International Conference on Analysis of Images, Social Networks and Texts (AIST-SUP 2017), Moscow, Russia, July 27–29, 2017. Vol. 1975. Aachen: CEUR-WS.org, 2017. P. 122–133.

This research examines the problems of automatic classification of scientific articles according to the Universal Decimal Classifier. To reveal the structure of the training data, its visualization was obtained using the recursive feature elimination algorithm. Further, the study provides a comparison of TF-IDF and Weirdness, two statistics-based metrics of keyword significance. The most efficient classification methods ...

Added: November 28, 2017

Trend Monitoring for Linking Science and Strategy

Bakhtin P. D., Saritas O., Chulok A. et al., Scientometrics 2017 Vol. 111 No. 3 P. 2059–2075

Rapid changes in Science & Technology (S&T) along with breakthroughs in products and services concern a great deal of policy and strategy makers and lead to an ever increasing number of Foresight and other types of forward-looking work. At the outset, the purpose of these efforts is to investigate emerging S&T areas, set priorities and ...

Added: December 21, 2016

Applying the tf-idf Measure and the Weirdness Measure to Keyword Extraction in the Classification of Scientific Article Texts

Kozlova E. S., Romanov A., in: Informatics, Mathematics, Automation: 2016. Proceedings of the Scientific and Technical Conference. Sumy: SumDU, 2016. P. 42.

The study uses two measures to extract keywords from a set of texts: tf-idf and weirdness. It draws on a sample of more than twenty-two thousand scientific articles from nine UDC topics. The aim of the study was to identify an optimal set of words for fast classification of a given text. ...

Added: June 11, 2016

Applying an Artificial Neural Network to Classify Scientific Articles by UDC

Lomotin K. E., Romanov A., in: Informatics, Mathematics, Automation: 2016. Proceedings of the Scientific and Technical Conference. Sumy: SumDU, 2016. P. 43.

Using artificial neural networks (ANNs) for classification tasks makes it possible to separate classes of patterns as complex as the topics of the UDC classifier. For this study we chose a hyperplane-group classifier implemented as a Rosenblatt multilayer perceptron. ...

Added: June 11, 2016