Text Mining Scientific Papers: A Survey on FCA-Based Information Retrieval Research

J. Poelmans; D. I. Ignatov; S. Viaene; G. Dedene; S. Kuznetsov

P. 273–287.

Formal Concept Analysis (FCA) is an unsupervised clustering technique, and many scientific papers are devoted to applying FCA in Information Retrieval (IR) research. We collected 103 papers published between 2003 and 2009 that mention FCA and information retrieval in the abstract, title or keywords. Using a prototype of our FCA-based toolset CORDIET, we converted the PDF files containing the papers to plain text, indexed them with Lucene using a thesaurus of terms related to FCA research, and then created the concept lattice shown in this paper. We visualized, analyzed and explored the literature with concept lattices and discovered multiple interesting research streams in IR, of which we give an extensive overview. The core contributions of this paper are the innovative application of FCA to the text mining of scientific papers and the survey of FCA-based IR research.
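The lattice-building step described in the abstract can be illustrated with a minimal sketch. It is not the CORDIET implementation: the paper identifiers, the terms, and the function names below are hypothetical, and the paper-to-term incidences are hard-coded rather than read from a Lucene index. The sketch enumerates the formal concepts of a toy paper-by-term context by closing every subset of terms, which is only practical for very small contexts.

```python
from itertools import combinations

# Hypothetical toy context: paper id -> thesaurus terms found in its text.
# In the pipeline described above these incidences would come from an index;
# here they are hard-coded purely for illustration.
context = {
    "paper1": {"fca", "information retrieval"},
    "paper2": {"fca", "information retrieval", "ontology"},
    "paper3": {"fca", "ontology"},
    "paper4": {"information retrieval"},
}

def extent(terms, context):
    """Papers that contain every term in `terms`."""
    return frozenset(p for p, ts in context.items() if terms <= ts)

def intent(papers, context):
    """Terms shared by every paper in `papers` (all terms if `papers` is empty)."""
    shared = set().union(*context.values())
    for p in papers:
        shared &= context[p]
    return frozenset(shared)

def formal_concepts(context):
    """Enumerate all (extent, intent) pairs by closing every term subset.

    Exponential in the number of terms, so suitable only for toy contexts.
    """
    terms = sorted(set().union(*context.values()))
    concepts = set()
    for r in range(len(terms) + 1):
        for subset in combinations(terms, r):
            ext = extent(set(subset), context)
            concepts.add((ext, intent(ext, context)))
    return concepts

concepts = formal_concepts(context)
# Print concepts from the most general (largest extent) to the most specific.
for ext, itt in sorted(concepts, key=lambda c: (-len(c[0]), sorted(c[1]))):
    print(sorted(ext), sorted(itt))
```

Each printed pair is one node of the concept lattice: a maximal group of papers together with the terms they all share. Ordering the nodes by extent inclusion would give the lattice diagram; for realistic corpora a dedicated algorithm such as NextClosure would replace the brute-force enumeration used here.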

Language: English

Keywords: text mining, information retrieval, Formal Concept Analysis, bibliometrics

In book

Advances in Data Mining. Applications and Theoretical Aspects. 12th Industrial Conference, ICDM 2012, Berlin, Germany, July 13-20, 2012. Proceedings

Vol. 7377. Berlin, Heidelberg: Springer, 2012.

Advances in Information Retrieval: 48th European Conference on Information Retrieval, ECIR 2026, Delft, The Netherlands, March 29 – April 2, 2026, Proceedings, Part II. (LNCS, volume 16484)

Cham: Springer Publishing Company, 2026.

The four-volume set LNCS 16483-16486 constitutes the refereed conference proceedings of the 48th European Conference on Information Retrieval, ECIR 2026, held in Delft, The Netherlands, during March 29–April 2, 2026. The 46 full papers and 37 short papers presented together with 10 findings papers, 9 reproducibility papers, 17 resource papers, 11 workshop papers, 7 tutorial papers, ...

Added: June 18, 2026

Prospects of Media Monitoring in Public Opinion Research (The Case of Trust in the President)

Ankudinov I., Социология: методология, методы, математическое моделирование 2025 No. 61 P. 165–203

The changing political mood of Russians is a constant subject of interest for sociological agencies. With the development of the Internet, conventional questionnaire research began to be supplemented by online surveys and, despite some skepticism, by social media mining. This article attempts to adjust an accidental web-sample so as to bring its estimates closer to ...

Added: April 22, 2026

SMMR: Sampling-Based MMR Reranking for Faster, More Diverse, and Balanced Recommendations and Retrieval

Liakhnovich K., Lashinin O., Babkin A. et al., Proceedings of the 48th International ACM SIGIR Conference on Research and Development in Information Retrieval 2025 P. 2754–2758

Relevance and diversity are critical objectives in modern information retrieval (IR), particularly in recommender systems. Achieving a balance between relevance (exploitation) and diversity (exploration) optimizes user satisfaction and business goals such as catalog coverage and novelty. While existing post-processing reranking methods address this trade-off, they usually rely on greedy strategies, leading to suboptimal outcomes for ...

Added: February 3, 2026

Is Canfield Right? On the Asymptotic Coefficients for the Maximum Antichain of Partitions and Related Counting Inequalities

Ignatov D. I., in: 11th International Conference, AIST 2023, Yerevan, Armenia, September 28–30, 2023, Revised Selected Papers. Analysis of Images, Social Networks and Texts. Lecture Notes in Computer Science (LNCS, volume 14486). Cham: Springer, 2024. P. 349–361.

This paper dates back to the asymptotic solutions of Rota’s problem on the size of the maximum antichain in the set partition lattice by Canfield and Harper and others. The knowledge of asymptotic coefficients could pave the way to the asymptotic solutions of such problems as (maximal) antichain counting in partition lattices. In addition to our ...

Added: January 23, 2026

Research on Turkic-Russian Bilingualism in the Russian-Language Academic Field: A Contrastive Bibliometric Analysis

Kolmogorova A., Налобина П. А., Урало-алтайские исследования 2025 No. 3 P. 56–83

The article presents a bibliometric analysis of Russian publications on Turkic-Russian bilingualism for 2014–2024 against the background of two English-language publication fields: Turkic-national bilingualism and bilingualism in general. Three data corpora are used: 159 Russian-language articles from the scholarly article aggregators Elibrary and Cyberleninka, 1453 English-language publications on Turkic-national bilingualism, and 5500 works on bilingualism in general from the abstract database ...

Added: November 12, 2025

Substantive Criteria for Referring Statements from Texts to Events and Factors

I. V. Loginova, A. S. Piekalnits, E. A. Sabidaeva et al., Scientific and Technical Information Processing 2025 Vol. 52 No. 6 P. 738–751

The purpose of this paper is to advance and automate language models for extracting statements related to events and factors from text documents using the designed linguistic marker system. The paper presents the results of testing text-mining models for event and factor extraction, using the example of analytical research in human potential, social sciences and ...

Added: July 18, 2025

Advances in Information Retrieval: 47th European Conference on Information Retrieval, ECIR 2025, Lucca, Italy, April 6–10, 2025, Proceedings, Part I

Springer, 2025.

The five-volume set LNCS 15572, 15573, 15574, 15575 and 15576 constitutes the refereed conference proceedings of the 47th European Conference on Information Retrieval, ECIR 2025, held in Lucca, Italy, during April 6–10, 2025. The 52 full papers, 11 findings, 42 short papers and 76 papers of other types presented in these proceedings were carefully reviewed and selected from 530 submissions. The accepted papers ...

Added: April 17, 2025

Advances in Information Retrieval: 47th European Conference on Information Retrieval, ECIR 2025, Lucca, Italy, April 6–10, 2025, Proceedings, Part IV

Springer, 2025.

Added: April 10, 2025

Bibliometric Features as Symbolic Markers of Disciplinary Boundaries: A Sociological Perspective

Ivanov D., Deviatko I. F., Мониторинг общественного мнения: Экономические и социальные перемены 2024 No. 6 P. 27–51

Symbolic markers (as those used in “we-they” attributions) help delineate whether a scientist (among other things) finds themselves at the intersection, within, or outside the confines of a given disciplinary boundary, thereby facilitating the swift navigation across an ever-growing corpus of scientific literature. Frequently, these boundaries are revealed through bibliometric analysis, which makes it possible ...

Added: January 27, 2025

A Review of Modern Methods and Technologies for Assessing Research Performance in Bibliometrics (Part 2)

Земсков А. И., Telitsyna A., Научные и технические библиотеки 2024 No. 11 P. 48–61

This review presents significant changes in traditional bibliometrics as well as substantial innovations taking place in the field. One of the most significant directions of development is the promotion of electronic publications and the growing role of open access systems. Material on new approaches to providing access to source research data is presented. The attention of Russian government bodies to assessments of publication activity, as well as to the creation ...

Added: December 14, 2024

Information Technologies, Computer Systems and Publishing Products for Libraries. Proceedings of the 27th International Conference and Exhibition "LIBCOM-2023"

[s.n.], 2024.

The conference was attended by approximately 400 participants from Russia, Abkhazia, Belarus, and India. Over the course of four days, more than 30 professional events took place during "LIBCOM-2023." As part of the conference program, the following events were successfully held: the seventh industry conference "Book Publishing and Libraries: Vectors of Interaction" in memory of ...

Added: November 26, 2024

Machine Learning and Knowledge Discovery in Databases. Applied Data Science Track. European Conference, ECML PKDD 2024, Vilnius, Lithuania, September 9–13, 2024, Proceedings, Part X. LNCS, volume 14950

Cham: Springer, 2024.

This multi-volume set, LNAI 14941 to LNAI 14950, constitutes the refereed proceedings of the European Conference on Machine Learning and Knowledge Discovery in Databases, ECML PKDD 2024, held in Vilnius, Lithuania, in September 2024. ...

Added: November 22, 2024

A Review of Modern Methods and Technologies for Assessing Research Performance in Bibliometrics (Part 1)

Земсков А., Telitsyna A., Научные и технические библиотеки 2024 No. 10 P. 84–101

Added: October 29, 2024

Patriotic Discourse on the Runet: Before and After February 24, 2022

Ankudinov I., Мониторинг общественного мнения: Экономические и социальные перемены 2024 No. 2 P. 153–177

The patriotic upsurge recorded after February 24, 2022 was reflected in a distinctive way in the Russian-language segment of the internet. Although sociologists are almost unanimous that citizens' social well-being and their attitude toward the authorities have improved, the digital traces of these changes are less noticeable: only the increased polarization along the "for/against" line is visible to the naked eye. The paper measures the immediate (short-term) effect, ...

Added: September 7, 2024

The Impact of International Aid on Political Risks for Foreign Direct Investment

Vladimir B., Вестник МГИМО Университета 2023 Vol. 16 No. 5 P. 155–188

The last decade has seen an increasing focus on the involvement of the private sector in international sustainable development, particularly in high-risk jurisdictions. This involvement encompasses a broad spectrum, incorporating innovative private sector instruments—now acknowledged as ODA-eligible by the OECD—as well as traditional tools of external official support to developing countries, which remain the primary ...

Added: November 14, 2023

Literary Heritage of the 19th–20th Centuries: Classification of Raster Images for Intelligent Analysis and Topic Modeling of a Corpus of Handwritten Texts

Penskaja E., Khachaturyan L., Филологические науки. Научные доклады высшей школы 2023 No. 5 P. 160–165

The article examines current trends in working with digital forms of the handwritten heritage on the history of Russian literature from the second half of the 19th to the mid-20th century. The process of forming virtual archives is analyzed as a gradual accumulation of the “big data” of scientific research: an unrecognized information array of raster ...

Added: October 30, 2023

On the Number of Maximal Antichains in Boolean Lattices for 𝑛 up to 7

Ignatov D. I., Lobachevskii Journal of Mathematics 2023 No. 44 P. 137–146

We consider two ways to compute the number of maximal antichains in the Boolean lattice on 𝑛 elements. The first one is based on full direct enumeration, while the second one relies on concept lattices or Galois lattices (studied in Formal Concept Analysis, an applied branch of lattice theory) and the Dedekind–MacNeille completion of a partial ...

Added: June 13, 2023

Cognitive load measurement during navigation and information retrieval in digital text

Ledneva T., Kovalev A., Procedia Computer Science 2021 Vol. 192 P. 2720–2730

Interaction with digital text permeates practically all types of educational, professional and leisure activities of modern life. Working effectively with digital instructions and materials determines success in solving a number of real problems. At the same time, the negative impact of overly complex digital environments on learning outcomes, work efficiency, and subjective well-being ...

Added: April 27, 2023

Analysis of the Structure of Time Series of the Number of Court Cases

Lukianchenko P., Gromov V., Beschastnov Y. et al., Вестник кибернетики 2022 Vol. 4 No. 48 P. 37–48

The study analyzes the time series of the number of new cases in the administrative courts of the Russian Federation using two methods of time series grouping according to the chaotic, stochastic, and regular structure. The first model is based on the entropy‒complexity plane, the second one is presented by the attribute‒object graph. As a result, four groups ...

Added: March 20, 2023

Applying Formal Concept Analysis Methods to the Analysis of Blood Flow Time Series for Hemodialysis Patients

Gromov V., Урманцева Н. Р., [s.n.], 2021.

The talk considers clustering-based forecasting approaches that rely on the methodology of formal concept analysis. The methodology is applied to cluster segments of a time series in order to identify characteristic segments (motifs) corresponding to patients with varying degrees of fistula clogging. ...

Added: January 30, 2023

Introduction

Polukhina E., In: Practices of Qualitative Data Analysis in the Social Sciences. Moscow: HSE Publishing House, 2023. P. 8–12.

Introduction to the book "Practices of Qualitative Data Analysis in the Social Sciences" (2023) ...

Added: January 27, 2023