An Experimental Study of Term Extraction for Real Information-Retrieval Thesauri
Models for effective term extraction can depend on the type of terminological resource under construction. In this paper we study term extraction models for real, working information-retrieval thesauri. The first thesaurus is the English version of the EuroVoc thesaurus; the second is the Russian Banking thesaurus. We study single-word and two-word term extraction separately to reveal the best features and feature combinations, and compare the best models for the two thesauri. In particular, we found that for this type of terminological resource the use of association measures does not improve the quality of two-word term extraction based on combining multiple features.
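The abstract does not name the specific association measures used; as a rough illustration of the kind of measure typically considered for two-word candidates, the following minimal sketch computes pointwise mutual information (PMI) from corpus frequency counts. The candidate list and all counts below are hypothetical, not the paper's data.

import math
from collections import Counter

def pmi(bigram_count, w1_count, w2_count, total):
    """Pointwise mutual information of a two-word candidate:
    PMI = log2( P(w1,w2) / (P(w1) * P(w2)) ), estimated from counts."""
    p_joint = bigram_count / total
    p_w1 = w1_count / total
    p_w2 = w2_count / total
    return math.log2(p_joint / (p_w1 * p_w2))

# Hypothetical corpus counts for illustration only.
unigrams = Counter({"monetary": 120, "policy": 300, "central": 150, "bank": 200})
bigrams = Counter({("monetary", "policy"): 80, ("central", "bank"): 95})
total_tokens = 100_000

for (w1, w2), c in bigrams.items():
    print(w1, w2, round(pmi(c, unigrams[w1], unigrams[w2], total_tokens), 2))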
This volume contains the papers selected for presentation at the 2014 IEEE/WIC/ACM International Conference on Web Intelligence (WI'14), held as part of the 2014 Web Intelligence Congress (WIC'14) at the University of Warsaw, Warsaw, Poland, from 11 to 14 August 2014. The conference was sponsored and co-organized by the IEEE Computer Society, the Web Intelligence Consortium (WIC), the Association for Computing Machinery (ACM), the University of Warsaw, the Polish Mathematical Society, and the Warsaw University of Technology.
The series of Web Intelligence conferences was started in Japan in 2001. Since then, it has been held yearly in several countries, including Canada, China, France, the USA, Australia, and Italy. It is recognized as the world's leading forum on the role of Web Intelligence as one of the most important directions for scientific research and for the development of solutions that contribute to the creation of a knowledge-based society. In 2014, WI visited Poland as a special event commemorating the 25th anniversary of the Web.
WI'14 received 242 paper submissions, in the areas of foundations of Web Intelligence, semantic aspects of Web Intelligence, World Wide Wisdom Web, Web search and recommendation, Web mining and warehousing, Human-Web interaction, as well as Web Intelligence technologies and applications. After a rigorous evaluation process, 85 papers were selected as regular contributions, giving an acceptance rate of 35.1%.
The first five sections of this volume include 40 regular contributions. Additionally, the first paper in the first section corresponds to one of the WIC'14 keynotes. The last four sections of this volume contain 23 papers selected for oral presentation in WI'14 workshops. The remaining 45 regular contributions and 25 papers accepted to WI'14 special sessions are published in another volume of the WI'14 proceedings.
Formal Concept Analysis (FCA) is a mathematical technique that has been extensively applied to Boolean data in applications such as knowledge discovery, information retrieval, and web mining. In recent years, research on extending FCA theory to cope with imprecise and incomplete information has made significant progress. In this paper, we give a systematic overview of the more than 120 papers published between 2003 and 2011 on FCA with fuzzy attributes and rough FCA. We applied traditional FCA as a text-mining instrument to 1072 papers mentioning FCA in the abstract. These papers, available as PDF files, were transformed into concept lattices using a thesaurus of terms referring to research topics. The lattices were then used to analyze and explore the most prominent research topics within the FCA-with-fuzzy-attributes and rough-FCA research communities. FCA turned out to be an ideal meta-technique for representing large volumes of unstructured text.
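For readers unfamiliar with the basic FCA machinery behind such concept lattices, a minimal sketch of a formal context (papers as objects, thesaurus terms as attributes) and the derivation operators that yield formal concepts might look as follows; the toy incidence data are invented for illustration.

# Toy formal context: objects are papers, attributes are thesaurus terms.
context = {
    "paper1": {"fuzzy FCA", "information retrieval"},
    "paper2": {"fuzzy FCA", "rough sets"},
    "paper3": {"rough sets", "web mining"},
}

def intent(objects):
    """Attributes shared by all given objects (the ' operator on objects)."""
    sets = [context[o] for o in objects]
    return set.intersection(*sets) if sets else set()

def extent(attributes):
    """Objects having all given attributes (the ' operator on attributes)."""
    return {o for o, attrs in context.items() if attributes <= attrs}

# A formal concept is a pair (A, B) with extent(B) == A and intent(A) == B.
B = {"fuzzy FCA"}
A = extent(B)
print(A, intent(A))   # {'paper1', 'paper2'} {'fuzzy FCA'}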
The article describes the implementation of a service that automates the collection of structured information from unstructured web documents. The service provides a unified solution for a variety of data domains by means of an explicit ontological description of the task. In addition, no changes to the program code are required to increase the number of sources, because the information sources are also described by an ontology.
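The abstract gives no implementation details, but the core idea of describing sources declaratively so that the extraction code never changes can be sketched roughly as follows; the field names, patterns, and data are hypothetical and do not reflect the paper's actual ontology format.

import re

# Hypothetical declarative descriptions of two sources; adding a source
# means adding an entry here, not changing the extraction code below.
SOURCES = [
    {"name": "shop_a", "price_pattern": r"Price:\s*(\d+)"},
    {"name": "shop_b", "price_pattern": r"Cost\s*=\s*(\d+)"},
]

def extract_prices(pages):
    """Apply every source description to its page; this loop stays fixed."""
    results = {}
    for src in SOURCES:
        match = re.search(src["price_pattern"], pages.get(src["name"], ""))
        if match:
            results[src["name"]] = int(match.group(1))
    return results

print(extract_prices({"shop_a": "Price: 42", "shop_b": "Cost = 37"}))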
This book constitutes the thoroughly refereed proceedings of the 8th Russian Summer School on Information Retrieval, RuSSIR 2014, held in Nizhniy Novgorod, Russia, in August 2014.
The 14 papers presented were selected from various submissions. The papers focus on visualization for information retrieval along with other topics related to information retrieval.
The paper presents a framework for fast text analytics developed during the Texterra project. Texterra is a technology for multilingual text mining based on novel text processing methods that exploit knowledge extracted from user-generated content. It delivers a fast, scalable solution for text mining without expensive customization. Depending on the use case, Texterra can be utilized as a library, an extendable framework, or a scalable cloud-based service. This paper describes the details of the project, its use cases, and the results of evaluation for all developed tools. Texterra utilizes Wikipedia as a primary knowledge source to facilitate text mining in arbitrary documents (news, blogs, etc.). We mine the graph of Wikipedia's links to compute semantic relatedness between all concepts described in Wikipedia. As a result, we build a semantic graph with more than 5 million concepts. This graph is exploited to interpret the meanings and relationships of terms in text documents. Despite its large size, Wikipedia does not contain information about many domain-specific concepts. In order to increase the applicability of the technology, we developed several automatic knowledge extraction tools. These tools include systems for knowledge extraction from MediaWiki resources and Linked Data resources, as well as a system for knowledge-base extension with concepts described in arbitrary text documents using original information extraction techniques. In addition, the use of information from Wikipedia allows Texterra to be easily extended to support new natural languages. The paper presents an evaluation of Texterra applied to different text processing tasks (part-of-speech tagging, word sense disambiguation, keyword extraction, and sentiment analysis) for English and Russian.
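The abstract does not specify which relatedness formula is computed over the link graph; one widely used measure for Wikipedia concepts is the Milne-Witten normalized link distance, sketched below on an invented toy set of inlinks purely for illustration.

import math

# Toy inlink sets: which articles link to each concept (invented data).
inlinks = {
    "Bank": {"Finance", "Money", "Loan", "Economy"},
    "Credit": {"Finance", "Loan", "Debt"},
}
WIKI_SIZE = 5_000_000  # rough order of the number of concepts in the semantic graph

def relatedness(a, b):
    """Milne-Witten style relatedness: 1 minus the normalized link distance."""
    A, B = inlinks[a], inlinks[b]
    common = A & B
    if not common:
        return 0.0
    dist = (math.log(max(len(A), len(B))) - math.log(len(common))) / \
           (math.log(WIKI_SIZE) - math.log(min(len(A), len(B))))
    return max(0.0, 1.0 - dist)

print(round(relatedness("Bank", "Credit"), 3))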
Today web spam is one of the key problems of modern web search engines. In this paper we investigate the efficiency of various dimensionality reduction methods applied to the spam classifier of the go.mail.ru search system. Effective utilization of such techniques can significantly increase the number of features and the quality of the classifier without loss of training and classification speed. We have conducted a series of experiments with the PCA (Principal Component Analysis) and RP (Random Projection) dimensionality reduction methods. Unfortunately, these methods turn out to be ineffective for this task, basically because of the low-dimensional feature space. However, this experiment led to the need for a detailed analysis of the features participating in the training process. For this analysis, we have chosen the MRMR (Minimum Redundancy Maximum Relevance) criterion. Application of this criterion has allowed us to detect redundant features and to estimate the contribution of each feature participating in training. This research has allowed us to significantly increase the quality of our web spam classifier without increasing the number of features. This paper demonstrates the efficiency of feature selection criteria in practice, and once again emphasizes the importance of a detailed analysis of the data and of the informative features selected for training.
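The exact MRMR formulation used is not given in the abstract; a simplified greedy variant, using absolute Pearson correlation as a stand-in for the mutual-information estimates of the original criterion, is sketched below on synthetic data purely as an illustration of the relevance-minus-redundancy idea.

import numpy as np

def greedy_mrmr(X, y, k):
    """Greedily pick k features maximizing relevance to y minus the mean
    redundancy with already selected features (correlation-based proxy)."""
    n_features = X.shape[1]
    relevance = np.array([abs(np.corrcoef(X[:, j], y)[0, 1]) for j in range(n_features)])
    selected = [int(np.argmax(relevance))]
    while len(selected) < k:
        best_j, best_score = None, -np.inf
        for j in range(n_features):
            if j in selected:
                continue
            redundancy = np.mean([abs(np.corrcoef(X[:, j], X[:, s])[0, 1]) for s in selected])
            score = relevance[j] - redundancy
            if score > best_score:
                best_j, best_score = j, score
        selected.append(best_j)
    return selected

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))
y = (X[:, 0] + 0.5 * X[:, 3] > 0).astype(float)   # synthetic labels
print(greedy_mrmr(X, y, 3))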
The present paper deals with word sense induction from lexical co-occurrence graphs. We construct such graphs on large Russian corpora and then use them to cluster the results of Mail.ru search according to the meanings of the query. We compare different methods of performing such clustering and different source corpora. Models of applying distributional semantics to big linguistic data are described.
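A minimal illustration of graph-based word sense induction (not the paper's exact method): build a co-occurrence graph over the ambiguous query word's neighbours and treat the connected clusters of that ego network as induced senses. The toy contexts below are invented.

from collections import defaultdict
from itertools import combinations

# Toy contexts for the ambiguous query word "jaguar" (invented data).
contexts = [
    ["jaguar", "car", "engine", "speed"],
    ["jaguar", "engine", "dealer", "car"],
    ["jaguar", "cat", "jungle", "prey"],
    ["jaguar", "prey", "cat", "habitat"],
]

# Build the co-occurrence graph of the target word's neighbours.
graph = defaultdict(set)
for ctx in contexts:
    words = [w for w in ctx if w != "jaguar"]
    for a, b in combinations(set(words), 2):
        graph[a].add(b)
        graph[b].add(a)

def components(g):
    """Connected components of the ego network serve as induced senses."""
    seen, comps = set(), []
    for node in g:
        if node in seen:
            continue
        stack, comp = [node], set()
        while stack:
            n = stack.pop()
            if n in comp:
                continue
            comp.add(n)
            stack.extend(g[n] - comp)
        seen |= comp
        comps.append(comp)
    return comps

print(components(graph))  # two clusters: car-related and animal-related senses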
A method for fuzzy full-text search is proposed. The method follows a popular two-stage scheme with a novel second stage: a preliminary search stage using an n-gram inverted index and, at the second stage, relevance checking between the query and documents using frequency-annotated suffix trees (ASTs). The ASTs are built for all documents of the collection off-line. The method is compared with two popular fuzzy text retrieval techniques, one using n-gram inverted indexing with Levenshtein distance checking and signature hashing, and the other being Lemur, a popular toolkit for language modelling and information retrieval. For computational experiments we use the "Reuters 21578" text collection and a collection of USPTO patents. Our AST-based method generally leads to accuracy scores that are similar to those obtained by the winner, the Levenshtein distance-based method. However, our method significantly outperforms the Levenshtein distance-based method in terms of speed. Therefore, when both criteria, accuracy and speed, are considered simultaneously, the AST-based method shows significant advantages.
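As a rough sketch of the first (candidate-retrieval) stage described above, the following builds a character trigram inverted index and ranks documents by the number of n-grams shared with a possibly misspelled query; the AST-based relevance checking of the second stage is not reproduced here, and the toy collection is invented.

from collections import defaultdict, Counter

def ngrams(text, n=3):
    text = f"${text}$"          # pad so word boundaries also form n-grams
    return {text[i:i + n] for i in range(len(text) - n + 1)}

# Build the n-gram inverted index over a toy document collection.
docs = {1: "patent retrieval", 2: "fuzzy text search", 3: "suffix trees"}
index = defaultdict(set)
for doc_id, text in docs.items():
    for g in ngrams(text):
        index[g].add(doc_id)

def candidates(query, top=2):
    """Stage one: rank documents by the number of shared query n-grams."""
    votes = Counter()
    for g in ngrams(query):
        for doc_id in index[g]:
            votes[doc_id] += 1
    return votes.most_common(top)

print(candidates("fuzy serch"))   # the misspelled query still hits document 2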
Proceedings of the 9th International Symposium on Intelligent Distributed Computing – IDC'2015, Guimarães, Portugal, October 2015