Development of an Author-Knowledge-Based Model for Search Applications

V. O. Molokanov; D. A. Romanov; V. V. Tsibulsky

We propose a new technology for a wide range of search applications over natural language texts. Its application to the expert search task is considered in detail using the TREC Enterprise track as an example. The vocabulary is treated statistically, but, in contrast to the standard TF-IDF metric, two special metrics are used. They bring into the calculations information about how authors use the lexicon and how they communicate with one another. Calculating the connection cardinality between an author and the lexicon reveals terms that are characteristic of that author, so the author can be found with the help of such terms. Lexicon weighting makes it possible to extract from the whole collection a small portion of the vocabulary, which we call significant. The significant lexicon enables effective search within a thematically specialized field of knowledge. Thus, our search engine minimizes the lexicon needed to answer a query by extracting its most important part. The ranking function takes into account term usage statistics among authors to raise the role of significant terms relative to other, noisier ones. We demonstrate that effective expertise retrieval is possible with several rationally constructed heuristic rating indicators. First, we obtain an expert search efficiency comparable to that of the most effective modern information retrieval engines. Second, the chosen indicators make it possible to distinguish between “good” and “bad” queries, which is essential for further optimization of our engine. We discuss the possibility of applying our engine to other search and analytic scenarios, such as plagiarism search, information gap retrieval, and others.
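The abstract does not give the formulas for its two metrics, so the following is only a toy sketch of the general idea: an author-term association, a term weight derived from usage statistics across authors, and a ranking that combines them. Everything here is an assumption made for illustration, including the function names, the whitespace tokenizer, the relative-frequency association, and the logarithmic "inverse author frequency" weight. It is not the authors' method.

```python
import math
from collections import Counter, defaultdict


def build_author_term_counts(docs):
    """Map each author to a Counter of the terms used in that author's texts.

    `docs` is an iterable of (author, text) pairs; tokenization is a plain
    lowercase whitespace split (an illustrative simplification).
    """
    counts = defaultdict(Counter)
    for author, text in docs:
        counts[author].update(text.lower().split())
    return counts


def term_significance(counts):
    """Hypothetical term weight: log(number of authors / authors using the term).

    Terms used by every author get weight 0; terms used by few authors
    get higher weights. This stands in for the paper's unspecified weighting.
    """
    n_authors = len(counts)
    authors_per_term = Counter()
    for terms in counts.values():
        authors_per_term.update(terms.keys())
    return {t: math.log(n_authors / k) for t, k in authors_per_term.items()}


def rank_experts(query, counts, significance):
    """Rank authors by the weighted relative frequency of the query terms."""
    query_terms = query.lower().split()
    scores = {}
    for author, terms in counts.items():
        total = sum(terms.values())
        if total == 0:
            scores[author] = 0.0
            continue
        scores[author] = sum(
            (terms[t] / total) * significance.get(t, 0.0) for t in query_terms
        )
    return sorted(scores, key=scores.get, reverse=True)


# Made-up toy collection, for illustration only.
docs = [
    ("alice", "neural ranking models for search"),
    ("bob", "budget planning for search teams"),
    ("carol", "neural networks and ranking"),
]
counts = build_author_term_counts(docs)
weights = term_significance(counts)
print(rank_experts("neural ranking", counts, weights))
```

In this sketch a term that every author uses contributes nothing to the score, which mirrors the abstract's idea of demoting noisy vocabulary in favour of a small significant lexicon.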

Language: Russian

Keywords: expert search; large-scale enterprise collections; network communications; ranking algorithms

In book

Computational Linguistics and Intellectual Technologies: Proceedings of the Annual International Conference "Dialogue" (Bekasovo, May 29 - June 2, 2013). In 2 vols.

Vol. 1: Main conference programme. Issue 12 (19). Moscow: RGGU, 2013.

CrowdChecked: Detecting Previously Fact-Checked Claims in Social Media

Hardalov M., Chernyavskiy A., Koychev I. et al., in: Proceedings of the 2nd Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics and the 12th International Joint Conference on Natural Language Processing (Volume 1: Long Papers). Association for Computational Linguistics, 2022. P. 266–285.

While there has been substantial progress in developing systems to automate fact-checking, they still lack credibility in the eyes of the users. Thus, an interesting approach has emerged: to perform automatic fact-checking by verifying whether an input claim has been previously fact-checked by professional fact-checkers and to return an article that explains their decision. ...

Added: May 21, 2023

Further development of automated methods for predicting risks and expert searching of weakly structured processes

Romanov D. A., Bilinkis (Stavenko) J., Zueva A. et al., in: Proceedings of the International Forum on Knowledge Asset Dynamics, 12th edition, Knowledge Management in the 21st Century: Resilience, Creativity and Co-creation, IFKAD 2017. St. Petersburg: St. Petersburg University, 2017. P. 1804–1814.

Purpose – Companies are adopting new methods of analysis of business processes to maintain an adequate level of control and transparency. If the process is subject to constant changes and becomes too complicated, there is a need to move from linear to non-linear processing. Lack of process structure is a source of increasing ...

Added: February 23, 2018

Business Performance Measurement and Management

Cambridge Scholars Publishing, 2014.

Measuring and managing the performance of a business is one of the main requirements of the management of any organization. This book introduces new contexts and themes of application and presents emerging research areas related to business performance measurement and management. It draws authors from all around the globe from a variety of functional disciplines, ...

Added: August 24, 2015

Using Semantic Text Analysis to Search for Specialists

Zakhlebin I. V., in: Supplementary Proceedings of the 3rd International Conference on Analysis of Images, Social Networks and Texts (AIST 2014). Vol. 1197: Supplementary Proceedings of AIST 2014. Ekaterinburg: CEUR Workshop Proceedings, 2014. P. 187–191.

This paper presents a semantic method for searching for experts. The method operates over a set of texts written by the experts themselves. The query format, which allows one to define a set of selected skills, and the algorithms for constructing and comparing the semantic representations are also presented. The ExpSearch-1 (Experts Search, version 1) system ...

Added: July 11, 2015

Why Do We Need Technologies for Searching and Analyzing Unstructured Information? How Can the Economic Effect Be Assessed? (Part 3)

Romanov D. A., Современные технологии делопроизводства и документооборота 2015 No. 2 P. 13–23

The third part of the article examines the factors that affect the economic efficiency of systems and applications using technologies for searching and analyzing unstructured information. The expenditure and revenue sides of investment project budgets are examined using the examples of a corporate expert search system and a legal review system. A sample questionnaire for collecting information when surveying an organization's business processes is provided. ...

Added: March 19, 2015

Why Do We Need Technologies for Analyzing and Searching Unstructured Information? How Can the Economic Effect Be Assessed? (Part 2)

Romanov D. A., Современные технологии делопроизводства и документооборота 2015 No. 1 P. 6–15

The second part of the article examines various corporate information systems and application services that use technologies for analyzing and searching unstructured information: a corporate search engine, media monitoring and business intelligence, an expert search system, identification of bottlenecks in business processes, plagiarism search, résumé selection, analysis of citizens' appeals and document routing, legal review, etc. ...

Added: March 19, 2015

Enhanced Algorithms for Enterprise Expert Search System

Valentin Molokanov, Dmitry Romanov, Valentin Tsibulsky, in: Proceedings of SPIE* 2: International Conference on Graphic and Image Processing (ICGIP 2012). Vol. 8768. Singapore: SPIE, 2013. Ch. 8768-36 P. 146–150.

We present the results of our enterprise expert search system application to the task introduced at the Text Retrieval Conference (TREC) in 2007. The expert search system is based on analysis of content and communications topology in an enterprise information space. An optimal set of weighting coefficients for three query-candidate associating algorithms is selected for ...

Added: February 10, 2014

A New Model for Enterprise Expert Retrieval

Valentin O. Molokanov, Dmitry A. Romanov, Valentin V. Tsibulsky, International Journal of Computer and Communication Engineering 2013 Vol. 2 No. 2 P. 201–205

We present a description of an enterprise expert search system which is based on the analysis of content and communications topology in an enterprise information space. As data sources we use the collections introduced at the Text Retrieval Conference (TREC) in 2006 and 2007. An optimal set of weighting coefficients for three query-candidate associating algorithms ...

Added: February 10, 2014