Организация быстрого поиска без индекса
Классическим подходом к организации информации для последующего быстрого поиска является построение индекса. Однако этот подход имеет несколько недостатков. Индекс необходимо перестраивать и поддерживать в актуальном виде, что затруднительно в случае разрозненной информации, такой как текстовая информация в WEB. Эти недостатки являются следствием того, что индекс является реорганизованной копией индексируемой информации. В данной работе предлагается способ организации информации, для последующего быстрого поиска, в структуру данных без дублирования.
Formal Concept Analysis (FCA) is a mathematically well-founded theory aimed at data analysis and classication, introduced and detailed in the book of Bernhard Ganter and Rudolf Wille, \Formal Concept Analysis", Springer 1999. The area came into being in the early 1980s and has since then spawned over 10000 scientic publications and a variety of practically deployed tools. FCA allows one to build from a data table with objects in rows and attributes in columns a taxonomic data structure called concept lattice, which can be used for many purposes, especially for Knowledge Discovery and Information Retrieval. The \Formal Concept Analysis Meets Information Retrieval" (FCAIR) workshop collocated with the 35th European Conference on Information Retrieval (ECIR 2013) was intended, on the one hand, to attract researchers from FCA community to a broad discussion of FCA-based research on information retrieval, and, on the other hand, to promote ideas, models, and methods of FCA in the community of Information Retrieval. This volume contains 11 contributions to FCAIR workshop (including 3 abstracts for invited talks and tutorial) held in Moscow, on March 24, 2013. All submissions were assessed by at least two reviewers from the program committee of the workshop to which we express our gratitude. We would also like to thank the co-organizers and sponsors of the FCAIR workshop: Russian Foundation for Basic Research, National Research University Higher School of Economics, and Yandex.
Обоснованы формулы для вычисления меры нечёткости утверждения, понятия, системы классификации, задаваемой нечёткой характеристикой. Уточнены понятия полноты и точности поиска в нечётком случае и выведены расчётные формулы для их оценки. Установлена связь между нечёткостью характеристики и эффективностью поиска. Предложена методика построения кривой парето-оптимальных параметров эффективности поиска. Введено понятие автосовместимости системы классификации и установлена её связь с мерой нечёткости характеристики.
В данной статье представлена концепция характеристического поиска, в основе которого лежат разделение характеристик объекта на основные и второстепенные и представление информации в иерархическом виде. Описаны основные функциональные возможности, которые должны быть реализованы в такой системе. Приводится описание разработанного прототипа системы поиска.
A vast amount of documents in the Web have duplicates, which is a challenge for developing efficient methods that would compute clusters of similar documents. In this paper we use an approach based on computing (closed) sets of attributes having large support (large extent) as clusters of similar documents. The method is tested in a series of computer experiments on large public collections of web documents and compared to other established methods and software, such as biclustering, on same datasets. Practical efficiency of different algorithms for computing frequent closed sets of attributes is compared.
Formal Concept Analysis (FCA) is an unsupervised clustering technique and many scientific papers are devoted to applying FCA in Information Retrieval (IR) research. We collected 103 papers published between 2003-2009 which mention FCA and information retrieval in the abstract, title or keywords. Using a prototype of our FCA-based toolset CORDIET, we converted the pdf-files containing the papers to plain text, indexed them with Lucene using a thesaurus containing terms related to FCA research and then created the concept lattice shown in this paper. We visualized, analyzed and explored the literature with concept lattices and discovered multiple interesting research streams in IR of which we give an extensive overview. The core contributions of this paper are the innovative application of FCA to the text mining of scientific papers and the survey of the FCA-based IR research.
Doctoral students were invited to the Doctoral Consortium held in conjunction with the main conference of ECIR 2013. The Doctoral Consortium aimed to provide a constructive setting for presentations and discussions of doctoral students’ research projects with senior researchers and other participating students. The two main goals of the Doctoral Consortium were: 1) to advise students regarding current critical issues in their research; and 2) to make students aware of the strengths and weakness of their research as viewed from different perspectives. The Doctoral Consortium was aimed for students in the middle of their thesis projects; at minimum, students ought to have formulated their research problem, theoretical framework and suggested methods, and at maximum, students ought to have just initiated data analysis. The Doctoral Consortium took place on Sunday, March 24, 2013, at the ECIR 2013 venue, and participation is by invitation only. The format was designed as follows: The doctoral students presents summaries of their work to other participating doctoral students and the senior researchers. Each presentation was followed by a plenary discussion, and individual discussion with one senior advising researcher. The discussions in the group and with the advisors were intended to help the doctoral student to reflect on and carry on with their thesis work.
This book constitutes the refereed proceedings of the 12th Industrial Conference on Data Mining, ICDM 2012, held in Berlin, Germany in July 2012. The 22 revised full papers presented were carefully reviewed and selected from 97 submissions. The papers are organized in topical sections on data mining in medicine and biology; data mining for energy industry; data mining in traffic and logistic; data mining in telecommunication; data mining in engineering; theory in data mining; theory in data mining: clustering; theory in data mining: association rule mining and decision rule mining.