On Measures and Metrics of Information Retrieval Relevance in Systems on the Properties of Inorganic Substances
Information systems play an important role in modern education, providing an information basis for many disciplines. One of the main tasks in integrating information systems into the educational process is to provide relevant search over information consolidated from heterogeneous sources. In inorganic chemistry and materials science, set-theoretic methods for retrieving relevant information are known, and they produce sufficiently high-quality responses to user queries. However, the problem of quantifying the relevance of information retrieval in this subject area remains open. In this paper, we propose a method based on weighted graphs for quantifying the relevance of information retrieval in integrated systems on the properties of inorganic substances and materials. The vertices of the graph are heterogeneous chemical objects (systems, substances, and crystal modifications), and a metric defined on them estimates the similarity of chemical objects. In this metric space, determining the cost of a path between graph vertices makes it possible to evaluate the similarity (relevance) of chemical objects. This is important for finding related chemical entities and their properties within an integrated information system that consolidates Russian and foreign resources on the properties of inorganic substances (www.imet-db.ru). Thus, a relevance metric, introduced as a value inversely proportional to the cost of a path in the graph, makes it possible to rank the information displayed in response to a user query optimally from a materials scientist's point of view, at a single access point to consolidated information resources on the properties of inorganic substances. In addition to the metric on the graph, a measure is defined that is useful for obtaining a complete informational description of a chemical object.
The measure is used to search for all properties of an object that are available in the integrated resources, which is necessary when compiling a complete analytical description of a chemical object.
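The path-cost relevance described above can be illustrated with a minimal Python sketch. It assumes an undirected graph with strictly positive edge costs, so that the cheapest-path cost between two vertices behaves as a metric. It computes that cost with Dijkstra's algorithm and takes relevance as k / cost with k = 1. The vertex names, the edge costs, and the functions `build_graph`, `path_costs`, and `rank_by_relevance` are hypothetical and purely illustrative. They are neither the authors' implementation nor data from www.imet-db.ru.

```python
import heapq
import math
from typing import Dict, List, Tuple

# Adjacency map: vertex -> {neighbour: edge cost}
Graph = Dict[str, Dict[str, float]]

# Hypothetical chemical objects (systems, substances, crystal modifications)
# with illustrative edge costs; not real data from any database.
EDGES: List[Tuple[str, str, float]] = [
    ("system:Ga-As", "substance:GaAs", 1.0),
    ("substance:GaAs", "mod:GaAs-sphalerite", 0.5),
    ("substance:GaAs", "mod:GaAs-wurtzite", 0.5),
    ("system:Ga-As", "system:Ga-P", 2.0),
    ("system:Ga-P", "substance:GaP", 1.0),
]


def build_graph(edges: List[Tuple[str, str, float]]) -> Graph:
    """Build an undirected graph; costs must be positive for a metric."""
    graph: Graph = {}
    for u, v, cost in edges:
        if cost <= 0:
            raise ValueError(f"edge {u}-{v} must have a positive cost")
        graph.setdefault(u, {})[v] = cost
        graph.setdefault(v, {})[u] = cost
    return graph


def path_costs(graph: Graph, source: str) -> Dict[str, float]:
    """Dijkstra's algorithm: cheapest path cost from source to each reachable vertex."""
    dist: Dict[str, float] = {source: 0.0}
    heap: List[Tuple[float, str]] = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, math.inf):
            continue  # outdated heap entry, a cheaper path was already found
        for v, cost in graph.get(u, {}).items():
            nd = d + cost
            if nd < dist.get(v, math.inf):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist


def rank_by_relevance(graph: Graph, query: str, k: float = 1.0) -> List[Tuple[str, float]]:
    """Rank vertices by relevance = k / path cost, excluding the query vertex.

    Unreachable vertices have infinite cost (zero relevance) and are omitted.
    """
    costs = path_costs(graph, query)
    scored = [(v, k / c) for v, c in costs.items() if v != query]
    # Highest relevance first; ties broken alphabetically for a stable order.
    scored.sort(key=lambda item: (-item[1], item[0]))
    return scored


if __name__ == "__main__":
    graph = build_graph(EDGES)
    for vertex, score in rank_by_relevance(graph, "substance:GaAs"):
        print(f"{vertex:22s} relevance = {score:.3f}")
```

For the query `substance:GaAs`, the two crystal modifications (path cost 0.5, relevance 2.0) rank first, followed by the Ga-As system (relevance 1.0), the Ga-P system (relevance about 0.333), and GaP (relevance 0.25). In this sketch, the ranking depends entirely on the arbitrary edge costs, which are the main modelling choice.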