Mathematics of Software Systems: Interuniversity Collection of Scientific Papers (Математика программных систем: межвузовский сборник научных трудов)
The article describes the implementation of a service that automates the collection of structured information from unstructured web documents. The service unifies the solution across a variety of data domains through an explicit ontological description of the task. In addition, increasing the number of sources requires no changes to the program code, because the information sources are also described by an ontology.
The paper deals with the classification of formats. Particular attention is paid to the possibility of incorporating metadata, which supports a mechanism for semantic indexing. An existing classification is examined and a new faceted classification is proposed.