An Intelligent Internet Content Analysis Service Based on a Domain Description
The article describes the implementation of a service that automates the collection of structured information from unstructured web documents. The service provides a unified solution across data domains through an explicit ontological description of the task. In addition, no changes to the program code are required to increase the number of sources, because the information sources are also described by ontologies.
This volume contains the papers selected for presentation at the 2014 IEEE/WIC/ACM International Conference on Web Intelligence (WI'14), held as part of the 2014 Web Intelligence Congress (WIC'14) at the University of Warsaw, Warsaw, Poland, from 11 to 14 August 2014. The conference was sponsored and co-organized by the IEEE Computer Society, the Web Intelligence Consortium (WIC), the Association for Computing Machinery (ACM), the University of Warsaw, the Polish Mathematical Society, and the Warsaw University of Technology.
The series of Web Intelligence conferences was started in Japan in 2001. Since then, it has been held yearly in several countries, including Canada, China, France, the USA, Australia, and Italy. It is recognized as the world's leading forum focusing on the role of Web Intelligence as one of the most important directions for scientific research and for the development of solutions that contribute to the creation of the knowledge-based society. In 2014, WI visited Poland as a special event commemorating the 25th anniversary of the Web.
WI'14 received 242 paper submissions, in the areas of foundations of Web Intelligence, semantic aspects of Web Intelligence, World Wide Wisdom Web, Web search and recommendation, Web mining and warehousing, Human-Web interaction, as well as Web Intelligence technologies and applications. After a rigorous evaluation process, 85 papers were selected as regular contributions, giving an acceptance rate of 35.1%.
The first five sections of this volume include 40 regular contributions. Additionally, the first paper in the first section corresponds to one of WIC'14 keynotes. The last four sections of this volume contain 23 papers selected for oral presentations in WI'14 workshops. The remaining 45 regular contributions and 25 papers accepted to WI'14 special sessions are published in another volume of WI’14 proceedings.
Formal Concept Analysis (FCA) is a mathematical technique that has been extensively applied to Boolean data in applications such as knowledge discovery, information retrieval, and web mining. In recent years, research on extending FCA theory to cope with imprecise and incomplete information has made significant progress. In this paper, we give a systematic overview of the more than 120 papers published between 2003 and 2011 on FCA with fuzzy attributes and rough FCA. We applied traditional FCA as a text-mining instrument to 1072 papers mentioning FCA in the abstract. These papers were available as PDF files; using a thesaurus of terms referring to research topics, we transformed them into concept lattices. These lattices were used to analyze and explore the most prominent research topics within the FCA-with-fuzzy-attributes and rough-FCA research communities. FCA turned out to be an ideal metatechnique for representing large volumes of unstructured texts.
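The lattice construction mentioned above can be illustrated on a toy Boolean context. The sketch below is not the authors' implementation; the context data and all names are hypothetical, chosen only to show how formal concepts (extent, intent pairs) arise from an object-attribute relation.

```python
from itertools import combinations

# Toy Boolean context (objects x attributes); hypothetical data for illustration.
context = {
    "paper1": {"fuzzy", "fca"},
    "paper2": {"rough", "fca"},
    "paper3": {"fuzzy", "rough", "fca"},
}
attributes = set().union(*context.values())

def intent(objs):
    """Attributes shared by every object in objs (all attributes for the empty set)."""
    return set.intersection(*(context[o] for o in objs)) if objs else set(attributes)

def extent(attrs):
    """Objects that possess every attribute in attrs."""
    return {o for o, a in context.items() if attrs <= a}

def formal_concepts():
    """Enumerate all formal concepts by closing every object subset.

    Naive O(2^n) enumeration -- adequate only for toy contexts.
    """
    concepts = set()
    objs = list(context)
    for r in range(len(objs) + 1):
        for subset in combinations(objs, r):
            b = intent(set(subset))   # close the subset: take its intent ...
            a = extent(b)             # ... then the extent of that intent
            concepts.add((frozenset(a), frozenset(b)))
    return concepts

for a, b in sorted(formal_concepts(), key=lambda c: len(c[0])):
    print(sorted(a), "->", sorted(b))
```

Ordering these concepts by extent inclusion yields the concept lattice; real FCA toolkits use incremental algorithms rather than this exponential enumeration.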
This book constitutes a collection of selected contributions from the 12th International Conference on Perspectives in Business Informatics Research, BIR 2013, held in Warsaw, Poland, in September 2013. Overall, 54 submissions were rigorously reviewed by 41 members of the Program Committee representing 21 countries. As a result, 19 full and 5 short papers from 12 countries have been selected for publication in this volume. This book also includes the two keynotes by Witold Abramowicz and Bernhard Thalheim. The papers cover many aspects of business information research and have been organized in topical sections on: business process management; enterprise and knowledge architectures; organizations and information systems development; information systems and services; and applications.
The paper describes the development of a portal devoted to the development and use of tools based on (meta)modeling (DSM, DSL, etc.). The architecture of the portal, its information retrieval subsystem, and its document management are described.
The purpose of the portal is to create a "self-developing" resource that provides intelligent search, automatic processing of the results (documents and sources), and easy navigation over the found resources. The implementation is based on an ontology approach.
The main feature of the suggested methods is an integrated approach to development, based on a multi-level ontology repository. The portal allows users to search and analyze information, create and study models, and publish research results. The software supports flexible customization. The main topic of this paper is intelligent information search based on semantic indexing, automatic document classification, tracking of semantic links between documents, and automatic summarization.
Today, many problems confined to a particular problem domain can be solved using a DSL. To use a DSL, it must either be created or selected from existing ones. Creating a completely new DSL in most cases requires high financial and time costs. Selecting an appropriate existing DSL is a labor-intensive task, because actions such as walking through every DSL and deciding whether it can handle the problem are done manually. This problem arises because there is no DSL repository and no tools for matching a suitable DSL to a specific task. This paper describes an approach to automated detection of requirements for a DSL (as an ontology-based structure) and automated DSL matching for a specific task.
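The matching step described above could, in its simplest form, rank DSLs by the overlap between the concepts required by a task and the concepts in each DSL's ontology. The sketch below is a minimal illustration under that assumption; the repository contents, concept sets, and similarity measure (Jaccard) are all hypothetical and stand in for the paper's richer ontology-based structures.

```python
def jaccard(a, b):
    """Similarity between two concept sets: |intersection| / |union|."""
    return len(a & b) / len(a | b) if a | b else 0.0

# Hypothetical DSL repository: each DSL annotated with its ontology concepts.
dsl_repository = {
    "StateMachineDSL": {"state", "transition", "event"},
    "WorkflowDSL": {"task", "transition", "actor", "event"},
    "QueryDSL": {"table", "filter", "projection"},
}

def match_dsl(task_concepts, repository):
    """Rank DSLs by overlap between task requirements and DSL ontology concepts."""
    return sorted(repository.items(),
                  key=lambda kv: jaccard(task_concepts, kv[1]),
                  reverse=True)

task = {"state", "event", "transition"}
for name, concepts in match_dsl(task, dsl_repository):
    print(name, round(jaccard(task, concepts), 2))
```

A real matcher would compare ontology structures (concept hierarchies and relations), not flat keyword sets, but the ranking principle is the same.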
Today, web spam is one of the key problems of modern web search engines. In this paper we investigate the efficiency of various dimensionality reduction methods applied to the spam classifier of the go.mail.ru search system. Effective use of such techniques can significantly increase the number of features and the quality of the classifier without loss of training and classification speed. We conducted a series of experiments with the PCA (Principal Component Analysis) and RP (Random Projection) dimensionality reduction methods. Unfortunately, these methods turned out to be ineffective for this problem, mainly because of the low-dimensional feature space. However, this experiment revealed the need for a detailed analysis of the features participating in the training process. For this analysis, we chose the MRMR (Minimum Redundancy Maximum Relevance) criterion. Applying this criterion allowed us to detect redundant features and to estimate the contribution of each feature participating in training. This work allowed us to significantly increase the quality of our web spam classifier without increasing the number of features. The paper demonstrates the practical efficiency of feature selection criteria and once again emphasizes the importance of a detailed analysis of the data and of the informative features selected for training.
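The MRMR criterion mentioned above greedily selects features that are highly relevant to the label while minimally redundant with the features already chosen. The sketch below is a minimal discrete-feature version, not the classifier pipeline from the paper; the toy dataset and the plug-in mutual-information estimator are illustrative assumptions.

```python
import numpy as np

def mutual_info(x, y):
    """Empirical mutual information (in nats) between two discrete arrays."""
    mi = 0.0
    for xv in np.unique(x):
        for yv in np.unique(y):
            pxy = np.mean((x == xv) & (y == yv))
            if pxy > 0:
                mi += pxy * np.log(pxy / (np.mean(x == xv) * np.mean(y == yv)))
    return mi

def mrmr_select(X, y, k):
    """Greedy MRMR: at each step pick the feature maximizing
    relevance MI(f; y) minus mean redundancy MI(f; already selected)."""
    selected, remaining = [], list(range(X.shape[1]))
    relevance = [mutual_info(X[:, j], y) for j in remaining]
    while len(selected) < k and remaining:
        best, best_score = None, -np.inf
        for j in remaining:
            redundancy = (np.mean([mutual_info(X[:, j], X[:, s]) for s in selected])
                          if selected else 0.0)
            score = relevance[j] - redundancy
            if score > best_score:
                best, best_score = j, score
        selected.append(best)
        remaining.remove(best)
    return selected

# Toy data: f0 is a noisy copy of the label, f1 duplicates f0 (redundant),
# f2 is a weaker but independent noisy copy of the label.
rng = np.random.default_rng(0)
n = 300
y = rng.integers(0, 2, n)
f0 = np.where(rng.random(n) < 0.1, 1 - y, y)
f1 = f0.copy()
f2 = np.where(rng.random(n) < 0.3, 1 - y, y)
X = np.column_stack([f0, f1, f2])
print(mrmr_select(X, y, 2))  # the duplicate f1 should be skipped
```

The duplicate feature scores high on relevance but is fully penalized by its redundancy with the first pick, which is exactly the behavior the paper exploits to prune its spam-classifier feature set.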
The use of visual domain-specific languages in software engineering simplifies the process of software creation and attracts domain experts who are not professional programmers. However, creating a new domain-specific language is a nontrivial task, so automating the development process is a topical problem. To automate the design of visual modeling languages, it is proposed to use ontologies obtained from the analysis of a text corpus. The article considers an approach to the automatic creation of visual modeling languages on the basis of domain ontologies.