A Tool System Based on the LSPL Template Language: New Features and Applications
Identifying new technological trends is one of the most complex, and most important, tasks in science and technology (S&T) analysis. The leading methodologies in this domain currently focus on technology roadmapping, Foresight, and the analysis of data patterns and time series, which are used to identify current and projected trends. The paper presents intelligent tools for identifying trends in text collections using a hybrid approach that integrates classical statistical methods with information extraction methods. Several existing approaches are combined so that they can be applied to multilingual text collections of various genres. Text processing is driven by ontologies, and documents are represented by characteristic vectors that include multiword terms. The results of the statistical analysis of document collections are presented as time series of data patterns, which are then analyzed with structural image-analysis methods. An OWL representation of the extensional part of the trend ontological model is generated.
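As a rough illustration of the time-series step, the sketch below computes, for each year, the share of documents that mention a given multiword term, and then fits a least-squares slope to that series. This is a deliberately simplified stand-in under stated assumptions: the function names, the toy data, plain substring matching, and the use of a slope as a trend indicator are all hypothetical, and none of it reproduces the structural image-analysis methods described in the abstract.

```python
from collections import Counter, defaultdict


def term_time_series(docs, terms):
    """Build a per-year series for each term (hypothetical helper).

    docs: iterable of (year, text) pairs; terms: lowercased multiword terms.
    Returns (series, years), where series[term][i] is the share of documents
    from years[i] whose lowercased text contains the term as a substring.
    """
    docs_per_year = Counter()
    hits = defaultdict(Counter)
    for year, text in docs:
        docs_per_year[year] += 1
        lowered = text.lower()
        for term in terms:
            if term in lowered:
                hits[term][year] += 1
    years = sorted(docs_per_year)
    series = {t: [hits[t][y] / docs_per_year[y] for y in years] for t in terms}
    return series, years


def slope(xs, ys):
    """Ordinary least-squares slope of ys against xs (0.0 if xs are constant)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    num = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    den = sum((x - mean_x) ** 2 for x in xs)
    return num / den if den else 0.0


# Toy collection, invented purely for illustration.
docs = [
    (2010, "Graphene transistor prototype"),
    (2010, "Silicon wafer process"),
    (2011, "Graphene transistor scaling"),
    (2011, "Graphene transistor yield"),
    (2012, "Graphene transistor integration"),
    (2012, "Graphene transistor market"),
]
series, years = term_time_series(docs, ["graphene transistor", "silicon wafer"])
for term, values in series.items():
    print(term, values, round(slope(years, values), 3))
```

Normalizing by the number of documents per year keeps the series comparable when the collection grows unevenly over time; a positive slope only flags a candidate trend for closer analysis.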
The work deals with the NLP system Alex. Alex is a multipurpose text analysis system that supports both content analysis and information extraction. It is based on hierarchically organized templates for text annotation. The paper discusses the system architecture and presents several real-world use cases.
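To make the idea of hierarchically organized annotation templates concrete, here is a minimal sketch in which lexical templates are regular expressions and higher-level templates are sequences of lower-level labels. The template names, the data structures, and this matching scheme are hypothetical; they illustrate the layering principle only and do not reproduce Alex's actual template language.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Annotation:
    label: str
    start: int
    end: int
    text: str


# Level 1 (hypothetical): lexical templates written as regular expressions.
LEXICAL = {
    "Number": r"\d+(?:\.\d+)?",
    "Currency": r"\b(?:USD|EUR|RUB)\b",
    "Or": r"\bor\b",
}

# Higher levels (hypothetical): each template is a sequence of labels that must
# follow one another, separated only by whitespace. Templates are applied in
# order, so a later template may use labels produced by an earlier one.
COMPOSITE = {
    "Money": ["Number", "Currency"],
    "MoneyAlternative": ["Money", "Or", "Money"],
}


def _annotation_at(anns, label, pos, text):
    """Return an annotation with the given label starting at pos, after skipping whitespace."""
    while pos < len(text) and text[pos].isspace():
        pos += 1
    for a in anns:
        if a.label == label and a.start == pos:
            return a
    return None


def annotate(text):
    anns = [
        Annotation(label, m.start(), m.end(), m.group())
        for label, pattern in LEXICAL.items()
        for m in re.finditer(pattern, text)
    ]
    for label, parts in COMPOSITE.items():
        # Snapshot the candidates so annotations added below do not affect this loop.
        for first in [a for a in anns if a.label == parts[0]]:
            end = first.end
            for part in parts[1:]:
                nxt = _annotation_at(anns, part, end, text)
                if nxt is None:
                    break
                end = nxt.end
            else:
                # Every part matched in sequence: record the higher-level annotation.
                anns.append(Annotation(label, first.start, end, text[first.start:end]))
    return anns


for a in annotate("Price: 120 USD or 99.5 EUR"):
    if a.label in COMPOSITE:
        print(a.label, repr(a.text))
```

Keeping each level declarative means a new template can reuse existing labels without touching the lower levels, which is one plausible motivation for a hierarchical organization.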
This paper is an overview of current issues and trends in computational linguistics, based on the papers presented at the COLING 2012 conference. It discusses modern approaches to traditional NLP tasks such as part-of-speech tagging, syntactic parsing, and machine translation, and also highlights developments in automated information extraction, such as fact extraction and opinion mining. The main trend in modern computational linguistics is to incorporate higher levels of linguistic analysis (discourse analysis, cognitive modeling) into models and to combine machine learning with algorithmic methods grounded in deep expert linguistic knowledge.
The article reviews the basic properties of a Named Entity Recognition (NER) system based on user dictionaries. NER modules are used in many applications. One promising application is using NER to enrich structured Semantic Web data (for instance, Linked Open Data ontologies) with information extracted from unstructured texts. The paper focuses on ambiguity resolution methods based on dictionaries and heuristic rules. The dictionary-oriented approach is motivated by a set of strict initial requirements. First, the target set of named entities must be extracted with very high precision. Second, non-specialists should be able to adapt the system easily to a new domain. Third, such adaptations should preserve the same high precision. We focus on the architecture of the dictionaries and on the properties the dictionaries should have for each class of named entities, since these properties are what allow ambiguous cases to be resolved. We also discuss the properties and structure of the synonyms, context words, expressions, and entities needed for disambiguation.
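The dictionary-driven disambiguation described above can be sketched roughly as follows. Everything here is an illustrative assumption rather than the paper's design: the entry format, the example entities, the context window size, longest-match lookup, and the rule that an ambiguous mention is left unannotated when no context word favors a single reading (a precision-first choice in the spirit of the stated requirements).

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Entry:
    entity_id: str
    entity_class: str
    synonyms: tuple           # lowercased surface forms
    context_words: frozenset  # words that support this reading


# Hypothetical user dictionary: "paris" is ambiguous between two cities.
DICTIONARY = [
    Entry("Paris_France", "City", ("paris",), frozenset({"france", "louvre", "seine"})),
    Entry("Paris_Texas", "City", ("paris",), frozenset({"texas", "usa"})),
    Entry("Paris_Hilton", "Person", ("paris hilton",), frozenset({"heiress", "actress"})),
]


def build_index(entries):
    """Map each synonym (as a tuple of tokens) to the entries that list it."""
    index = {}
    for entry in entries:
        for synonym in entry.synonyms:
            index.setdefault(tuple(synonym.split()), []).append(entry)
    return index


def disambiguate(candidates, context):
    """Pick the candidate with the most context-word overlap, or None if unsure."""
    if len(candidates) == 1:
        return candidates[0]
    scored = sorted(
        ((len(c.context_words & context), c) for c in candidates),
        key=lambda pair: pair[0],
        reverse=True,
    )
    best_score, best = scored[0]
    # Precision first: no supporting context, or a tie, means no annotation.
    if best_score == 0 or best_score == scored[1][0]:
        return None
    return best


def recognize(text, entries, window=3):
    tokens = re.findall(r"\w+", text.lower())
    index = build_index(entries)
    max_len = max(len(key) for key in index)
    results = []
    i = 0
    while i < len(tokens):
        # Longest match first, so "paris hilton" wins over "paris".
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            key = tuple(tokens[i:i + n])
            if key in index:
                break
        else:
            i += 1
            continue
        context = set(tokens[max(0, i - window):i] + tokens[i + n:i + n + window])
        chosen = disambiguate(index[key], context)
        if chosen is not None:
            results.append((" ".join(key), chosen.entity_id, chosen.entity_class))
        i += n
    return results


text = "Paris Hilton visited the Louvre in Paris, France. Later she flew to Paris, Texas."
for mention in recognize(text, DICTIONARY):
    print(mention)
```

Skipping unresolved mentions trades recall for precision; class-specific heuristic rules could be plugged into `disambiguate` where this sketch simply gives up.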