Morphosyntactic tagging of Chinese text with statistical analyzers: methodology and quality evaluation.
In this paper, we describe the basic principles of part-of-speech (POS) classifications and their modelling for POS tagging of Chinese in statistical NLP systems. Using three available statistical POS taggers, we conducted a tagging experiment on Chinese text to evaluate tagging quality and to analyze the correspondence between POS tags and the categories assigned in different reference grammars. We also formulate basic rules for evaluating the tagsets of POS taggers.
Proceedings of the 15th International Conference on Artificial Intelligence: Methodology, Systems, Applications, AIMSA 2012, Varna, Bulgaria, September 12-15, 2012.
This paper is an overview of current issues and tendencies in computational linguistics, based on the materials of the COLING’2012 conference on computational linguistics. Modern approaches to traditional NLP domains such as POS tagging, syntactic parsing, and machine translation are discussed. Highlights of automated information extraction, such as fact extraction and opinion mining, are also in focus. The main tendency of modern technologies in computational linguistics is to incorporate higher levels of linguistic analysis (discourse analysis, cognitive modeling) into the models and to combine machine learning technologies with algorithmic methods based on deep expert linguistic knowledge.
Compared with spatial relations, force interactions have received little attention from ontologists working on natural language processing. This article gives an example of text meaning representation based on the ontology and the lexicon of force interactions.
In this paper, we consider opinion word extraction, one of the key problems in sentiment analysis. Sentiment analysis (or opinion mining) is an important research area within computational linguistics. Opinion words, which form an opinion lexicon, describe the attitude of the author towards certain opinion targets, i.e., entities and their attributes on which opinions have been expressed. Hence, the availability of a representative opinion lexicon can facilitate the extraction of opinions from texts. For this reason, opinion word mining is one of the key issues in sentiment analysis. We designed and implemented several methods for extracting opinion words. We evaluated these approaches by testing how well the resulting opinion lexicons help improve the accuracy of methods for determining the polarity of the reviews if the extracted opinion words are used as features. We used several machine learning methods: SVM, Logistic Regression, Naive Bayes, and KNN. By using the extracted opinion words as features we were able to improve over the baselines in some cases. Our experiments showed that, although opinion words are useful for polarity detection, they are not sufficient on their own and should be used only in combination with other features.
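The evaluation idea in this abstract, using opinion words as features for review polarity classification, can be illustrated with a minimal sketch. Everything concrete here is an assumption for illustration only: the tiny `OPINION_WORDS` lexicon, the toy `TRAIN` reviews, and the hand-written multinomial Naive Bayes with add-one smoothing are not the paper's lexicon, data, or implementation.

```python
import math
from collections import Counter

# Hypothetical opinion lexicon; the paper extracts its lexicons automatically.
OPINION_WORDS = {"great", "excellent", "good", "boring", "awful", "bad"}

def features(text, lexicon=OPINION_WORDS):
    """Keep only whitespace-separated tokens found in the opinion lexicon."""
    return [tok for tok in text.lower().split() if tok in lexicon]

def train_nb(docs, lexicon=OPINION_WORDS):
    """Collect per-class document counts and opinion-word counts."""
    class_docs = Counter()
    word_counts = {}
    for text, label in docs:
        class_docs[label] += 1
        word_counts.setdefault(label, Counter()).update(features(text, lexicon))
    return class_docs, word_counts, len(lexicon)

def predict(model, text, lexicon=OPINION_WORDS):
    """Return the class with the highest smoothed log-probability."""
    class_docs, word_counts, vocab_size = model
    total_docs = sum(class_docs.values())
    best_label, best_score = None, -math.inf
    for label in sorted(class_docs):
        counts = word_counts[label]
        denom = sum(counts.values()) + vocab_size  # add-one smoothing
        score = math.log(class_docs[label] / total_docs)
        for tok in features(text, lexicon):
            score += math.log((counts[tok] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Toy training reviews (invented for this sketch).
TRAIN = [
    ("a great film with excellent acting", "pos"),
    ("good plot and great music", "pos"),
    ("boring and awful from start to finish", "neg"),
    ("bad script and boring dialogue", "neg"),
]
model = train_nb(TRAIN)
print(predict(model, "an excellent and good story"))
```

A review that contains no lexicon words yields an empty feature list, so the decision rests on the class prior alone. This is one simple way to see why opinion words may need to be combined with other features.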
This book is a collection of articles dealing with various aspects of grammatical relations and argument structure in the languages of Europe and North and Central Asia (LENCA). Topics covered with respect to individual languages are: split-intransitivity (Basque), causativization (Agul), transitives and causatives (Korean and Japanese), aspectual domain and quantification (Finnish and Udmurt), head-marking principles (Athabaskan languages), and pragmatics (Eastern Khanty and Xibe). Typology of argument-structure properties of ‘give’ (LENCA), typology of agreement systems, asymmetry in argument structure, typology of the Amdo Sprachbund, spatial relators (Northeastern Turkic), core argument patterns (languages of Northern California), and typology of grammatical relations (LENCA) are the topics of articles based on cross-linguistic data. The broad empirical sweep and the fine-tuned theoretical analysis highlight the central role of argument structure and grammatical relations with respect to a plethora of linguistic phenomena.
The form whose main function is to express indirect commands, called the third person Imperative, Jussive or Exhortative, shows both semantic and formal similarities to and distinctions from the prototypical (second person) Imperative. The study describes formal and functional patterns of Jussive and places this category within the typology of the related categories, such as Imperative and Optative, based on data from six East Caucasian languages (Archi, Agul, Akhvakh, Chechen, Icari and Kumyk). Five formal patterns of Jussive are attested in these languages: a specialized form, constructions derived from ‘want’, from ‘tell him to do’ and from ‘make him do’, and the Optative. Jussive forms may express such meanings as third person command, indirect causation, permission, indifference towards the accomplishment of an action and an assumption. While the Jussive is crucially different from the second person Imperative in that it introduces a third participant, this article shows that it is the addressee, not a third person, who is the central participant of a Jussive situation from both formal and functional points of view.
Concept discovery is a Knowledge Discovery in Databases (KDD) research field that uses human-centered techniques such as Formal Concept Analysis (FCA), Biclustering, Triclustering, Conceptual Graphs, etc. to gain insight into the underlying conceptual structure of the data. Traditional machine learning techniques mainly focus on structured data, whereas most available data resides in unstructured, often textual, form. Compared to traditional data mining techniques, human-centered instruments actively engage the domain expert in the discovery process. This volume contains the contributions to CDUD 2011, the International Workshop on Concept Discovery in Unstructured Data, held in Moscow. The main goal of this workshop was to provide a forum for researchers and developers of data mining instruments working on issues in analyzing unstructured data. We are proud to welcome 13 valuable contributions to this volume. The majority of the accepted papers describe innovative research on data discovery in unstructured texts. Authors worked on issues such as transforming unstructured into structured information by, among other things, extracting keywords and opinion words from texts with Natural Language Processing methods. Several authors who participated in the workshop used methods from the conceptual structures field, including Formal Concept Analysis and Conceptual Graphs. Applications include, but are not limited to, text mining of police reports, sociological definitions, and movie reviews.
The Cordiet-FCA software system is presented, which is designed for knowledge discovery in big dynamic data collections, including natural language texts. Cordiet-FCA allows one to compose ontology-controlled queries and outputs concept lattices, implication bases, association rules, and other useful concept-based artifacts. Efficient algorithms for data preprocessing, text processing, and visualization of results are discussed. Examples of applying the system to problems of medical diagnostics and criminal investigations are considered.
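For readers unfamiliar with the concept lattices mentioned above, the following minimal sketch enumerates the formal concepts of a tiny object-attribute context by brute force. It is not Cordiet-FCA's algorithm or API. The `CONTEXT` data and all function names are hypothetical, and the enumeration is exponential in the number of objects, so it only suits toy examples.

```python
from itertools import combinations

# Hypothetical toy context: documents (objects) and their keywords (attributes).
CONTEXT = {
    "report1": {"theft", "night"},
    "report2": {"theft", "weapon"},
    "report3": {"theft", "night", "weapon"},
}

def common_attributes(objects, context, all_attrs):
    """Derivation operator: attributes shared by every object in `objects`."""
    shared = set(all_attrs)
    for obj in objects:
        shared &= context[obj]
    return shared

def objects_having(attrs, context):
    """Derivation operator: objects that have every attribute in `attrs`."""
    return {obj for obj, obj_attrs in context.items() if attrs <= obj_attrs}

def formal_concepts(context):
    """Return all (extent, intent) pairs by closing every subset of objects."""
    all_attrs = set().union(*context.values())
    objs = sorted(context)
    concepts = set()
    for size in range(len(objs) + 1):
        for subset in combinations(objs, size):
            intent = common_attributes(subset, context, all_attrs)
            extent = objects_having(intent, context)
            concepts.add((frozenset(extent), frozenset(intent)))
    return concepts

# Print concepts from the largest extent to the smallest.
for extent, intent in sorted(formal_concepts(CONTEXT), key=lambda c: -len(c[0])):
    print(sorted(extent), sorted(intent))
```

Ordering the resulting concepts by inclusion of their extents gives the concept lattice. Practical systems use far more efficient algorithms than this exhaustive closure.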