Quantitative Methods in Diachronic Corpus Studies: Constructions with Predicatives and Dative Subjects
The paper proposes new approaches to the problem of Russian dative subjects in predicative and adjectival constructions. The core idea is to study the distribution of dative subject constructions across the predicative and adjectival forms that can potentially be used in them. The methodological novelty of the approach lies in the following aspects. First, the object of the research is the choice between expressing and omitting the dative subject in the construction. While predicates are usually classified according to whether they can in principle take a dative subject, I study the trends in the explicit use of dative (or prepositional beneficiary) arguments among the “dative subject predicates” and show that the frequency with which dative subjects are actually used can differ greatly from predicate to predicate. Second, I treat different morphological forms of the same dative subject lexeme separately (i.e., full and short adjectives, comparative adjectives, and predicatives) and show that they may also display different strategies with respect to explicit dative subjects. Finally, I compare data from the 18th and the 21st centuries and use hierarchical clustering to reveal some diachronic trends in the use of dative subjects. The research is based on a quantitative study of examples from the Russian National Corpus.
In this article we report new experiments in word clustering for the Russian language. We introduce a new clustering method that distributes words into classes according to their syntactic relations. We used a large untagged corpus (about 7.2 billion words) to collect a set of such relations. The corpus was processed with a set of finite-state automata that extract syntactically dependent combinations with explicit structure. These automata were applied only to unambiguous text fragments, because combining these techniques increases the quality of the sampled input data. A modification of group-average agglomerative clustering was used to distribute words among clusters. The resulting set of clusters was tested against one of the semantic dictionaries of the Russian language. The NMI score calculated in this article is 0.457 and the F1-score is 0.607.
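As an illustration of the evaluation measure mentioned above, the following is a minimal sketch of how normalized mutual information (NMI) between a predicted clustering and a reference classification can be computed. The abstract does not say which normalization the authors used; the arithmetic-mean variant below, the function names, and the toy labels are all assumptions made for illustration only.

```python
from collections import Counter
from math import log


def entropy(labels):
    """Shannon entropy (natural log) of a label assignment."""
    n = len(labels)
    return -sum((c / n) * log(c / n) for c in Counter(labels).values())


def mutual_information(pred, gold):
    """Mutual information between two label assignments of equal length."""
    n = len(pred)
    joint = Counter(zip(pred, gold))
    pred_sizes, gold_sizes = Counter(pred), Counter(gold)
    return sum(
        (n_ij / n) * log(n * n_ij / (pred_sizes[p] * gold_sizes[g]))
        for (p, g), n_ij in joint.items()
    )


def nmi(pred, gold):
    """NMI normalized by the arithmetic mean of the two entropies
    (an assumed variant; other normalizations exist)."""
    if len(pred) != len(gold):
        raise ValueError("label lists must have the same length")
    h_pred, h_gold = entropy(pred), entropy(gold)
    if h_pred == 0 and h_gold == 0:
        return 1.0  # both assignments put everything in a single class
    return 2 * mutual_information(pred, gold) / (h_pred + h_gold)


# Hypothetical toy data: cluster ids for six words vs. dictionary classes.
predicted = [0, 0, 1, 1, 2, 2]
reference = ["animal", "animal", "tool", "tool", "tool", "place"]
score = nmi(predicted, reference)
```

With this normalization the score is 1 when the two partitions coincide up to relabeling and 0 when they are statistically independent, which is why it is a common way to compare a clustering with a dictionary-based classification.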
«Bankruptcy» Concept Within the Legal Linguistics Coordinates: Russian–English–French Approximations
The article addresses the notion of bankruptcy as perceived by speakers of contemporary Russian, English, and French, both lawyers and participants in professional communication from other trades. The semantic structure of the term is identified on the basis of its lexicographic and regulatory definitions.
Four electronic corpora created in 2011 within the framework of the “Corpus Linguistics: the Albanian, Kalmyk, Lezgian, and Ossetic Languages” Program of Fundamental Research of the RAS are presented. The interface and functionality of these corpora are described, the engineering problems that had to be solved in their creation are explained, and the prospects for their development are discussed. Particular emphasis is placed on the compilation of dictionaries and the automatic grammatical markup of the corpora.
The project we present, the Russian Learner Translator Corpus (RusLTC), is a multiple learner translator corpus that stores Russian students’ translations out of and into English. The project is being developed by a cross-functional team of translator trainers and computational linguists in Russia. Translations are collected from several Russian universities; all translations are made as part of routine and exam assignments or as submissions for translation contests by students majoring in translation. As of March 2014, RusLTC contains a total of nearly 1.2 million word tokens, 258 source texts, and 1,795 translations. The paper gives a brief overview of related research and describes the corpus structure and the corpus-building technologies used; it also covers the query tool features and our error annotation solutions. In the final part we summarize RusLTC-based research and its current practical applications and suggest research prospects and possibilities.
This paper deals with the Semantics/Pragmatics distinction in a contrastive ethnolinguistic aspect. I argue for the validity of this distinction based on cross-linguistic data. My claim is that the specificity of the so-called language key words [Wierzbicka 1990:15-17] - linguospecific items particularly representative of a given language speakers’ mentality - is due to pragmatic rather than semantic peculiarities. These pragmatic peculiarities distinguish the key words both from their synonyms within the same language and their counterparts in other languages. The languages under discussion are Russian and English, analyzed within a combined frame of Integral Language Description model [Apresjan 1995:8-238] and Wierzbicka’s ethnolinguistic approach.
The present article continues the investigation of the Soqotri verbal system undertaken by the Russian-Soqotri fieldwork team. The article focuses on the so-called “weak” and “geminated” roots in the basic stem. The investigation is based on the analysis of full paradigms (perfect, imperfect and jussive) of more than 170 “weak” and “geminated” Soqotri verbs.
I give the explicit formula for the (set-theoretical) system of resultants of m+1 homogeneous polynomials in n+1 variables.