A cross-genre morphological tagging and lemmatization of the Russian poetry: distinctive test sets and evaluation
Poetic texts pose a challenge to full morphological tagging and lemmatization, since authors seek to extend the vocabulary, employ morphologically and semantically deficient forms, go beyond standard syntactic templates, and use non-projective constructions and non-standard word order, among other techniques of the creative language game. In this paper we evaluate a number of probabilistic taggers based on decision trees, CRF and neural network algorithms, as well as a state-of-the-art dictionary-based tagger. The taggers were trained on prose texts and tested on three poetic samples of different complexity. Firstly, we suggest a method for compiling gold standard datasets for Russian poetry. Secondly, we focus on the taggers’ performance in identifying part-of-speech tags and lemmas. We reveal which POS classes, paradigm classes and syntactic patterns affect the quality of processing most.
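The kind of evaluation described here, scoring POS tags and lemmas against a gold standard and breaking the result down by POS class, can be illustrated with a minimal sketch. The function name `tagging_accuracy`, the `(pos, lemma)` pair layout, the case-insensitive lemma comparison and the toy tokens are all assumptions made for illustration; they are not taken from the paper and do not reproduce its results.

```python
from collections import Counter


def tagging_accuracy(gold, predicted):
    """Per-POS-class accuracy of POS tags and lemmas.

    gold, predicted: token-aligned lists of (pos, lemma) pairs.
    Returns {gold_pos: (pos_accuracy, lemma_accuracy)}.
    Hypothetical helper, not the evaluation script used by the authors.
    """
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must be aligned token by token")
    totals, pos_hits, lemma_hits = Counter(), Counter(), Counter()
    for (g_pos, g_lemma), (p_pos, p_lemma) in zip(gold, predicted):
        totals[g_pos] += 1
        pos_hits[g_pos] += g_pos == p_pos
        # Assumption: lemmas are compared case-insensitively.
        lemma_hits[g_pos] += g_lemma.lower() == p_lemma.lower()
    return {pos: (pos_hits[pos] / n, lemma_hits[pos] / n)
            for pos, n in totals.items()}


# Toy data invented for illustration only.
toy_gold = [("NOUN", "ночь"), ("NOUN", "улица"), ("VERB", "жить"), ("ADJ", "тусклый")]
toy_pred = [("NOUN", "ночь"), ("ADJ", "улица"), ("VERB", "жить"), ("ADJ", "тусклый")]
scores = tagging_accuracy(toy_gold, toy_pred)
for pos, (pos_acc, lemma_acc) in scores.items():
    print(pos, pos_acc, lemma_acc)
```

Grouping by the gold POS tag rather than the predicted one makes it easy to see which classes a tagger trained on prose tends to miss in poetry.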
In this article we report new experiments in word clustering for the Russian language. We introduce a new clustering method that distributes words into classes according to their syntactic relations. We used a large untagged corpus (about 7.2 billion words) to collect a set of such relations. The corpus was processed using a set of finite state automata that extract syntactically dependent combinations with explicit structure. These automata were applied only to unambiguous text fragments, because the combination of these techniques increases the quality of the sampled input data. A modification of group average agglomerative clustering was used to distribute words among clusters. The resulting set of clusters was tested against one of the semantic dictionaries of the Russian language. The NMI score calculated in this article is 0.457 and the F1-score is 0.607.
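As a rough illustration of how an NMI score compares a clustering with reference classes, the sketch below computes normalized mutual information from two label lists. The abstract does not say which normalization was used; the arithmetic-mean form 2·I(C;K) / (H(C) + H(K)) is assumed here, and the function names and toy labels are hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (natural log) of a list of labels."""
    n = len(labels)
    return -sum((c / n) * math.log(c / n) for c in Counter(labels).values())


def nmi(clusters, classes):
    """Normalized mutual information between two labelings of the same items.

    Assumption: normalization by the arithmetic mean of the two entropies.
    """
    if len(clusters) != len(classes):
        raise ValueError("both labelings must cover the same items")
    n = len(clusters)
    joint = Counter(zip(clusters, classes))
    cluster_sizes = Counter(clusters)
    class_sizes = Counter(classes)
    mutual_info = sum(
        (c / n) * math.log(c * n / (cluster_sizes[a] * class_sizes[b]))
        for (a, b), c in joint.items()
    )
    total_entropy = entropy(clusters) + entropy(classes)
    # Convention: two trivial one-group labelings are treated as identical.
    return 2 * mutual_info / total_entropy if total_entropy > 0 else 1.0


# Toy labelings invented for illustration only.
same = nmi([0, 0, 1, 1], ["a", "a", "b", "b"])         # clusters match classes
unrelated = nmi([0, 0, 1, 1], ["a", "b", "a", "b"])    # clusters carry no class information
print(same, unrelated)
```

The score is 1 when the clusters reproduce the reference classes exactly and 0 when knowing a word's cluster says nothing about its class, so a value such as the reported 0.457 lies between these extremes.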
Pleonastic Constructions In English Legal Texts
Quite a number of English legal texts, largely from contract law, provide linguistic evidence of terminology and/or commonly used vocabulary with semantically identical or related meanings being used together within the same text sequences. Such constructions are challenging for linguists and lawyers alike to classify taxonomically. An analysis of examples allows such usage to be attributed to pleonastic constructions typical of legal language.
«Bankruptcy» Concept Within the Legal Linguistics Coordinates: Russian–English–French Approximations
The article addresses the notion of bankruptcy as perceived by speakers of present-day Russian, English and French, both lawyers and participants in professional communication from other trades. The semantic structure of the term is identified on the basis of its lexicographic and regulatory definitions.
The article presents a contrastive analysis of the ways homeland/motherland is presented in Russian and English poetry. The titles of poems devoted to the native country serve as the material for this analysis.
This paper deals with the Semantics/Pragmatics distinction in a contrastive ethnolinguistic aspect. I argue for the validity of this distinction based on cross-linguistic data. My claim is that the specificity of the so-called language key words [Wierzbicka 1990:15-17] - linguospecific items particularly representative of a given language speakers' mentality - is due to pragmatic rather than semantic peculiarities. These pragmatic peculiarities distinguish the key words both from their synonyms within the same language and their counterparts in other languages. The languages under discussion are Russian and English, analyzed within a combined frame of Integral Language Description model [Apresjan 1995:8-238] and Wierzbicka's ethnolinguistic approach.
The reports made at the 4th Mandelstam Readings, held on September 18-22, 2011, make up the greater part of the book, but it also includes other articles on the life and works of Mandelstam. The first part, Mandelstam and Poland, deals with interactions between the Russian poet’s life and Polish culture; the second part offers several studies of the poet’s biography; the third part, the Studies, is made up of articles on different aspects of Mandelstam’s textual studies and poetics. The part Reflexions includes materials on Mandelstam’s reception in Russian cultural history. The book comprises a wide spectrum of voices and different approaches to Mandelstam, from academic ones to poetic ones. Among those who contributed to this collection are Adam Pomorski, Iwona Smolka, Pyotr Mitzner, Anne Faivre-Dupegre, Sergey Vasilenko, Irena Verblovskaya, Aleksandr Zholkovsky, Marietta Chudakova, Leonid Vidgof, Vladimir Mikushevich, Leonid Katsis, Oleg Lekmanov, Natalya Gorbanevskaya, Uriy Freidin, Pavel Nerler, Lada Panova, Roman Timenchik, Boris Frezinsky, Irina Surat, Pavel Uspensky, Anna Yeskova, Natalya Petrova, Heinrich Kirschbaum and others.