In this article we report new experiments in word clustering for the Russian language. We introduce a clustering method that assigns words to classes according to their syntactic relations. To collect a set of such relations, we used a large untagged corpus (about 7.2 billion words). The corpus was processed with a set of finite-state automata that extract syntactically dependent word combinations with explicit structure. The automata were applied only to unambiguous text fragments, because combining these two techniques improves the quality of the sampled input data. A modification of group-average agglomerative clustering was used to distribute words among clusters. The resulting set of clusters was evaluated against one of the semantic dictionaries of the Russian language. The NMI score obtained is 0.457 and the F1-score is 0.607.
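To illustrate the kind of evaluation the abstract reports, the following is a minimal sketch of how normalized mutual information (NMI) between a clustering and a reference classification can be computed. It is not the authors' evaluation code. The function names, the toy word lists, and the choice of normalizing by the arithmetic mean of the two entropies are all assumptions made for illustration; the abstract does not state which NMI variant was used.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (natural log) of a list of labels."""
    n = len(labels)
    return -sum((c / n) * math.log(c / n) for c in Counter(labels).values())


def mutual_information(pred, gold):
    """Mutual information between two labelings of the same items."""
    n = len(pred)
    joint = Counter(zip(pred, gold))
    pred_counts = Counter(pred)
    gold_counts = Counter(gold)
    mi = 0.0
    for (p, g), c in joint.items():
        # p(p, g) * log( p(p, g) / (p(p) * p(g)) )
        mi += (c / n) * math.log(c * n / (pred_counts[p] * gold_counts[g]))
    return mi


def nmi(pred, gold):
    """NMI normalized by the arithmetic mean of the entropies (an assumption)."""
    h_pred, h_gold = entropy(pred), entropy(gold)
    if h_pred == 0.0 or h_gold == 0.0:
        # Degenerate case: at least one labeling puts everything in one class.
        return 1.0 if h_pred == h_gold else 0.0
    return mutual_information(pred, gold) / ((h_pred + h_gold) / 2)


# Hypothetical toy data: cluster ids for four words vs. dictionary classes.
words = ["cat", "dog", "run", "walk"]
predicted_clusters = [0, 0, 1, 1]
dictionary_classes = ["animal", "animal", "motion", "motion"]
score = nmi(predicted_clusters, dictionary_classes)
```

In this toy case the clusters coincide with the reference classes up to renaming, so the score is at its maximum; a clustering that is statistically independent of the reference classes scores zero.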
These proceedings include papers from a wide range of areas, including theoretical linguistics, translation, computational linguistics, natural language processing, and applied linguistics. The papers focus on a variety of languages, ranging from familiar Indo-European languages to Mandarin Chinese, Wolof, and Dene Sųɬiné. To make the papers available to the wider research community, these proceedings are being published electronically and distributed freely at http://www.meaningtext.net
Pleonastic Constructions In English Legal Texts
A considerable number of English legal texts, largely in contract law, show terms and commonly used words with semantically identical or related meanings used together within the same text sequence. Such constructions are challenging for linguists and lawyers alike to classify taxonomically. An analysis of examples makes it possible to treat such usage as pleonastic constructions typical of legal language.
«Bankruptcy» Concept Within the Legal Linguistics Coordinates: Russian–English–French Approximations
The article addresses the notion of bankruptcy as perceived by present-day speakers of Russian, English and French, both lawyers and participants in professional communication from other trades. The semantic structure of the term is identified on the basis of its lexicographic and regulatory definitions.
This paper deals with the Semantics/Pragmatics distinction from a contrastive ethnolinguistic perspective. I argue for the validity of this distinction on the basis of cross-linguistic data. My claim is that the specificity of the so-called language key words [Wierzbicka 1990:15-17], language-specific items particularly representative of the mentality of a given language's speakers, is due to pragmatic rather than semantic peculiarities. These pragmatic peculiarities distinguish the key words both from their synonyms within the same language and from their counterparts in other languages. The languages under discussion are Russian and English, analyzed within a combined framework of the Integral Language Description model [Apresjan 1995:8-238] and Wierzbicka's ethnolinguistic approach.
This paper analyzes forms of address used toward an unknown addressee in everyday communication. In describing how particular forms of address function, the author relies on the opinions of renowned experts in speech etiquette and the culture of the Russian language, on the author's own linguistic observations, and on data from a survey conducted in the fall of 2010 among residents of the capital aged 20 to 50.