Multimedia Corpus of the Yiddish Language
The problem of morphological ambiguity is widely addressed in modern NLP. Ambiguity is mostly resolved using large manually annotated corpora and machine learning. However, such methods are not always applicable, since good training data is not available for every language. In this paper we present a method of disambiguation without gold-standard corpora that uses several statistical models, namely the Brill algorithm (Brill 1995) and unambiguous n-grams from the automatically annotated corpus. All the methods were tested on the Corpus of Modern Greek and on the Corpus of Modern Yiddish. As a result, more than half of the words with ambiguous analyses were disambiguated in both corpora, with high precision (>80%). Our method demonstrates that some of the ambiguous analyses in a corpus can be eliminated without specific linguistic resources, using only raw data in which all possible morphological analyses are indicated for every word.
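The idea of using unambiguous n-grams can be illustrated with a minimal sketch. It is not the authors' implementation: the data layout (each token as a pair of a word and a set of candidate tags), the restriction to bigrams, the function names and the `min_count` threshold are all assumptions made for illustration.

```python
from collections import Counter


def collect_unambiguous_bigrams(sentences):
    """Count tag bigrams in which both tokens have exactly one analysis."""
    counts = Counter()
    for sent in sentences:
        for (_, left), (_, right) in zip(sent, sent[1:]):
            if len(left) == 1 and len(right) == 1:
                counts[(next(iter(left)), next(iter(right)))] += 1
    return counts


def disambiguate(sentences, counts, min_count=1):
    """Drop analyses of an ambiguous token that never follow the tag of its
    unambiguous left neighbour in the collected bigrams (hypothetical rule)."""
    result = []
    for sent in sentences:
        new_sent = []
        for i, (word, tags) in enumerate(sent):
            tags = set(tags)
            if len(tags) > 1 and i > 0 and len(sent[i - 1][1]) == 1:
                left_tag = next(iter(sent[i - 1][1]))
                supported = {t for t in tags if counts[(left_tag, t)] >= min_count}
                # Only narrow the set; never remove every analysis.
                if supported and len(supported) < len(tags):
                    tags = supported
            new_sent.append((word, tags))
        result.append(new_sent)
    return result


# Toy corpus with invented tags, used only to show the mechanics.
corpus = [
    [("the", {"DET"}), ("dog", {"NOUN"})],
    [("the", {"DET"}), ("run", {"NOUN", "VERB"})],
]
bigram_counts = collect_unambiguous_bigrams(corpus)
resolved = disambiguate(corpus, bigram_counts)
```

In this toy corpus the only unambiguous bigram is DET followed by NOUN, so the VERB analysis of "run" is dropped. A real system would need a higher threshold and both left and right context to reach the precision reported above.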
The present study describes the recordings made during a language documentation expedition to Bessarabia in 2012. All the native speakers are bi- or trilingual; Romanian does not play any role, while Russian has had a strong influence on their speech. Based on dictionaries and grammars, we can relatively easily distinguish these effects from the results of earlier interaction with Slavic, which are now part of the standard language or of Eastern Yiddish. In this paper I will focus on alternations in verbal structures that involve both vocabulary and syntax.
The vocabularies of endangered languages surrounded by more prestigious languages are gradually shrinking due to the influx of borrowed items. It is easy to observe that in such languages, starting from some frequency rank, the lower the frequency of a vocabulary item, the higher the probability that the item is borrowed. On the basis of data from the Beserman dialect of Udmurt, the article provides a model according to which the proportion of borrowed items among the items with frequency ranks less than r increases logarithmically in r, starting from some rank r0, while for more frequent items it can behave differently. Apart from its theoretical interest, the model can be used to roughly predict the total number of native items in the vocabulary based on a limited corpus of texts.
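A logarithmic dependence of this kind can be written as p(r) = a + b ln r for r ≥ r0, where p(r) is the proportion of borrowed items among items with rank below r. The sketch below fits a and b by ordinary least squares. The parameter names, the fitting procedure and the synthetic data points are assumptions for illustration only; they are not the article's data or its estimation method.

```python
import math


def fit_log_model(ranks, proportions):
    """Least-squares fit of p(r) = a + b * ln(r); returns (a, b)."""
    xs = [math.log(r) for r in ranks]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(proportions) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, proportions))
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b


def predict_proportion(a, b, r):
    """Predicted proportion of borrowed items, clipped to the range [0, 1]."""
    return min(1.0, max(0.0, a + b * math.log(r)))


# Synthetic points generated from a = 0.05, b = 0.03 (not real corpus data).
ranks = [100, 200, 400, 800]
proportions = [0.05 + 0.03 * math.log(r) for r in ranks]
a, b = fit_log_model(ranks, proportions)
```

Because the synthetic points lie exactly on the curve, the fit recovers the generating parameters. With real counts, only ranks at or above r0 should be passed to the fit, since the abstract states that more frequent items may behave differently.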
This study is dedicated to the problem of automatic transliteration of different Yiddish orthographies. Almost every publishing house has its own orthographic features, and each orthography can be inconsistent. The team of the Yiddish corpus needs a tool that would standardize this variety of writing systems. Several types of converters exist, but they cannot meet all our needs. The converter that we created works in two steps: first, using an elaborate rule-based system, it converts any given Yiddish text into standard orthography; second, it converts the text in standard Yiddish into Latin script. The units used in our rule-based system are mostly morphemes, although we also used some other letter combinations that have to be transliterated in a complex way. Our solution achieves a transliteration accuracy of 94% on raw text and 98% on text written in more or less standard orthography. We think its efficiency can be improved by adding a list of words of Semitic origin and by machine learning methods.
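The two-step design can be sketched as two rule tables applied one after the other with longest-match-first replacement. This is only an illustration under stated assumptions: the rule tables below are tiny toy examples, the choice of which spelling counts as "standard" in step 1 is hypothetical, and the names `NORMALIZATION_RULES`, `ROMANIZATION_RULES`, `apply_rules` and `convert` are invented here. The actual converter works mostly with morpheme-level rules, which are not reproduced.

```python
# Step 1 (toy rules): variant spelling -> spelling treated as standard here.
NORMALIZATION_RULES = [
    ("\u05D5\u05D5", "\u05F0"),  # vav + vav -> double-vav ligature (assumed choice)
    ("\uFB2E", "\u05D0\u05B7"),  # precomposed alef with patah -> alef + patah
]

# Step 2 (toy rules): standard spelling -> Latin letters.
ROMANIZATION_RULES = [
    ("\u05D0\u05B7", "a"),  # alef + patah
    ("\u05F0", "v"),        # double-vav ligature
    ("\u05E1", "s"),        # samekh
    ("\u05E2", "e"),        # ayin
    ("\u05E8", "r"),        # resh
]


def apply_rules(text, rules):
    """Scan left to right, always trying the longest source string first."""
    ordered = sorted(rules, key=lambda rule: len(rule[0]), reverse=True)
    out = []
    i = 0
    while i < len(text):
        for source, target in ordered:
            if text.startswith(source, i):
                out.append(target)
                i += len(source)
                break
        else:
            out.append(text[i])  # no rule applies: copy the character unchanged
            i += 1
    return "".join(out)


def convert(text):
    """Return (standardized text, Latin transliteration)."""
    standard = apply_rules(text, NORMALIZATION_RULES)
    return standard, apply_rules(standard, ROMANIZATION_RULES)


# The word for "water", spelled with two separate vavs and a precomposed alef.
standard, latin = convert("\u05D5\u05D5\uFB2E\u05E1\u05E2\u05E8")
```

For this input, step 1 yields the ligature spelling and step 2 yields `vaser`. Keeping the two tables separate means that new source orthographies only require new step 1 rules, while the romanization table stays unchanged.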
The paper studies the reaction of Italian literary critics to the publication of Boris Pasternak's novel "Doctor Jivago". An analysis of the book "'Doctor Jivago', Pasternak, 1958, Italy" (published in Russian in "Reka vremen", Moscow, 2012) is given. The papers of Italian writers, critics and literary historians who reacted immediately to the publication of the novel (A. Moravia, I. Calvino, F. Fortini, C. Cassola, C. Salinari, etc.) are studied and analysed.
The article describes the patterns of realization of emotional utterances in dialogic and monologic speech. The author pays special attention to the characteristic features of the speech of a speaker experiencing psychological tension and to the compositional and pragmatic peculiarities of dialogic and monologic texts.