The Variety of Yiddish Orthographies and Problems of Automatic Transliteration
This study addresses the problem of automatic transliteration of the various Yiddish orthographies. Almost every publishing house has its own orthographic conventions, and each orthography can itself be applied inconsistently. The Yiddish corpus team needs a tool that standardizes this variety of writing systems. Several types of converters exist, but none of them meets all our needs. The converter we created works in two steps: first, using a complex rule-based system, it converts any given Yiddish text into standard orthography; second, it converts the text in standard orthography into Latin script. The units used in our rule-based system are mostly morphemes, although we also use some other letter combinations that require more complicated transliteration rules. Our solution achieves a transliteration accuracy of 94% on raw text and 98% on text written in more or less standard orthography. We think its performance can be improved by adding a list of words of Semitic origin and by applying machine learning methods.
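The two-step design described above can be illustrated with a minimal sketch. It is not the converter itself: the rule list, the table, and the function names (`NORMALIZATION_RULES`, `LATIN_TABLE`, `normalize`, `romanize`, `transliterate`) are hypothetical, the rules work on characters rather than morphemes, and only a handful of letters are covered. The Latin values are assumed to follow YIVO-style romanization.

```python
# Step 1 (hypothetical rules): map variant encodings onto one standard form.
NORMALIZATION_RULES = [
    ("\u05F0", "\u05D5\u05D5"),  # ligature double vov -> two vovs
    ("\u05F1", "\u05D5\u05D9"),  # ligature vov-yud -> vov + yud
    ("\u05F2", "\u05D9\u05D9"),  # ligature double yud -> two yuds
    ("\uFB2E", "\u05D0\u05B7"),  # precomposed pasekh alef -> alef + pasekh
    ("\uFB2F", "\u05D0\u05B8"),  # precomposed komets alef -> alef + komets
]

# Step 2 (assumed YIVO-style values): standard orthography -> Latin script.
LATIN_TABLE = {
    "\u05D0\u05B7": "a",   # pasekh alef
    "\u05D0\u05B8": "o",   # komets alef
    "\u05D5\u05D5": "v",   # double vov
    "\u05D5\u05D9": "oy",  # vov-yud
    "\u05D9\u05D9": "ey",  # double yud
    "\u05D2": "g",         # giml
    "\u05D5": "u",         # vov
    "\u05D8": "t",         # tes
    "\u05DE": "m",         # mem
    "\u05E1": "s",         # samekh
    "\u05E2": "e",         # ayin
    "\u05E8": "r",         # reysh
}


def normalize(text):
    """Apply every normalization rule in order."""
    for variant, standard in NORMALIZATION_RULES:
        text = text.replace(variant, standard)
    return text


def romanize(text):
    """Greedy longest-match lookup; unknown characters pass through unchanged."""
    keys = sorted(LATIN_TABLE, key=len, reverse=True)
    out = []
    i = 0
    while i < len(text):
        for key in keys:
            if text.startswith(key, i):
                out.append(LATIN_TABLE[key])
                i += len(key)
                break
        else:
            out.append(text[i])
            i += 1
    return "".join(out)


def transliterate(text):
    """Step 1 followed by step 2."""
    return romanize(normalize(text))


# "vaser" ('water') written with the double-vov ligature and precomposed alef.
print(transliterate("\u05F0\uFB2E\u05E1\u05E2\u05E8"))  # vaser
```

Keeping normalization separate from romanization means the second table only has to cover one orthography. A character table like this one cannot handle words of Semitic origin, whose spelling does not reflect pronunciation.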