The Lost Tram. The solution
The solution for the Lost Tram problem.
This paper reports on SpellRuEval, the first competition on automatic spelling correction for the Russian language, held within the framework of “Dialogue Evaluation”. The competition aims to bring together Russian academic researchers and IT companies to gain and exchange experience in automatic spelling correction, with a particular focus on social media texts. The data for the competition was taken from the Russian segment of Live Journal.
Seven teams took part in the competition. The best results were achieved by a model that used edit distance and phonetic similarity for candidate search and an n-gram language model for reranking the candidates. We discuss in detail the algorithms used by the teams, as well as the evaluation methodology for automatic spelling correction.
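The candidate-search-then-rerank idea described above can be illustrated with a minimal Python sketch. It is not the winning team's system: it assumes candidates within a single edit only, omits phonetic similarity entirely, and reranks greedily with an add-one-smoothed bigram model. The names `edits1`, `candidates`, `BigramLM`, and `correct` and the three-sentence corpus are hypothetical and exist only for illustration.

```python
import math
from collections import Counter

ALPHABET = "абвгдеёжзийклмнопрстуфхцчшщъыьэюя"


def edits1(word):
    """All strings one insertion, deletion, substitution, or transposition away."""
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [l + r[1:] for l, r in splits if r]
    transposes = [l + r[1] + r[0] + r[2:] for l, r in splits if len(r) > 1]
    replaces = [l + c + r[1:] for l, r in splits if r for c in ALPHABET]
    inserts = [l + c + r for l, r in splits for c in ALPHABET]
    return set(deletes + transposes + replaces + inserts)


def candidates(word, vocab):
    """Candidate search: dictionary words within one edit, plus the word itself if known."""
    found = {w for w in edits1(word) if w in vocab}
    if word in vocab:
        found.add(word)
    return found or {word}  # fall back to the input when nothing is found


class BigramLM:
    """Bigram language model with add-one smoothing (an assumption of this sketch)."""

    def __init__(self, sentences):
        self.unigrams = Counter()
        self.bigrams = Counter()
        for sentence in sentences:
            tokens = ["<s>"] + sentence.split()
            self.unigrams.update(tokens)
            self.bigrams.update(zip(tokens, tokens[1:]))
        self.vocab_size = len(self.unigrams)

    def logprob(self, prev, word):
        numerator = self.bigrams[(prev, word)] + 1
        denominator = self.unigrams[prev] + self.vocab_size
        return math.log(numerator / denominator)


def correct(sentence, lm):
    """Greedy left-to-right reranking: keep the candidate the bigram model prefers."""
    vocab = set(lm.unigrams) - {"<s>"}
    prev, output = "<s>", []
    for token in sentence.split():
        # sorted() makes tie-breaking deterministic
        best = max(sorted(candidates(token, vocab)),
                   key=lambda cand: lm.logprob(prev, cand))
        output.append(best)
        prev = best
    return " ".join(output)


# Toy corpus, invented for illustration only.
corpus = ["я иду домой", "я иду в школу", "он идёт домой"]
lm = BigramLM(corpus)
print(correct("я иду дамой", lm))
```

Because the reranker scores every candidate, including known words, a sketch like this can also overwrite a correctly spelled word when the context favors a neighbor; a realistic system would weigh the language-model score against an error model rather than rely on it alone.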
This study discusses a number of methods that can be used jointly for error detection and correction, namely blacklists and pre-compiled dictionaries, a word2vec model, an N-gram language model, and a tripartite error model. Our system consists of two standalone modules: an error detection confidence classifier, built with supervised machine learning methods, and a corrector that processes words flagged as misspellings by the classifier. The error detection classifier uses word2vec filtered vector scores as one of its features. In addition, to achieve higher accuracy with little training data, we use a hybrid error model that combines three approaches: the traditional channel model that uses single-letter edits, the model introduced by Brill and Moore, and an extended version of the channel model that uses wider-context edits. Combining these tools and methods, we achieved promising results: our system effectively handles both known and unknown words, including difficult cases such as slang.
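One way to picture a hybrid error model is as a weighted mixture of several error-model probabilities inside a noisy-channel ranker. The Python sketch below shows only that combination step. The abstract does not say how the three models are combined, so linear interpolation is an assumption here, and the three component models are hypothetical lookup tables with invented probabilities standing in for the single-letter channel model, the Brill and Moore model, and the wider-context model.

```python
import math
from typing import Callable, Dict, List, Tuple

# An error model returns P(observed | intended).
ErrorModel = Callable[[str, str], float]


def table_model(table: Dict[Tuple[str, str], float]) -> ErrorModel:
    """Hypothetical stand-in for a trained error model: a plain lookup table."""
    return lambda observed, intended: table.get((observed, intended), 0.0)


def interpolate(models: List[ErrorModel], weights: List[float]) -> ErrorModel:
    """Linear interpolation of error models (assumed combination scheme)."""
    if len(models) != len(weights) or abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("need one weight per model, and weights must sum to 1")

    def combined(observed: str, intended: str) -> float:
        return sum(w * m(observed, intended) for w, m in zip(weights, models))

    return combined


def rank(observed: str, priors: Dict[str, float], error_model: ErrorModel) -> List[str]:
    """Noisy-channel ranking by log P(intended) + log P(observed | intended)."""
    scored = []
    for intended, prior in priors.items():
        likelihood = error_model(observed, intended)
        if prior > 0 and likelihood > 0:
            scored.append((math.log(prior) + math.log(likelihood), intended))
    return [word for _, word in sorted(scored, reverse=True)]


# Toy probabilities, invented for illustration only.
single_letter = table_model({("превет", "привет"): 0.05, ("превет", "проект"): 0.001})
substring_edits = table_model({("превет", "привет"): 0.08, ("превет", "проект"): 0.0005})
wide_context = table_model({("превет", "привет"): 0.06})

hybrid = interpolate([single_letter, substring_edits, wide_context], [0.5, 0.3, 0.2])
priors = {"привет": 0.001, "проект": 0.002}  # hypothetical language-model priors
print(rank("превет", priors, hybrid))
```

Mixing the models means a candidate that one component has never seen (probability zero) can still be ranked if another component covers it, which is one plausible reason to combine models when training data is scarce.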
The study analyzes the genesis of the modern attitude towards spelling and spelling mistakes in Germany and Russia in the nineteenth century. It shows that both spelling norms and the significance attached to violating them are social constructions tied to major developments of the time, such as industrialization, political reaction, the spread of literacy and mass schooling, and the introduction of exams and grading as a means to check upward social mobility through education.
The article describes my research project on the history of the spelling error as a societal rather than linguistic phenomenon. While studying, from a social constructionist angle, the history of spelling and the way it was taught in German schools in the nineteenth and twentieth centuries, I encountered an interesting reaction from the contemporary German audiences to whom I presented my findings.