• A
  • A
  • A
  • ABC
  • ABC
  • ABC
  • А
  • А
  • А
  • А
  • А
Regular version of the site

Working paper

Applying statistical tagging to Russian poetry

Starchenko A., Kazakevich L., Lyashevskaya O.
The poetic texts pose a challenge to full morphological tagging and lemmatization since the authors seek to extend the vocabulary, employ morphologically and semantically deficient forms, go beyond standard syntactic templates, use non-projective constructions and non-standard word order, among other techniques of the creative language game. In this paper we evaluate a number of probabilistic taggers based on decision trees, CRF and neural network algorithms as well as one state-of-the-art dictionary-based tagger. The taggers were trained on prosaic texts and tested on three poetic samples of different complexity. Firstly, we discuss the method to compile the gold standard datasets for the Russian poetry. Secondly, we evaluate the taggers’ performance in the identification of the part of speech tags and lemmas. Finally, we analyze different types of errors in the taggers’ output. We analyse the confusion matrix of the parts of speech and mismatches in lemma annotation.