Baselines and Symbol N-Grams: Simple Part-Of-Speech Tagging of Russian
We propose using NB-SVM over a bag-of-character-n-grams input representation to determine part-of-speech tags and grammatical categories such as gender and number for words in Russian texts. We compare several methods, including CRF (Conditional Random Fields), SVM (Support Vector Machines), and NB-SVM (Naive Bayes SVM), and show that NB-SVM outperforms the other classifiers. The proposed model ranked 5th best among 12 other models in the first shared task of the MorphoRuEval-17 challenge. We also experimented with category grouping, in which a single classifier determines several grammatical categories, and showed that it improves model performance even further.
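To make the core idea concrete, the following is a minimal sketch of a binary NB-SVM over character n-grams, written in plain Python. It is an illustration under stated assumptions, not the configuration used in this work: the n-gram range (1 to 3), the boundary markers `^` and `$`, the smoothing constant `alpha`, the hyperparameters `epochs`, `lr`, and `lam`, the toy word list, and the one-vs-rest "is this word an infinitive verb" task are all hypothetical. The Naive Bayes part uses the standard log-count ratio over binarized features, and the SVM part is approximated by plain stochastic subgradient descent on the L2-regularized hinge loss.

```python
import math
from collections import Counter


def char_ngrams(word, n_min=1, n_max=3):
    """Return a bag (Counter) of character n-grams of a word.

    '^' and '$' mark word boundaries so that prefixes and suffixes
    stay distinct from word-internal n-grams (an assumption of this sketch).
    """
    padded = "^" + word + "$"
    bag = Counter()
    for n in range(n_min, n_max + 1):
        for i in range(len(padded) - n + 1):
            bag[padded[i:i + n]] += 1
    return bag


def log_count_ratio(bags, labels, alpha=1.0):
    """Naive Bayes log-count ratio over binarized features; labels are +1/-1."""
    vocab = set().union(*bags)
    p = dict.fromkeys(vocab, alpha)  # smoothed presence counts, positive class
    q = dict.fromkeys(vocab, alpha)  # smoothed presence counts, negative class
    for bag, y in zip(bags, labels):
        target = p if y == 1 else q
        for gram in bag:  # presence only, repeat counts are ignored
            target[gram] += 1
    p_norm, q_norm = sum(p.values()), sum(q.values())
    return {g: math.log((p[g] / p_norm) / (q[g] / q_norm)) for g in vocab}


def train_nbsvm(words, labels, epochs=50, lr=0.1, lam=0.01):
    """Fit a linear SVM (hinge loss, SGD) on features scaled by the NB ratio."""
    bags = [char_ngrams(w) for w in words]
    r = log_count_ratio(bags, labels)
    w = dict.fromkeys(r, 0.0)
    b = 0.0
    for _ in range(epochs):
        for bag, y in zip(bags, labels):
            # Each present n-gram g contributes the scaled feature value r[g].
            score = b + sum(w[g] * r[g] for g in bag)
            for g in w:  # L2 regularization as multiplicative shrinkage
                w[g] *= 1.0 - lr * lam
            if y * score < 1:  # hinge loss is active: take a subgradient step
                for g in bag:
                    w[g] += lr * y * r[g]
                b += lr * y
    return r, w, b


def predict(model, word):
    """Return +1 or -1 for a word; n-grams unseen in training are skipped."""
    r, w, b = model
    score = b + sum(w[g] * r[g] for g in char_ngrams(word) if g in w)
    return 1 if score >= 0 else -1


# Toy data (hypothetical): infinitive verbs (+1) versus nouns (-1).
words = ["читать", "писать", "говорить", "думать", "ходить",
         "книга", "стол", "окно", "город", "машина"]
labels = [1] * 5 + [-1] * 5
model = train_nbsvm(words, labels)
print(predict(model, "делать"))
```

In this sketch the suffix n-gram `ть$` receives a positive log-count ratio because it occurs only in the verbs, which is the kind of signal a character n-gram representation is meant to expose. A full tagger would train one such classifier per tag or per grammatical category value, or a multiclass variant.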