POS tagger evaluation for automated text analysis and identification of learner errors
Working with learner corpora requires elaborate NLP techniques such as POS annotation. In this article, a team of computational linguists presents its experience of choosing a POS tagger for precise and effortless annotation of .txt files with Python 3. The Russian Error-Annotated Learner English Corpus (REALEC) is the underlying corpus whose text features the POS tagger has to handle. After identifying the four most promising part-of-speech taggers, our team conducted several sets of tests and applied various criteria to evaluate the taggers' precision, speed, and compatibility with the Python scripts already used in the research. This article describes the tests and their statistics, evaluates the POS taggers PatternTagger, NLTK, spaCy, and TreeTagger, and presents the conclusion our team arrived at.
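The kind of comparison described above can be illustrated with a minimal Python 3 sketch. It is not the team's actual test harness: the function `evaluate_tagger`, the stand-in `toy_tagger`, and the hand-made `gold_sample` are all hypothetical names and data introduced only for illustration, and token-level accuracy is used here as a simple proxy for tagging precision.

```python
import time
from typing import Callable, Dict, List, Tuple

TaggedSentence = List[Tuple[str, str]]  # [(token, tag), ...]


def evaluate_tagger(
    tag_fn: Callable[[List[str]], TaggedSentence],
    gold: List[TaggedSentence],
) -> Dict[str, float]:
    """Compare a tagger's output with gold tags and time the tagging calls.

    tag_fn is assumed to take a list of tokens and return (token, tag) pairs
    in the same order and using the same tagset as the gold annotation.
    """
    correct = 0
    total = 0
    seconds = 0.0
    for sentence in gold:
        tokens = [token for token, _ in sentence]
        start = time.perf_counter()
        predicted = tag_fn(tokens)  # only the tagging call is timed
        seconds += time.perf_counter() - start
        for (_, gold_tag), (_, predicted_tag) in zip(sentence, predicted):
            total += 1
            if gold_tag == predicted_tag:
                correct += 1
    accuracy = correct / total if total else 0.0
    return {"accuracy": accuracy, "seconds": seconds, "tokens": total}


def toy_tagger(tokens: List[str]) -> TaggedSentence:
    """Hypothetical stand-in for a real tagger: labels every token as a noun."""
    return [(token, "NN") for token in tokens]


# Hand-made illustrative sentence (not taken from REALEC).
gold_sample = [[("She", "PRP"), ("go", "VBP"), ("to", "TO"), ("school", "NN")]]

result = evaluate_tagger(toy_tagger, gold_sample)
print(result["accuracy"], result["tokens"])
```

A real tagger can be passed in place of `toy_tagger` as long as it follows the same calling convention; for example, NLTK's `nltk.pos_tag` accepts a list of tokens and returns (token, tag) pairs. Taggers with different tagsets would first need their output mapped to the tagset of the gold annotation.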