Assessing novice writing against the Corpus of Academic Texts
CAT is a Corpus of Russian Academic Texts consisting of recently published scientific articles enriched with metadata and with morphological and syntactic annotation (see Kopotev et al., 2019). We use CAT as a reference corpus to automatically evaluate a novice student’s paper against the academic standard. We provide a web service that helps students improve their writing skills by giving them automatic feedback on their texts. As a first step, the system provides a general analysis of the novice text. The second step is a fine-grained analysis of three broad areas that learners of Russian find challenging: 1) lexical knowledge (e.g. unattested lexemes), 2) grammatical knowledge (e.g. overuse of the Genitive case), and 3) collocational knowledge (e.g. well-formed but unattested collocations). The project is still a work in progress. This report focuses on the general analysis, where, along with standard measures, we also apply measures that target academic features of text.
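To make the two steps concrete, the following is a minimal sketch, not the system's actual implementation. It assumes that "standard measures" include token count, type-token ratio, and mean sentence length, and it approximates "unattested lexemes" by comparing lowercased word forms against a reference vocabulary, whereas a real comparison would rely on the lemmas provided by the corpus annotation. All function names here (`tokenize`, `general_measures`, `unattested_forms`) are hypothetical.

```python
import re

# Unicode letters only (works for Cyrillic), allowing internal hyphens.
TOKEN_RE = re.compile(r"[^\W\d_]+(?:-[^\W\d_]+)*")


def tokenize(text):
    """Return lowercased word forms; no lemmatization is performed."""
    return [t.lower() for t in TOKEN_RE.findall(text)]


def general_measures(text):
    """Compute a few assumed 'standard' measures for a text."""
    # Naive sentence split on terminal punctuation (assumption: good enough for a sketch).
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    tokens = tokenize(text)
    if not tokens:
        return {"tokens": 0, "type_token_ratio": 0.0, "mean_sentence_length": 0.0}
    return {
        "tokens": len(tokens),
        "type_token_ratio": len(set(tokens)) / len(tokens),
        "mean_sentence_length": len(tokens) / len(sentences),
    }


def unattested_forms(text, reference_vocabulary):
    """Word forms in the novice text that never occur in the reference vocabulary."""
    return sorted(set(tokenize(text)) - set(reference_vocabulary))


if __name__ == "__main__":
    # Toy stand-ins for the reference corpus and a novice text (invented examples).
    reference = "Мы рассматриваем корпус текстов. Корпус содержит статьи."
    novice = "Мы рассматриваем корпус. Корпус содержит крутые статьи."
    print(general_measures(novice))
    print(unattested_forms(novice, tokenize(reference)))
```

In practice the reference vocabulary would be built once from the whole corpus, and the same measures computed over the corpus would serve as the baseline against which the novice text's values are compared.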