SyntaxNet Errors from the Linguistic Point of View
The paper deals with Google’s universal parser SyntaxNet. The system was used to analyze the Universal Dependencies linguistic corpora. We conducted an error analysis of the output of the parser to reveal to what extent the error types are connected with or preconditioned by the language types. In particular, we carried out several experiments, clustering the languages based on the frequency of different errors made by SyntaxNet, and studied the similarity of the resulting clustering with the traditional typology of languages. Three types of errors were separately considered: part-of-speech tagging, dependency labeling, and attachment errors. We show that there is indeed a correlation between error frequencies and language types, which might indicate that to further improve the performance of a universal parser, one needs to take into account language-specific morphological and syntactic structures.