Morphosyntactic but not lexical corpus-based probabilities can substitute for cloze probabilities in reading experiments
During reading or listening, people can generate predictions about the lexical and morphosyntactic properties of upcoming input based on available context. Psycholinguistic experiments that study predictability or control for it conventionally rely on a human-based approach and estimate predictability via the cloze task. Our study investigated an alternative corpus-based approach for estimating predictability via language predictability models. We obtained cloze and corpus-based probabilities for all words in 144 Russian sentences, correlated the two measures, and found a strong correlation between them. Importantly, we estimated how much variance in eye movements registered while reading the same sentences was explained by each of the two probabilities and whether the two probabilities explain the same variance. Along with lexical predictability (the activation of a particular word form), we analyzed morphosyntactic predictability (the activation of morphological features of words) and its effect on reading times over and above lexical predictability. We found that for predicting reading times, cloze and corpus-based measures of both lexical and morphosyntactic predictability explained the same amount of variance. However, cloze and corpus-based lexical probabilities both independently contributed to a better model fit, whereas for morphosyntactic probabilities, the contributions of cloze and corpus-based measures were interchangeable. Therefore, morphosyntactic but not lexical corpus-based probabilities can substitute for cloze probabilities in reading experiments. Our results also indicate that in languages with rich inflectional morphology, such as Russian, when people engage in prediction, they are much more successful in predicting isolated morphosyntactic features than predicting the particular lexeme and its full morphosyntactic markup.