Referential Choice: Factors and Modeling
Referential choice is the process of selecting an appropriate referential expression for a referent that the speaker/writer intends to mention at a given point in discourse. It is governed by the referent's current status in the speaker's/writer's working memory. This status, in turn, is determined by a number of factors rooted in the discourse context and in the referent's properties. Activation in working memory is immediately responsible for the coarse choice between full and reduced referential devices, which is the top-level distinction in the hierarchical organization of referential choice. Finer levels of granularity correspond to the choice between proper names and descriptions, and to still more refined options. Referential choice is thus a multi-factorial process. We have created a corpus of written texts in which many potentially relevant factors of referential choice are annotated. We also use another corpus in which the same texts are annotated for discourse structure, since rhetorical distance, measured on the basis of hierarchical discourse structure, is known to be a powerful factor in referential choice. We have modeled referential choice in the corpus using a variety of machine learning algorithms. Prediction accuracy is close to 90% for the two-way choice between full and reduced referential devices, and close to 80% for the three-way choice between pronouns, descriptions, and proper names. We also experimented with reducing the set of factors and explored the phenomenon of non-categorical, that is, probabilistic, referential choice.
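The hierarchical and probabilistic character of the choice can be illustrated with a minimal Python sketch. It is not the model used in the study: the factor names, the weights, the threshold, and the logistic link are all hypothetical values chosen only for illustration. The sketch maps a few factors to an activation score, turns that score into a probability of a reduced device, and only then decides between a proper name and a description.

```python
import math
from dataclasses import dataclass


@dataclass
class Mention:
    # All fields are illustrative assumptions, not the corpus annotation scheme.
    linear_distance: int      # clauses back to the antecedent
    rhetorical_distance: int  # steps to the antecedent in the discourse tree
    is_animate: bool
    has_proper_name: bool


def activation(m: Mention) -> float:
    """Toy activation score in [0, 1]; the weights are made up."""
    score = 1.0
    score -= 0.3 * (m.rhetorical_distance - 1)
    score -= 0.1 * (m.linear_distance - 1)
    score += 0.2 if m.is_animate else 0.0
    return max(0.0, min(1.0, score))


def p_reduced(m: Mention, threshold: float = 0.5, steepness: float = 10.0) -> float:
    """Probability of a reduced device (pronoun), via an assumed logistic link."""
    return 1.0 / (1.0 + math.exp(-steepness * (activation(m) - threshold)))


def choose(m: Mention) -> str:
    """Two-level choice: full vs. reduced first, then the kind of full device."""
    if p_reduced(m) >= 0.5:
        return "pronoun"
    return "proper name" if m.has_proper_name else "description"


if __name__ == "__main__":
    near = Mention(linear_distance=1, rhetorical_distance=1,
                   is_animate=True, has_proper_name=True)
    far = Mention(linear_distance=5, rhetorical_distance=4,
                  is_animate=False, has_proper_name=False)
    print(choose(near), round(p_reduced(near), 3))
    print(choose(far), round(p_reduced(far), 3))
```

Because `p_reduced` returns a probability rather than a hard label, mentions whose activation lies near the threshold come out as genuinely uncertain cases, which is one simple way to picture non-categorical referential choice.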