A Semantic Markup Model for Fictional Texts in Digital Philological Research
We propose a TEI/XML-based text markup model that supports the digital exploration of a fictional text. The markup centers on fictional characters and their interactions. It is obtained mainly through automated procedures: named entity recognition, semantic role labeling, and simple scripts that extract direct speech. The resulting XML can be used to explore the character system by means of network analysis, stylometric tools, and statistical analysis of the distribution of semantic roles. Using Tolstoy’s War and Peace as a showcase, we demonstrate how this markup can be used to formalize the character system and character hierarchy, detect important communities within it, define the specific functions of certain characters, and ultimately reveal some of the author’s hidden creative devices.
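As a rough illustration of how character-centered markup could feed a network analysis, the sketch below counts directed speech acts between characters and ranks characters by weighted degree. It is not the authors' pipeline. It assumes that direct speech is encoded with the TEI `<said>` element carrying `who` and `toWhom` pointers; the sample fragment, its character identifiers, and the helper names `speech_edges` and `weighted_degree` are invented for this example.

```python
import xml.etree.ElementTree as ET
from collections import Counter

TEI_NS = "http://www.tei-c.org/ns/1.0"

# Invented toy fragment, not taken from any real edition or from the novel.
# Assumption: speaker and addressee are given in @who and @toWhom.
SAMPLE = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <text><body>
    <p><said who="#pierre" toWhom="#andrei">Placeholder utterance one.</said></p>
    <p><said who="#andrei" toWhom="#pierre">Placeholder utterance two.</said></p>
    <p><said who="#pierre" toWhom="#andrei">Placeholder utterance three.</said></p>
    <p><said who="#natasha" toWhom="#pierre">Placeholder utterance four.</said></p>
    <p><said who="#natasha">Utterance with no marked addressee.</said></p>
  </body></text>
</TEI>"""


def speech_edges(xml_text):
    """Count directed (speaker, addressee) pairs over all <said> elements."""
    root = ET.fromstring(xml_text)
    edges = Counter()
    for said in root.iter(f"{{{TEI_NS}}}said"):
        speaker = said.get("who", "").lstrip("#")
        addressee = said.get("toWhom", "").lstrip("#")
        # Skip utterances that lack either a speaker or an addressee.
        if speaker and addressee:
            edges[(speaker, addressee)] += 1
    return edges


def weighted_degree(edges):
    """Sum edge weights per character, counting spoken and received speech."""
    degree = Counter()
    for (speaker, addressee), weight in edges.items():
        degree[speaker] += weight
        degree[addressee] += weight
    return degree


if __name__ == "__main__":
    edges = speech_edges(SAMPLE)
    for character, score in weighted_degree(edges).most_common():
        print(character, score)
```

Weighted degree is only one simple proxy for a character's prominence. The same edge list could be passed to a graph library for community detection or other centrality measures.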