Linguistic Modeling as a Basis for Creating Authorship Attribution Software
This paper reports on the testing of an integrative attribution method for Russian-language texts. The methodology follows (Koppel, Schler 2003): a computer program tries to imitate the work of a human expert. It is therefore based on interpretative language study, objectified through mathematical statistics. The choice of parameters describing an author’s individual style is grounded in treating a text as the product of an authentic language personality. The language personality is described using psycholinguistic (Yu.N. Karaulov) and sociolinguistic (M. Coulthard, R.W. Shuy) methods and the methodology of forensic linguistics (S.M. Vul, D. Wright). On the basis of these principles, attribution software has been created: http://khorom-attribution.ru/#/. As output, the resource displays mathematical models of persons’ individual styles and the metrics used to evaluate the null hypothesis: the Pearson correlation coefficient, linear regression and Student’s t-test. The resource is designed to solve the identification problem of text attribution for a "closed class" (Juola 2008) with pair-wise comparison, but it can also be used for personality diagnostics in forensic, philological and cultural research.
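The three metrics named above can be illustrated with a minimal sketch of a pair-wise comparison. It assumes each text has already been reduced to a numeric vector of style-parameter values; the function names and the vector representation are hypothetical and are not taken from the resource itself. The abstract does not say which form of Student's t-test the software applies, so the sketch uses the t-statistic for the significance of a correlation coefficient, with the null hypothesis that the two style vectors are uncorrelated.

```python
import math


def _centered_sums(x, y):
    """Return the means and centered sums of squares and cross-products."""
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("need two vectors of equal length with at least 3 values")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("a constant vector has no defined correlation")
    return mean_x, mean_y, sxy, sxx, syy


def pearson_r(x, y):
    """Pearson correlation coefficient between two style vectors."""
    _, _, sxy, sxx, syy = _centered_sums(x, y)
    return sxy / math.sqrt(sxx * syy)


def linear_fit(x, y):
    """Least-squares line y = slope * x + intercept."""
    mean_x, mean_y, sxy, sxx, _ = _centered_sums(x, y)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def correlation_t(r, n):
    """t-statistic for H0 'no correlation', with n - 2 degrees of freedom."""
    if abs(r) >= 1.0:
        return math.copysign(math.inf, r)
    return r * math.sqrt((n - 2) / (1.0 - r * r))


def compare_texts(known, disputed):
    """Pair-wise comparison of two hypothetical style vectors."""
    r = pearson_r(known, disputed)
    slope, intercept = linear_fit(known, disputed)
    return {
        "r": r,
        "slope": slope,
        "intercept": intercept,
        "t": correlation_t(r, len(known)),
    }
```

A call such as `compare_texts(vector_a, vector_b)` returns the statistics for one pair of texts. To decide whether to reject the null hypothesis, the resulting `t` is compared against a critical value of Student's distribution with `n - 2` degrees of freedom at the chosen significance level.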