This study addresses the problem of intentional distortion of a writer's writing skills. By intentional distortion we understand both distortion caused by the intention of the text's author and distortion caused by external circumstances (for example, the will of third parties). The material for the study was the confession of person A. The document was subjected to linguistic analysis in the context of a trial, in connection with the question of whether law enforcement agencies exerted pressure on person A while the text of the confession was being drafted. This raised the question of whether the analyzed document contains distortions of the author's speech skills (such distortion indirectly indicates the presence of pressure). Importantly, A's authorship of the text is not disputed in this case.
The language material of the document was analyzed in the light of the theory of linguistic personality. The study comprised two phases: linguistic analysis and quantitative-linguistic analysis.
The result of the study was a mathematical model of the linguistic personality of the confession's author (together with comparative samples of his written speech). This model revealed a discrepancy between the speech skills manifested in the confession and the stable speech skills of person A.
The last two decades have seen a dramatic increase in the number of papers published on the subject of stylometry, which is often narrowly understood as the task of identifying the author of a particular text fragment based on its stylistic properties. We present a new lightweight algorithm for stylometric identification of authors of Latin prose texts based on Burrows’s Delta, computed over relative frequencies of 244 manually selected genre- and topic-neutral words, and the Dirichlet distribution, whose parameters we estimate using an iterative maximum-likelihood algorithm. In order to demonstrate the effectiveness of the method, we present a case study of 3000-word fragments of texts by 36 classical and medieval authors and show that our method performs on par with Random Forest, a powerful general-purpose classification algorithm. We provide summary statistics of our algorithm’s performance together with confusion matrices demonstrating pairwise discriminability of texts by different authors. The advantages of our method are that it is very simple to implement, very quick to train and to run inference with, and very interpretable, since it is a model-based algorithm: the precision of the fitted Dirichlet distributions directly corresponds to the stylistic homogeneity of the texts by different authors. This makes it possible to use the algorithm as a general research tool in Latin stylistics.
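For readers unfamiliar with Burrows’s Delta, the following minimal Python sketch may help illustrate the distance on which the method builds. It shows only the classic formulation of Delta (mean absolute difference of z-scored relative word frequencies), not the algorithm presented here: the Dirichlet component and the iterative maximum-likelihood fit are omitted. The four-word vocabulary, the toy fragments, and all function names are hypothetical and chosen purely for illustration; the actual feature set consists of 244 words and the fragments of 3000 words.

```python
from collections import Counter
from statistics import mean, pstdev

# Hypothetical stand-in for the 244 manually selected neutral words.
VOCABULARY = ["et", "in", "non", "est"]


def relative_frequencies(tokens, vocabulary):
    """Relative frequency of each vocabulary word in one text fragment."""
    counts = Counter(tokens)
    total = len(tokens)
    return [counts[word] / total for word in vocabulary]


def z_scores(freq_rows):
    """Standardize each word's frequency across all fragments in the corpus."""
    columns = list(zip(*freq_rows))
    means = [mean(col) for col in columns]
    stds = [pstdev(col) for col in columns]
    return [
        # A word with zero variance carries no information; score it as 0.
        [(f - m) / s if s > 0 else 0.0 for f, m, s in zip(row, means, stds)]
        for row in freq_rows
    ]


def burrows_delta(z_a, z_b):
    """Classic Burrows's Delta: mean absolute difference of z-scores."""
    return mean([abs(a - b) for a, b in zip(z_a, z_b)])


# Toy fragments (assumed data, far shorter than real 3000-word fragments).
fragments = {
    "author_a_1": "et in et non est et in et".split(),
    "author_a_2": "et in et non et in est in".split(),
    "author_b_1": "non est non est in non est et".split(),
}

names = list(fragments)
freqs = [relative_frequencies(fragments[n], VOCABULARY) for n in names]
z = dict(zip(names, z_scores(freqs)))

delta_same = burrows_delta(z["author_a_1"], z["author_a_2"])
delta_diff = burrows_delta(z["author_a_1"], z["author_b_1"])
print(f"Delta(a_1, a_2) = {delta_same:.3f}")
print(f"Delta(a_1, b_1) = {delta_diff:.3f}")
```

In this toy corpus the two fragments attributed to the same hypothetical author lie closer together under Delta than fragments attributed to different authors, which is the intuition behind using the measure for attribution. The sketch uses only the Python standard library so that it can be run without additional dependencies.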