Computational Linguistics and Intellectual Technologies
The book includes 61 papers from the International conference on computer and intellectual technology "Dialogue-2019", covering a wide range of theoretical and applied research in natural language description, the modeling of language processes, and the creation of practically applicable computational linguistic technologies. It is intended for specialists in theoretical and applied linguistics and intellectual technologies.
The paper discusses efforts to create a morphological standard for the Middle Russian corpus, which is part of the historical collection of the Russian National Corpus (RNC). To meet the needs of different categories of corpus researchers as well as NLP developers, we consider two styles of morphological annotation (the RNC schema and the Universal Dependencies schema). A number of specifications of the feature list are proposed to facilitate data reusability, linking, and conversion.
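To illustrate what conversion between the two annotation styles can involve, here is a minimal sketch in Python. It is not the paper's specification: the comma-separated input format, the function name `rnc_to_ud`, and the small mapping tables are assumptions made for illustration, covering only a handful of common RNC-style grammemes and their usual Universal Dependencies counterparts.

```python
# Hypothetical, deliberately incomplete mapping tables (assumption, not the paper's feature list).
RNC_TO_UD_POS = {"S": "NOUN", "V": "VERB", "A": "ADJ", "ADV": "ADV"}
RNC_TO_UD_FEATS = {
    "nom": ("Case", "Nom"), "gen": ("Case", "Gen"), "dat": ("Case", "Dat"),
    "acc": ("Case", "Acc"), "ins": ("Case", "Ins"), "loc": ("Case", "Loc"),
    "sg": ("Number", "Sing"), "pl": ("Number", "Plur"),
    "m": ("Gender", "Masc"), "f": ("Gender", "Fem"), "n": ("Gender", "Neut"),
}


def rnc_to_ud(tag):
    """Convert a comma-separated RNC-style tag into a (UPOS, FEATS) pair.

    The first item is treated as the part of speech; unknown parts of speech
    map to "X", and unknown grammemes are silently skipped.
    """
    pos, *grammemes = [part.strip() for part in tag.split(",")]
    upos = RNC_TO_UD_POS.get(pos, "X")
    feats = dict(RNC_TO_UD_FEATS[g] for g in grammemes if g in RNC_TO_UD_FEATS)
    # CoNLL-U style: features sorted by name, joined with "|", "_" when empty.
    feat_str = "|".join(f"{name}={value}" for name, value in sorted(feats.items())) or "_"
    return upos, feat_str


if __name__ == "__main__":
    print(rnc_to_ud("S,f,sg,gen"))
```

A table-driven design keeps the correspondence between the two schemas explicit and easy to audit, which is one way a shared feature list could support linking and conversion.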
The paper reports a method for creating a speaker's prosodic fingerprint based on the global characteristics of pitch movement. The prosodic fingerprint is the distribution of f0 across the low, middle, and high ranges, together with the distribution of pitch movements from one range into another [Šimko et al. 2017]. This fully automated method can be used to classify recordings and to provide a reference level for more sophisticated analysis of pitch movement and intonation strategies. We evaluate the method by applying it to spontaneous Russian speech recorded in different regions. We model the correlation between the fingerprint and sociolinguistic features such as age, gender, and region. The results of this analysis allow us to formulate several sociolinguistic hypotheses that can be tested further with more detailed analytic techniques.
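The idea of a fingerprint built from range occupancy and range-to-range movement can be sketched as follows. This is an illustrative approximation, not the authors' implementation: the abstract does not say how the low, middle, and high ranges are delimited, so the sketch assumes per-speaker terciles of the voiced f0 values, and the function names are hypothetical. It also counts frame-to-frame transitions, including staying within a range.

```python
from collections import Counter

RANGES = ("low", "mid", "high")


def tercile_bounds(f0_values):
    """Return two cut points that split the f0 values into three roughly equal bins."""
    ordered = sorted(f0_values)
    n = len(ordered)
    return ordered[n // 3], ordered[(2 * n) // 3]


def range_label(f0, bounds):
    """Assign one f0 value to the low, mid, or high range."""
    lower, upper = bounds
    if f0 < lower:
        return "low"
    if f0 < upper:
        return "mid"
    return "high"


def prosodic_fingerprint(f0_values):
    """Compute range occupancy and range-to-range transition proportions.

    f0_values: voiced f0 samples (e.g. in Hz) in temporal order.
    Returns (occupancy, transitions), both as dicts of proportions.
    """
    if not f0_values:
        raise ValueError("need at least one voiced f0 value")
    bounds = tercile_bounds(f0_values)  # assumption: speaker-relative terciles
    labels = [range_label(v, bounds) for v in f0_values]
    counts = Counter(labels)
    moves = Counter(zip(labels, labels[1:]))  # consecutive frame pairs
    n_moves = max(len(labels) - 1, 1)
    occupancy = {r: counts[r] / len(labels) for r in RANGES}
    transitions = {(a, b): moves[(a, b)] / n_moves for a in RANGES for b in RANGES}
    return occupancy, transitions


if __name__ == "__main__":
    occ, trans = prosodic_fingerprint([100, 110, 120, 150, 160, 170, 200, 210, 220])
    print(occ)
    print(trans[("low", "mid")])
```

The resulting occupancy and transition proportions form a fixed-length vector per speaker, which is the kind of representation that could then be related to age, gender, or region.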