A Network Approach to Describing Bashkir Morphology
This study introduces an approach to quantifying agglutination that is based on complex networks. Network modelling is one of the most powerful tools for describing complex systems, but it has rarely been used in linguistics, and very few papers apply it to morphology.
Bashkir belongs to the Turkic family, whose languages are considered agglutinative. Although the notion of agglutination was introduced in the 19th century, there is still no generally accepted definition of an agglutinative language. Various features have been proposed as necessary properties of an agglutinative language; however, these features do not seem to correlate with one another. In this study we discuss the data provided by our network that bear on the notions of agglutination and transcategoriality.
The study is based on Bashkir newspaper texts totalling 5.8 million tokens, annotated with the program “Bashmorph”. We built a network in which the nodes are affixes and an edge links two affixes that co-occur. The network is undirected and weighted, with edge weights based on co-occurrence frequency. It consists of 294 nodes and 3446 edges.
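The construction step can be illustrated with a minimal Python sketch under several assumptions that the abstract does not confirm: each annotated word form is available as a list of affix labels (the labels `PL`, `POSS.1SG`, `LOC`, `ACC` and the function name `build_affix_network` are hypothetical and do not reflect Bashmorph's actual output format), two affixes co-occur when they appear in the same word form, and an edge's weight is the number of word forms in which the pair occurs.

```python
from collections import Counter
from itertools import combinations

def build_affix_network(analyses):
    """Build a weighted, undirected affix co-occurrence network.

    `analyses` holds one list of affix labels per word form (hypothetical
    format). Returns the set of affixes (nodes) and a Counter mapping each
    alphabetically ordered affix pair (edge) to the number of word forms
    in which both affixes occur (edge weight).
    """
    nodes = set()
    edges = Counter()
    for affixes in analyses:
        unique = sorted(set(affixes))  # count a pair at most once per word form
        nodes.update(unique)
        for pair in combinations(unique, 2):  # pairs come out already ordered
            edges[pair] += 1
    return nodes, edges

# Toy input: hypothetical analyses of four word forms.
analyses = [
    ["PL", "POSS.1SG", "LOC"],
    ["PL", "ACC"],
    ["PL", "POSS.1SG", "ACC"],
    ["LOC"],  # a single affix adds a node but no edge
]
nodes, edges = build_affix_network(analyses)
print(len(nodes), len(edges))  # 4 5
print(edges[("ACC", "PL")])    # 2
```

Storing each undirected edge under an alphabetically ordered key means the pair `(ACC, PL)` and `(PL, ACC)` can never be counted as two different edges.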
Several standard coefficients characterizing such a network turn out to quantify and describe certain properties of a language; in our case, most of these parameters reflect agglutination. Specifically, we discuss the interpretation of the assortativity coefficient, the clique number, the maximal k-core, the clustering coefficient and the network density, along with some other data.
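The measures listed above can be computed with standard graph algorithms. The following sketch uses only the Python standard library and treats the network as unweighted, which is an assumption: the abstract does not say whether weighted or unweighted variants were used. The clique number is read here as the size of the largest clique; the sketch returns all maximal cliques, so their count is available too. All function names are hypothetical; in practice a library such as NetworkX offers equivalents (`nx.density`, `nx.average_clustering`, `nx.degree_assortativity_coefficient`, `nx.k_core`, `nx.find_cliques`).

```python
from itertools import combinations

def adjacency(edges):
    """Unweighted adjacency sets from (u, v) pairs; edge weights are ignored."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj

def density(adj):
    """Share of possible node pairs that are actually linked."""
    n = len(adj)
    m = sum(len(nbrs) for nbrs in adj.values()) / 2
    return 2 * m / (n * (n - 1)) if n > 1 else 0.0

def average_clustering(adj):
    """Mean local clustering coefficient; nodes of degree < 2 contribute 0."""
    total = 0.0
    for nbrs in adj.values():
        k = len(nbrs)
        if k < 2:
            continue
        links = sum(1 for a, b in combinations(nbrs, 2) if b in adj[a])
        total += 2 * links / (k * (k - 1))
    return total / len(adj)

def degree_assortativity(adj):
    """Pearson correlation between the degrees at both ends of every edge."""
    xs, ys = [], []
    for u, nbrs in adj.items():
        for v in nbrs:  # each undirected edge is visited in both directions
            xs.append(len(adj[u]))
            ys.append(len(adj[v]))
    mean = sum(xs) / len(xs)  # identical for ys because the lists are symmetric
    cov = sum((x - mean) * (y - mean) for x, y in zip(xs, ys))
    var = sum((x - mean) ** 2 for x in xs)  # also equal to the variance of ys
    return cov / var if var else float("nan")

def k_core(adj, k):
    """Nodes left after repeatedly removing nodes with fewer than k neighbours."""
    alive = set(adj)
    changed = True
    while changed:
        changed = False
        for u in list(alive):
            if len(adj[u] & alive) < k:
                alive.discard(u)
                changed = True
    return alive

def max_k_core(adj):
    """Largest k with a non-empty k-core, together with that core's nodes."""
    k, core = 0, set(adj)
    while True:
        nxt = k_core(adj, k + 1)
        if not nxt:
            return k, core
        k, core = k + 1, nxt

def maximal_cliques(adj):
    """All maximal cliques, found by Bron-Kerbosch with pivoting."""
    found = []
    def expand(r, p, x):
        if not p and not x:
            found.append(r)
            return
        pivot = max(p | x, key=lambda u: len(adj[u] & p))
        for v in list(p - adj[pivot]):
            expand(r | {v}, p & adj[v], x & adj[v])
            p = p - {v}
            x = x | {v}
    expand(set(), set(adj), set())
    return found

# Toy graph: triangle A-B-C plus a pendant node D attached to C.
adj = adjacency([("A", "B"), ("A", "C"), ("B", "C"), ("C", "D")])
print(round(density(adj), 3))               # 0.667
print(round(average_clustering(adj), 3))    # 0.583
print(round(degree_assortativity(adj), 3))  # -0.714
k, core = max_k_core(adj)
print(k, sorted(core))                      # 2 ['A', 'B', 'C']
cliques = maximal_cliques(adj)
print(len(cliques), max(len(c) for c in cliques))  # 2 3
```

A negative assortativity value, as in this toy graph, means that high-degree nodes tend to be linked to low-degree ones; a positive value means that nodes of similar degree tend to be linked.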