Proto-Indo-European-Uralic Comparison from the Probabilistic Point of View
In this paper we discuss the results of an automated comparison between two 50-item groups of the most generally stable elements on the so-called Swadesh wordlist, as reconstructed for Proto-Indo-European and Proto-Uralic. Two forms are counted as potentially related if their first two consonantal units, transcribed in simplified consonantal class notation (a rough variant of the Dolgopolsky method), match each other. Unlike all previous attempts at such a task (Ringe 1998; Oswalt 1998; Kessler & Lehtonen 2006; Kessler 2007), our automated algorithm comes much closer to emulating the traditional procedure of cognate search employed in historical linguistics: “Swadesh slots” for protolanguages are filled in strict accordance with such principles of reconstruction as topology (taking into account the structure of the genealogical tree), morphological transparency, typology of semantic shifts, and areal distribution of particular items. Altogether we have counted 7 pairs where Proto-Indo-European and Proto-Uralic share the same biconsonantal skeleton (exactly these pairs are also regarded as cognates in traditional hypotheses of an Indo-Uralic relationship). To estimate the probability of arriving at such a result by chance, we applied the permutation test, which yielded a positive result: the probability of obtaining 7 or more matching pairs by chance is 1.9% or 0.5%, depending on the composition of the consonantal classes. Both values are below the standard 5% threshold of statistical significance, and the latter is also below the stricter 1% level. Standard methodology therefore suggests that we reject the null hypothesis (accidental resemblance) and seek a more plausible explanation for the observed similarities.
Since the known typology of language contacts argues against explaining the observed Indo-Uralic matches as old lexical borrowings, we see the optimal explanation in the hypothesis of an Indo-Uralic genetic relationship, with the 7 matching pairs in question representing archaic retentions inherited from the original Indo-Uralic protolanguage.
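The matching criterion and the significance test described in the abstract can be illustrated with a minimal Python sketch. It rests on several assumptions that are not taken from the paper: the consonant-class table below is a hypothetical simplification in the spirit of Dolgopolsky-style classes, the two eight-slot "wordlists" are made-up toy forms rather than real Proto-Indo-European or Proto-Uralic reconstructions, and the test is a plain random-shuffle permutation test (the meanings of one list are reshuffled against the other), which may differ in detail from the procedure actually used.

```python
import random

# HYPOTHETICAL consonant classes, loosely modelled on Dolgopolsky-style
# classes; NOT the class inventory used in the paper. Vowels are not
# listed, so they are skipped when a skeleton is built.
CONSONANT_CLASSES = {
    "p": "P", "b": "P", "f": "P",
    "t": "T", "d": "T",
    "s": "S", "z": "S", "c": "S",
    "k": "K", "g": "K", "q": "K", "x": "K",
    "m": "M", "n": "N",
    "l": "R", "r": "R",
    "w": "W", "v": "W", "j": "J", "y": "J", "h": "H",
}


def skeleton(form: str) -> str:
    """Return the first two consonant classes of a form (other symbols skipped)."""
    classes = [CONSONANT_CLASSES[ch] for ch in form.lower() if ch in CONSONANT_CLASSES]
    return "".join(classes[:2])


def count_matches(list_a: list[str], list_b: list[str]) -> int:
    """Count meaning slots whose two forms share a full two-class skeleton."""
    return sum(
        1
        for a, b in zip(list_a, list_b)
        if len(skeleton(a)) == 2 and skeleton(a) == skeleton(b)
    )


def permutation_test(list_a: list[str], list_b: list[str],
                     n_iter: int = 10_000, seed: int = 0) -> tuple[int, float]:
    """Return (observed matches, share of shuffles with at least as many matches)."""
    rng = random.Random(seed)
    observed = count_matches(list_a, list_b)
    shuffled = list(list_b)
    hits = 0
    for _ in range(n_iter):
        rng.shuffle(shuffled)  # destroy the meaning alignment of list_b
        if count_matches(list_a, shuffled) >= observed:
            hits += 1
    return observed, hits / n_iter


# Made-up toy forms for eight meaning slots (NOT real reconstructions).
lang_a = ["pata", "tiku", "mena", "sora", "kalu", "nipe", "rudo", "wesa"]
lang_b = ["bade", "tego", "muni", "kari", "gali", "lomu", "sike", "poto"]

observed, p_value = permutation_test(lang_a, lang_b)
print(observed)  # → 4
print(p_value)
```

The returned share estimates how often chance alone, with meanings paired at random, produces at least as many skeleton matches as the real alignment; a value under 0.05 would correspond to the 5% significance threshold. Changing the class table (for example merging or splitting classes) changes both the observed count and the null distribution, which is why the result depends on the composition of the consonantal classes.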