The Corpus of Contact-Influenced Russian of Northern Siberia and the Russian Far East
The paper presents a spoken corpus of contact-influenced Russian, which
consists of spontaneous spoken Russian produced by bilingual speakers of indigenous
languages of Northern Siberia and the Russian Far East (Samoyedic,
Tungusic, Chukotko-Kamchatkan). The texts included in the corpus were transcribed
in ELAN in standard Russian orthography and manually annotated for
contact-induced features using a tag system developed specifically for the
corpus. The paper focuses mainly on this annotation system, which is relevant
in the wider context of annotating any kind of speech with “deviations” from
the standard language variety (bilingual, learner, or dialectal speech, etc.).
The annotation tags are grouped into several separate levels: contact-induced
morphological, syntactic, phonetic, and lexical features, etc. The exact meanings
of the annotation tags were proposed on empirical grounds. The transcribed and
annotated texts are then supplied with morphological annotation and made
searchable through the Tsakorpus platform. The aim of the project is to provide a useful
resource for linguistic studies on language contact.