Some challenges of the West Circassian polysynthetic corpus

Arkhangelskiy T.; Lander Yu.

doi:10.2139/ssrn.2709027

NRU HSE, 2015. No. 37/LNG/2015.

Although comprehensive morphologically annotated corpora exist for many morphologically rich languages, no such corpus has so far been available for any polysynthetic language. Polysynthetic languages raise a variety of theoretical and practical challenges for corpus linguistics. Some of these challenges have been partly addressed in the development of corpora for, e.g., Turkic or Uralic languages, while others are unique to languages of this kind. Our paper identifies the most prominent challenges that we are facing in the course of developing a West Circassian (Adyghe) corpus and offers possible solutions. These challenges include the tokenization problem, which involves delimiting morphology from syntax, the problems of lemmatization and part-of-speech tagging, and a number of glossing and search problems.

Research target: Philology and Linguistics

Priority areas: humanities

Language: English

Keywords: corpus linguistics, Adyghe, polysynthesis, West Circassian

Developing a polysynthetic language corpus: problems and solutions

Arkhangelskiy T.A., Lander Yu.A., Компьютерная лингвистика и интеллектуальные технологии 2016 No. 15 (22) P. 40-49

Added: June 5, 2016

West Circassian Imperative-Optative System: A Study in a Prototype-Based Organisation of a Grammatical Domain

Lander Y., Bagirokova I., Syntaxe et Sémantique 2022 Vol. 22 No. 1 P. 57-81

West Circassian has no fewer than two imperatives and two optatives. Their distribution depends on various parameters such as the speaker’s control over the situation, the person of the topic and the type of predicate. The whole system arguably can be described with respect to the universal Imperative Prototype, which reflects grammaticalization of a ...

Added: October 26, 2020

Nominal complex in West Circassian: between morphology and syntax

Lander Yu., Studies in Language 2017 Vol. 41 No. 1 P. 76-98

The paper presents a description and an analysis of the nominal complex, a peculiar construction which includes a noun and its modifiers, in West Circassian, a polysynthetic language of the Northwest Caucasian family. The nominal complex shows properties of a single word and tends to follow the template proposed for the word in West Circassian. ...

Added: August 8, 2016

Two-faced subordination marker in West Circassian necessity constructions

Lander Yu., Bagirokova I. / NRU HSE. Series WP BRP "Linguistics". 2015. No. 38/LNG/2015.

This paper describes the behavior of a subordination marker ‑n in the modal necessity constructions in West Circassian, a polysynthetic language belonging to the Northwest Caucasian family. We show that ‑n functions as a simple suffix in the non-epistemic construction and as a phrasal affix in the epistemic construction. Hence, this morpheme violates the principle ...

Added: December 15, 2015

Cilitives (‘easy’ and ‘difficult’) in West Circassian: semantics, argument structure and part-of-speech characteristics

Lander Y., Bagirokova I., Рема 2021 No. 1 P. 56-75

West Circassian displays two types of cilitive (facilitive ‘easy’ and difficilitive ‘difficult’) forms, namely noun cilitives, which describe individuals, and secondary cilitives, which describe the state of affairs. Secondary cilitives seemingly originate from noun cilitives, hence the same cilitive suffixes mark forms that are remarkably different from each other in their morphosyntax. While noun cilitives ...

Added: October 26, 2020

The Circassian analytic additive: word order and diachrony

Bagirokova I., Lander Y., Томский журнал лингвистических и антропологических исследований 2016 No. 2 (12) P. 9-19

The paper deals with the functioning of the analytical additive marker əčʼjə̣ / jəčʼjə̣ in the Temirgoy dialect of West Circassian (also known as Adyghe) and the Kuban dialect of Kabardian and analyses some morphosyntactic parameters that serve to differentiate its various functions. According to the hypothesis we propose, the marker əčʼjə̣ / jəčʼjə̣ , which ...

Added: July 20, 2016

Deriving affix ordering in polysynthesis: Evidence from Adyghe

Korotkova N., Lander Yury, Morphology 2010 Vol. 20 No. 2 P. 299-319

This article deals with the order of verbal suffixes in Adyghe, a polysynthetic language of the Caucasus. Traditionally the structure of the Adyghe word form and the order of its affixes were described in terms of template morphology. However, we present new data demanding another, substantially different approach. We demonstrate that for the most part ...

Added: February 6, 2013

Aspects of polysynthesis: Essays on the grammar of Adyghe

Moscow: RGGU, 2009

The volume includes articles that analyze the structure of the polysynthetic Adyghe language from a typological perspective. ...

Added: February 7, 2013

Corpus, multiple analyses and polysynthesis

Lander Yu., in: Adıge filolojisi: Güncel Konular. Düzce: Düzce University, 2016.

The paper discusses several problems which have been observed during the development of the corpus of West Circassian and proposes that their solutions should involve the possibility of multiple analyses. It is argued that this is related to certain properties of the constructions under discussion which are reflected in variation observed among the speakers of ...

Added: August 16, 2017

Adıge filolojisi: Güncel Konular

Düzce : Düzce University, 2016

The volume includes papers presented at the international symposium "Adyghe Philology". ...

Added: August 16, 2017

West Caucasian relative pronouns as resumptives

Lander Yu., Daniel M., Linguistics 2019 Vol. 57 No. 6 P. 1239-1270

In polysynthetic West Caucasian languages, the morphological verbal complex amounts to a clause, with all kinds of participants cross-referenced by affixes. Relativization is performed by introducing a relative affix in the cross-reference slot which corresponds to the relativized participant. However, these languages display several cross-linguistically rare features of relativization. Firstly, while under the view of ...

Added: June 28, 2018

Asymmetric word class systems and noun primacy: West Circassian and beyond

Lander Y., Bagirokova I., Journal of Linguistics 2021

In this paper we argue for the existence of an asymmetric parts-of-speech system where nouns constitute a separate word class but do not form any non-privative contrast with other content parts of speech. As a result, in a system of this kind there is no need to distinguish verbs even though there are good reasons ...

Added: October 26, 2020

Producing polysynthetic verb forms in West Circassian (Adyghe): an experimental study

Lander Yu., Arkhangelskiy T. / NRU HSE. Series WP BRP "Linguistics". 2015. No. 23/LNG/2015.

This paper describes a pilot experiment which was conducted by the authors with speakers of the polysynthetic West Circassian (Adyghe) language and aimed at investigating their ability to use complex verb forms that cross-reference several arguments introduced by applicative morphology. The results of the experiment support the view that complex polysynthetic words can be constructed ...

Added: April 10, 2015

Corpus tools in grammatical studies of Russian

Lyashevskaya O., Moscow: Языки славянской культуры, 2016

Corpus linguistics can be broadly defined in terms of two partially overlapping research dimensions. On the one hand, corpus linguistics is knowledge of how to compile and annotate linguistic corpora. On the other hand, corpus linguistics is a family of qualitative and quantitative methods of language study based on corpus data. The book presents ...

Added: March 26, 2015

Non-quantificational distributive quantifiers in Besleney Kabardian

Arkadiev P., Lander Yu., Snippets 2013 No. 27 P. 5-7

The squib discusses certain unexpected properties of nominals containing distributive universal quantifiers in Besleney Kabardian such as their capacity to appear as clausal predicates and their similarities to plural nominals. ...

Added: October 15, 2013

Proceedings of the International Conference "Corpus Linguistics 2019"

St. Petersburg: St. Petersburg University Press, 2019

The volume contains the papers presented at the International Scientific Conference "Corpus Linguistics 2019", held on June 24-28, 2019 in St. Petersburg. ...

Added: July 8, 2019

Morphological causatives in Abaza

Koshevoy A. / NRU HSE. Series WP BRP "Linguistics". 2018. No. 75/LNG/2018.

This paper deals with the productive morphological causative r(ə)- in Abaza (Northwest Caucasian), a highly polysynthetic ergative language. We discuss the causativization process in Abaza as well as the semantic properties of the construction and elaborate an analysis of the event structure of the Abaza morphological causatives based on the scope of adverbials. ...

Added: December 16, 2018

Spatial Meanings and Russian Prosody: a Corpus Study

Khudyakova M. / NRU HSE. Series WP BRP "Linguistics". 2014.

The objective of this paper is to see if we can find prosodic features that can express spatial meanings on corpus material. The two main questions that we try to answer are: 1. What prosodic instruments express spatial meanings? 2. What characteristics of space are coded by prosody in the Russian language? The source of the ...

Added: October 22, 2014

The intensifier "do uzhasa" in Russian on its way to grammaticalization

Gerasimov D. V., Acta Linguistica Petropolitana. Труды института лингвистических исследований 2016 Vol. XII No. 1 P. 336-363

The paper presents a corpus-driven study of the Russian PP-based degree modifier do uzhasa (lit. ‘to horror’), suggesting a two-stage grammaticalization path. The first stage (presumably, XVIII–XIX c.) involves subjectification, while during the second stage, subjective readings give rise to intensifier readings through conceptual metonymy. Both stages see a host class expansion. This process is ...

Added: November 27, 2017

The International Conference "Slavicorp"

Orekhov B., Вопросы языкознания 2011 No. 3 P. 153-155

The article deals with the conference «Slavicorp» in Warsaw in November 2010. ...

Added: September 28, 2013

Phasal polarity in Abaza

Klyagina E., Panova A. / NRU HSE. Series WP BRP "Linguistics". 2019. No. 89/LNG/2019.

Phasal polarity (PhP) is a cross-linguistic category which includes such values as ᴀʟʀᴇᴀᴅʏ, ɴᴏᴛ ʏᴇᴛ, sᴛɪʟʟ and ɴᴏ ʟᴏɴɢᴇʀ. This paper discusses morphologically bound markers of phasal polarity in Abaza, a polysynthetic Northwest Caucasian language. We show that the Abaza PhP affixes ‑χ’a ‘already’, -s (+ negation) ‘not yet’, -rḳʷa ‘still’ and -χ (+ negation) ...

Added: December 14, 2019

Predictive validity of progressive-aspect verb forms in English corpus linguistics

Popkova E., Социосфера 2010 No. 4 P. 74-81

The article discusses the most recent trends in the development of the English progressive. A corpus-based approach to linguistic research is seen as an effective means of determining reliability of the data retrieved and helps track the major diachronic dynamic in the increasing frequency of the progressive aspect that has taken place since the beginning ...

Added: November 6, 2012

Adverbial phrases in Hasidic Yiddish

Arkhangelskiy T., Panova T., International Journal of the Sociology of Language 2014

The purpose of our study is to investigate the lexicalization of so-called adverbial phrases, such as fun a mol, in modern Hasidic Yiddish in comparison with written literary Yiddish of the 20th century. The phenomenon in question is a historical process in which several lexemes forming a frequent collocation (including nouns, adjectives, adverbs, prepositions and ...

Added: December 11, 2014

Using TXM Platform for Research on Language Changes over Time: The Dynamics of Vocabulary and Punctuation in Russian Literary Texts

Lavrentiev A. M., Sherstinova T., Chepovskiy A. et al., Vestnik Tomskogo Gosudarstvennogo Universiteta, Filologiya 2021 Vol. 70 P. 69-89

The purpose of this paper is to test the methodological tools provided by the TXM platform for research on the dynamics of vocabulary and punctuation marks in diachronic corpora. TXM is a powerful text analysis tool which provides both quantitative and qualitative features in a transparent open-source implementation. In this paper, we demonstrate how it can be ...

Added: June 24, 2021