Automatic data collection in lexical typology

Ryzhova D., Melnik A. A., Paperno D., Singh Y.

P. 619–636.

Ryzhova D., Melnik A. A., Ершов И. А., Пантелеева И. М., Paperno D., Singh Y., Соболев М. А.

The paper addresses automatic data collection for lexical typological studies within the frame-based approach. Research in this framework relies on analyzing the distributional properties of the lexemes in question. Hence, questionnaires for such studies consist of typical contexts in which lexical items from a given semantic domain can potentially occur. We aim to fill these questionnaires automatically, and this task can be split into two problems: translating the questionnaire and filling it with the relevant data. We propose three methods for the first task (translation via bilingual dictionaries, via online cloud translators, and via parallel corpora) and two algorithms for the second (filling a questionnaire from monolingual corpora vs. from online translators). We test our algorithms on data from four semantic domains of qualitative features (‘sharp’, ‘smooth’, ‘thick’, ‘thin’).
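
To make the two steps concrete, here is a minimal sketch under toy assumptions: a questionnaire context is a short English adjective-noun phrase, translation goes word by word through a small bilingual dictionary (keeping every candidate), and filling picks the candidate adjective most frequent in a lemmatized monolingual corpus. `BILINGUAL_DICT`, `CORPUS`, `translate_context`, `fill_context` and the frequency-based selection rule are all hypothetical illustrations, not the authors' resources or algorithms.

```python
from collections import Counter
from itertools import product
from typing import Optional

# Hypothetical toy English-Russian dictionary (illustrative, not a real resource).
BILINGUAL_DICT: dict[str, list[str]] = {
    "sharp": ["острый"],
    "thin": ["тонкий", "худой"],
    "knife": ["нож"],
    "ice": ["лёд"],
    "man": ["человек"],
}

# Hypothetical lemmatized monolingual corpus, reduced to adjective-noun pairs.
CORPUS: list[tuple[str, str]] = [
    ("тонкий", "лёд"), ("тонкий", "лёд"),
    ("худой", "человек"), ("худой", "человек"),
    ("тонкий", "человек"), ("острый", "нож"),
]


def translate_context(context: str) -> list[tuple[str, ...]]:
    """Translate a context word by word via the dictionary, keeping every
    combination of candidate translations; unknown words are kept as-is."""
    options = [BILINGUAL_DICT.get(word, [word]) for word in context.split()]
    return list(product(*options))


def fill_context(context: str, corpus: list[tuple[str, str]] = CORPUS) -> Optional[str]:
    """Return the adjective of the candidate translation that is most frequent
    in the corpus, or None if no candidate is attested at all."""
    counts = Counter(corpus)
    best = max(translate_context(context), key=lambda cand: counts[cand])
    return best[0] if counts[best] > 0 else None


print(fill_context("thin man"))  # prints: худой
```

Ambiguous dictionary entries such as ‘thin’ are exactly where the corpus step matters: the dictionary proposes both candidates, and the (toy) corpus counts decide which one fills the context.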

Language: English

Keywords: lexical typology, parallel corpora, synonyms, corpus study

In book

Computational Linguistics and Intellectual Technologies: Proceedings of the International Conference “Dialogue 2018”

[s.n.], 2018.

Automatic construction of lexical typological questionnaires

Paperno D., Ryzhova D., in: Methodological Tools for Linguistic Description and Typology. Issue 16. University of Hawaii Press, 2019. Ch. 5 P. 45–61.

Questionnaires constitute a crucial tool in linguistic typology and language description. By nature, a questionnaire is both an instrument and a result of typological work: its purpose is to help the study of a particular phenomenon cross-linguistically or in a particular language, but the creation of a questionnaire is in its turn based on the ...

Added: August 30, 2019

Russian Learner Parallel Corpus as a Tool for Translation Studies

Kutuzov A. B., Kunilovskaya M. A., Oschepkov A. et al., in: Computational Linguistics and Intellectual Technologies: Papers from the Annual International Conference “Dialogue” (Bekasovo, May 30–June 3, 2012). In 2 vols. Vol. 1: Main conference program. Issue 11. Moscow: Russian State University for the Humanities, 2012. P. 362–369.

The paper presents a project aimed at developing a Russian Learner Parallel Corpus, discusses existing analogues, and describes the current status of the corpus and the tasks for which it could be used. The existing parallel corpora contain (comparatively) “correct” translations, whereas the aim of the present project is to create a sufficiently large corpus of ...

Added: February 13, 2013

Temperature terms in modern Eastern Armenian

Daniel M., Khurshudian V., in: Linguistics of Temperature. Amsterdam: John Benjamins Publishing Company, 2015. P. 392–439.

This paper is an analysis of lexical categorisation of the temperature domain in modern Eastern Armenian. Compared to the vast research outline proposed in (Koptjevskaja-Tamm 2011), this paper has several important limitations. First, it is focused on non-derived, primary temperature terms (most of which happen to be adjectives or nouns, or both). Derived lexical items, ...

Added: October 17, 2013

Length of East Caucasian subject indexes: a quantitative study

Moroz G., in: Дурхъаси хазна: a collection of papers for the 60th birthday of R. O. Mutalov. Moscow: Buki Vedi, 2021. P. 258–282.

In this article I present a connection between the frequency and the length of person-number indexes through two independent studies: token frequency obtained from the Universal Dependencies treebanks and type frequency gathered in a typological study. After introducing the results of these two studies, I present East Caucasian data. I show that the unusual history of ...

Added: May 23, 2021

Improving English-Russian sentence alignment through POS tagging and Damerau-Levenshtein distance

Kutuzov A. B., in: Proceedings of the 4th Biennial International Workshop on Balto-Slavic Natural Language Processing. Association for Computational Linguistics, 2013. P. 63–68.

The present paper introduces an approach to improving English-Russian sentence alignment based on POS tagging of source and target texts that have been automatically aligned (by HunAlign). The initial hypothesis is tested on a corpus of bitexts. Sequences of POS tags for each sentence (specifically, nouns, adjectives, verbs and pronouns) are processed as “words”, and the Damerau-Levenshtein distance between them is ...
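
A minimal sketch of the distance step, assuming Universal-POS-style tag names and the restricted (optimal string alignment) variant of Damerau-Levenshtein distance, which counts insertions, deletions, substitutions and adjacent transpositions. The abstract does not specify the variant, the tagset or any normalization, so `KEPT_TAGS`, `pos_signature` and the length normalization in `alignment_score` are illustrative assumptions, not a reimplementation of the paper's pipeline.

```python
from collections.abc import Sequence

# Assumed tag names; the abstract only says nouns, adjectives, verbs and pronouns are kept.
KEPT_TAGS = {"NOUN", "ADJ", "VERB", "PRON"}


def osa_distance(a: Sequence[str], b: Sequence[str]) -> int:
    """Restricted Damerau-Levenshtein (optimal string alignment) distance
    between two sequences, treating each POS tag as one symbol."""
    n, m = len(a), len(b)
    d = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        d[i][0] = i  # delete all of a[:i]
    for j in range(m + 1):
        d[0][j] = j  # insert all of b[:j]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution or match
            )
            # Adjacent transposition, e.g. ADJ NOUN vs. NOUN ADJ.
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[n][m]


def pos_signature(tags: Sequence[str]) -> list[str]:
    """Keep only the content-word tags that form the sentence's 'word'."""
    return [t for t in tags if t in KEPT_TAGS]


def alignment_score(src_tags: Sequence[str], tgt_tags: Sequence[str]) -> float:
    """Distance normalized by the longer signature: 0.0 means identical."""
    s, t = pos_signature(src_tags), pos_signature(tgt_tags)
    longest = max(len(s), len(t))
    return osa_distance(s, t) / longest if longest else 0.0


# One adjacent transposition out of three kept tags.
print(alignment_score(["PRON", "VERB", "DET", "NOUN"], ["PRON", "NOUN", "VERB"]))
```

Under this sketch, an aligned sentence pair with an unusually high score would be a candidate for re-alignment; any threshold for flagging such pairs would have to be tuned on data.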

Added: September 5, 2013

Basic expressions of pain in Beserman Udmurt

Усачева М. С., Leontieva A., Voprosy Jazykoznanija 2021 No. 6 P. 69–98

This paper is devoted to the semantic structure and syntactic properties of predicates of pain in Beserman Udmurt. Beserman is a variety of Udmurt spoken in northwestern Udmurtia that has been influenced by contact with Russian dialects and Turkic languages. We analyze the meanings and compatibility of units that denote pain, describe the grammatical encoding of different participants ...

Added: October 27, 2021

Verbs of falling in the Kazym dialect of Khanty

Ванеян С. С., Toldova S., Железнова В. А., Acta Linguistica Petropolitana. Transactions of the Institute for Linguistic Studies 2020 Vol. XVI No. 1 P. 435–461

The paper is devoted to the verbs of falling in the Kazym dialect of Khanty (Ob-Ugric < Finno-Ugric). This dialect is spoken in the area along the Kazym River, Khanty-Mansi district. The paper provides a detailed account of the distribution of the core verbs in the semantic field of falling. The system of verbs of falling in ...

Added: November 17, 2021

Verbs of falling in Chukchi

Kozlov A., Kasyanova P., Acta Linguistica Petropolitana. Transactions of the Institute for Linguistic Studies 2019 Vol. 15

The article deals with the semantics of verbs of falling in the Amguema variety of Chukchi. It considers the two main lexemes describing a situation of falling and analyzes their distribution. It also discusses several verbs used in more specific contexts, as well as a number of metaphorical shifts whose target frame is one of the frames of falling. ...

Added: October 31, 2019

Time and speed: Where do speed adjectives come from?

Rakhilina E. V., Плунгян В. А., Russian Linguistics 2013 Vol. 37 No. 3 P. 347–359

The article examines the relationship between time and space in language on the basis of adjectives denoting high or low speed in Russian and other (mostly Slavic) languages. In physics the notion of speed is defined in terms of time and space (distance per time unit). It is argued, however, that speed in natural language ...

Added: November 18, 2013

Verbs of animal sounds in Yiddish

Luchina E., Baranova S., in: “Verbs of Animal Sounds: A Typology of Metaphors”. Moscow: Yazyki slavyanskikh kul’tur, 2015.

The study was carried out within lexical typology and follows its comprehensive approach, which combines dictionaries, corpora and informant questionnaires. As usual, the first stage of the research was collecting material from lexicographic sources. An additional intermediate result is a ranking of dictionaries by their suitability for lexical typological research. The metaphorical models and polysemy patterns found in the Yiddish material, despite their small ...

Added: December 12, 2014

Parallel Belarusian-Russian and Russian-Belarusian corpora: a joint project of the Russian National Corpus

Sichinava D., Arkhangelskiy T., in: National Language Corpora: Models and Technologies. Proceedings of the Kazan School on Computational and Cognitive Linguistics TEL-2012. Kazan: Fen Publishing House of the Academy of Sciences of the Republic of Tatarstan, 2012. P. 54–60.

Added: April 23, 2013

A fragment of the lexical system of the Kazym dialect of Khanty: the verbs pitti ‘to fall; to get into’ and χɔjti ‘to touch; to hit’ and their argument structure

Ryzhova D., Ural-Altaic Studies 2022 No. 2 (45) P. 123–140

The paper describes the semantics of the Kazym Khanty verbs pitti ‘to fall; to get into somewhere’ and χɔjti ‘to touch; to hit the target’ within the frame-based approach to lexical typology, according to which a word acquires different meanings in different context types. The sets of physical meanings of the verbs in ...

Added: October 30, 2021

The Poetic Corpus of Russian: Where the Poems are Written

Sichinava D., Orekhov B., in: Proceedings of the Second Workshop on Corpus-Based Research in the Humanities CRH-2, 25-26 January 2018 Vienna, Austria. Wien: Gerastree Proceedings, 2018. P. 201–205.

The paper discusses the annotation of composition locations in the Poetic Corpus of Russian, which makes it possible to build subcorpora by location and to search by this parameter. The place names indicated by the authors are extracted, tagged and “normalized”, that is, all the different versions of names and minor locations are boiled down to ...

Added: August 30, 2018

Doing lexical typology with frames and semantic maps

Rakhilina E. V., Reznikova T. / NRU HSE. Series WP BRP "Linguistics". 2014. No. 18.

In this paper we present an approach to lexical typology which will be referred to as the “frame method”. It was developed and tested in the Moscow Lexico-Typological Group and is currently used in all its projects, such as Majsak, Rakhilina (eds.) 2007, Britsyn et al. (eds.) 2009, Kruglyakova 2010, Reznikova et al. 2012. Our ...

Added: December 15, 2014

Verbs of motion in water: A lexical typology

Moscow: Indrik, 2007.

The volume presents the results of a systematic lexical typological study based on extensive language data: using a single questionnaire, the semantic domain of motion and location in water (the domain of swimming, or aquamotion) is described for more than forty languages representing a wide range of language families and areas: Slavic, Baltic, Romance, Germanic, Uralic, Turkic, Semitic, Caucasian, African, etc. The papers are written by specialists in ...

Added: May 1, 2014

Towards a typology of dimension adjectives: data from the Tegi variety of Khanty

Kozlov A., Привизенцева М. Ю., Acta Linguistica Petropolitana. Transactions of the Institute for Linguistic Studies 2014 Vol. 10 No. 1 P. 748–761

The article focuses on dimension adjectives in Tegi Khanty ...

Added: October 3, 2017

Computational perspectives on lexical typological research

Orekhov B., Reznikova T., Proceedings of Voronezh State University. Series: Linguistics and Intercultural Communication 2015 No. 3 P. 17–23

The article discusses a technique that partially automates the analysis of data for lexical typological research. One of the main tasks in the comparative study of a semantic field is to identify meanings for which different languages show different encoding strategies (cf. the meaning ‘characterized by increased humidity’: in Russian, its expression depends on the temperature conditions described, влажный vs. сырой; in German, in ...
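
The abstract breaks off before describing the technique itself, so the following is only a hedged sketch of the task it names: given a hypothetical table recording which lexeme each language uses in each context, it flags pairs of contexts that one language expresses with the same word while another language splits them, i.e. candidate meanings with different encoding strategies. The table layout, the function name `contrastive_pairs` and the placeholder data are assumptions for illustration, not the authors' method or data.

```python
from itertools import combinations

# language -> context (meaning) -> lexeme used for it; layout is an assumption
Lexicalization = dict[str, dict[str, str]]


def contrastive_pairs(data: Lexicalization) -> set[tuple[str, str]]:
    """Return context pairs expressed by one lexeme in at least one language
    and by different lexemes in at least one other language."""
    contexts = sorted({c for table in data.values() for c in table})
    result: set[tuple[str, str]] = set()
    for c1, c2 in combinations(contexts, 2):
        same = differ = False
        for table in data.values():
            if c1 in table and c2 in table:  # compare only where both are attested
                if table[c1] == table[c2]:
                    same = True
                else:
                    differ = True
        if same and differ:
            result.add((c1, c2))
    return result


# Placeholder data: abstract context and lexeme labels, not real language facts.
TOY_DATA: Lexicalization = {
    "lang_A": {"c1": "w1", "c2": "w1", "c3": "w2"},
    "lang_B": {"c1": "v1", "c2": "v2", "c3": "v2"},
}

print(sorted(contrastive_pairs(TOY_DATA)))
```

Pairs that every language treats alike (always the same word, or always different words) are not reported, which keeps the output focused on the cross-linguistic splits the article treats as its target.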

Added: December 15, 2015

Towards a description of the semantic field of falling in Rutul

Nasledskova P., Netkachev I., Acta Linguistica Petropolitana. Transactions of the Institute for Linguistic Studies 2020 Vol. 16, Part 1 P. 801–818

This paper aims to describe the semantic field of ‘falling’ in Rutul (< Lezgic < East Caucasian). Our research is based on data from the Kina Rutul variety, which is spoken in the village of Kina (Rutulsky district, Dagestan, Russia). All the data have been elicited. For our analysis, we use a frame-based methodology for ...

Added: August 23, 2020

Towards a lexical typology of dimension adjectives: data from the Tegi variety of Khanty

Kozlov A., Привизенцева М. Ю., Acta Linguistica Petropolitana. Transactions of the Institute for Linguistic Studies 2014 Vol. X No. 1 P. 748–761

The article focuses on the system of dimensional terms in Tegi Khanty ...

Added: October 4, 2017

Grammatical Metaphor Constructions: Acquisition Analysis

Galeyeva A. I., Gumovskaya G., Current Issues in Philology and Pedagogical Linguistics (Russia). Thematic issue "Linguistics of the 21st Century: Directions, Methods, Prospects" 2023 No. 3 P. 183–191

This article studies how Russian-speaking L2 learners of English for Academic Purposes acquire the skill of phrasal inclusion of grammatical metaphor. The study applies corpus methodology (LancsBox concordance tools) and statistical analysis tools to analyse the interdependency between the learners' level of linguistic expertise and their ability to apply certain patterns with ...

Added: October 18, 2023

Lexical typology in the errors of learners of Russian as a foreign language: an analysis of the verb padat’ ‘to fall’ in texts by non-native speakers

Vyrenkova A. S., Acta Linguistica Petropolitana. Transactions of the Institute for Linguistic Studies 2020 Vol. XVI, Part 1 P. 368–385

This paper investigates the use of the Russian verb padat’ ‘to fall’ and its quasi-synonyms. Padat’ is dominant in the system of Russian predicates of falling and should therefore be suitable for describing any type of uncontrolled downward motion. However, in a number of contexts a different means of expression is required. These ...

Added: August 2, 2021

A New Approach to OLD Studies

Vyrenkova A., Rakhilina E., Orekhov B., in: The Typology of Physical Qualities. Amsterdam: John Benjamins Publishing Company, 2022. Ch. 7 P. 189–214.

Added: October 31, 2018

The semantics of the Russian diminutive in cross-linguistic correspondences and interactions: a corpus-based and experimental study

Резанова З. И., Artemenko E., Васильева А. В. et al., Tomsk: Tomsk University Publishing House, 2019.

The monograph presents the results of interpreting the semantics of diminutives, i.e. derived Russian nouns that include components of emotional-evaluative meaning. Well-developed diminutive word formation is a striking distinctive feature of the Russian derivational subsystem: it unites Russian with the other Slavic languages and largely distinguishes it from the derivational systems of other Indo-European and non-Indo-European languages, including Turkic ones. The authors investigate the features ...

Added: October 29, 2021

Typology of Adjectives Benchmark for Compositional Distributional Models

Ryzhova D., Kyuseva M., Paperno D., in: Proceedings of the Language Resources and Evaluation Conference. Paris: European Language Resources Association (ELRA), 2016. P. 1253–1257.

In this paper we present a novel application of compositional distributional semantic models (CDSMs): prediction of lexical typology. The paper introduces the notion of typological closeness, which is a novel rigorous formalization of semantic similarity based on comparison of multilingual data. Starting from the Moscow Database of Qualitative Features for adjective typology, we create four ...
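
The abstract is cut off before the formal definition of typological closeness, so the sketch below uses an assumed, simplified stand-in to illustrate the idea of measuring semantic similarity through multilingual data: the closeness of two adjective-noun contexts is the share of languages, among those with data for both, that express the two contexts with the same adjective. The function name, the data layout and the placeholder data are hypothetical; this is not the paper's formalization or its database.

```python
from typing import Optional

# language -> context -> adjective used; placeholder labels, not real data
TOY_DATA: dict[str, dict[str, str]] = {
    "L1": {"a": "x", "b": "x", "c": "y"},
    "L2": {"a": "p", "b": "q", "c": "q"},
    "L3": {"a": "m", "b": "m"},
}


def typological_closeness(
    data: dict[str, dict[str, str]], c1: str, c2: str
) -> Optional[float]:
    """Share of languages (with data for both contexts) that use the same
    adjective for c1 and c2; None if no language covers both contexts."""
    covering = [table for table in data.values() if c1 in table and c2 in table]
    if not covering:
        return None
    return sum(table[c1] == table[c2] for table in covering) / len(covering)


print(typological_closeness(TOY_DATA, "b", "c"))  # prints: 0.5
```

A graded score of this kind is what a distributional model's similarity between the corresponding phrase vectors could then be compared against, for example with a rank correlation; how the paper actually builds its benchmark is not stated in the truncated abstract.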

Added: October 18, 2016