Positional skipgrams for Bambara: a resource for corpus-based studies

Mandenkan. 2019. No. 62. P. 165–183.

Maslinsky K. A.

This article presents a new online dataset of linguistically rich n‑gram frequency data for Bambara based on the disambiguated part of the Bambara Reference Corpus. The n‑grams in the dataset are positional skipgrams that capture information about the co-occurrence of lexical items with grammatical categories at various relative positions. These n‑grams were constructed to make the most of the types of information available in the morphologically annotated corpus of Bambara, given the limited amount of textual data. The methodology and data used for constructing n‑grams for Bambara are discussed, followed by brief illustrations of how the positional skipgram data may be employed in corpus-based linguistic research.
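As a rough illustration of the idea described in the abstract, the sketch below counts triples of a lexical item, a relative position, and the grammatical tag found at that position. It is not the article's actual procedure: the function names, the `max_offset` window, the (lemma, tag) input format, and the toy sentences with their tag labels are all assumptions made for this example.

```python
from collections import Counter


def positional_skipgrams(sentence, max_offset=2):
    """Yield (lemma, offset, tag) triples for one sentence.

    `sentence` is a list of (lemma, tag) pairs. `offset` is the relative
    position of the tagged neighbour: negative = left, positive = right.
    """
    for i, (lemma, _) in enumerate(sentence):
        for offset in range(-max_offset, max_offset + 1):
            j = i + offset
            # Skip the token itself and positions outside the sentence.
            if offset == 0 or j < 0 or j >= len(sentence):
                continue
            yield (lemma, offset, sentence[j][1])


def count_skipgrams(corpus, max_offset=2):
    """Count positional skipgrams over a list of sentences."""
    counts = Counter()
    for sentence in corpus:
        counts.update(positional_skipgrams(sentence, max_offset))
    return counts


# Toy input for illustration only; not taken from the Bambara Reference Corpus.
corpus = [
    [("a", "pers"), ("ye", "pm"), ("so", "n"), ("san", "v")],
    [("a", "pers"), ("ye", "pm"), ("ji", "n"), ("min", "v")],
]
counts = count_skipgrams(corpus)
print(counts[("ye", 2, "v")])  # how often a "v" tag occurs two positions right of "ye"
```

Keeping the offset as part of the key is what makes the skipgram positional: the same lemma and tag pair is counted separately for each relative position.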

Research target: Philology and Linguistics

Assessing the socio-economic effect of publishing open data: the case of public transport in Moscow

Artamonov R., Датиев С. Б., Zhulin A. B. et al., М.: Издательский дом НИУ ВШЭ, 2015.

This study examines how the release of machine-readable data affects the development of public transport in Moscow. Public transport was chosen from among all areas of city life because open data in this field has been published abroad for quite some time, applications built on it have appeared, and it has become possible to judge the potential effect of such data in Russia. The objectives ...

Added: July 15, 2015

Open data management in Russia

Churakov V., Гришина Д. А., Российский юридический журнал 2021 № 6(141) С. 164–175

Open data publishing activity in Russia dates back to 2012. Almost 10 years later, the quality and quantity of such data have increased significantly. To facilitate work with open data, both private and public information systems have been created. The main state system at the moment is the National Data Management System (NDMS). The ...

Added: October 24, 2022

The practice of using open data in the "Programming" course of the "Software Engineering" bachelor's programme

Maksimenkova O. V., Podbelskiy V. V., Образование и наука 2016 Т. 139 № 10 С. 107–121

The aim of the publication is to show how open data can be used in teaching programming courses. Methods. The results of applying the technique presented in the publication to first-year programming training in the «Program Engineering» course were obtained through comparative research and analysed ...

Added: January 6, 2017

“All these …”: Negative Opinion About People and “Pejorative Plural” in Russian

Blinova O. V., Lecture Notes in Computer Science 2019 Vol. 11551 P. 51–60

The paper discusses plural forms of Russian nouns (in particular, of surnames) like vsjakie tam Ivanovy (‘various Ivanovs’, ‘all sorts of Ivanovs’), which express a negative opinion about the referents. The co-occurrence patterns of such Pl.Pej forms are identified from web-corpus data. Pl.Pej forms combine primarily with universal quantifiers including ‘all’, ‘all of these’ ...

Added: November 1, 2020

The somatism "hands" in Russian and English linguistic consciousness

Bogolepova S. V., Вопросы психолингвистики 2012 № 16 С. 192–197

This article is an attempt to analyze the somatism (the name of a body part) “hands” from a psycholinguistic point of view. It aims to reveal the mental images Russian and English speakers associate with the word “hands”. For this purpose, a wide range of linguistic sources such as language corpora and the results of an associative ...

Added: October 23, 2013

Principles of Citizen Science in Open Educational Projects Based on Open Data

Maksimenkova O. V., Radchenko I., in: Proceedings of the 12th Central and Eastern European Software Engineering Conference in Russia. NY: ACM, 2016.

The phenomenon of citizen science, its features and its prospects are a highly topical subject today. It seems natural that citizen science and crowdsourcing techniques are making their way into such a popular area as data science. This paper considers questions about teaching data science and the areas that borrow techniques from data science. ...

Added: January 12, 2017

The language of L. N. Tolstoy: a corpus approach and introspection

Orekhov B., Труды института русского языка им. В.В. Виноградова 2024 № 1(39) С. 67–73

The paper presents a corpus check for the series of notes by Alexander Bisk. In the mid-twentieth century, A. Bisk, an attentive reader and expert in Russian literature, who was then in exile, published an article in a journal specializing in the problems of teaching Russian to foreigners. In this article, he shares the results ...

Added: April 23, 2024

Corpus studies of the speech of non-standard speakers ("heritage Russian")

Rakhilina E. V., Марушкина А. С., Acta Linguistica Petropolitana. Труды института лингвистических исследований 2015 Т. XI № 1 С. 621–639

The paper presents an analysis of comparative, conditional and prepositional constructions in the speech of heritage speakers of Russian and learners of Russian as a second language on the material from the Russian Learner Corpus. ...

Added: July 25, 2015

Corpus studies of the speech of non-standard speakers («heritage» Russian)

Rakhilina E. V., Acta Linguistica Petropolitana. Труды института лингвистических исследований 2014

The notion of a native speaker (of Russian, for example) is associated with a person who can freely express any thought without making mistakes. This simplified stereotype assumes that native speakers always adhere to a single standard in their speech and follow standard rules. In reality, however, far from everyone we call a native speaker speaks the same way: as is well known, there is a «special» ...

Added: February 24, 2014

Verbs in aphasic discourse: data from the Russian Clinical Pear Stories Corpus

Akinina Y., Bergelson M., Khudyakova M. et al., Stem-, Spraak- en Taalpathologie 2015 Vol. 20 No. 1 P. 21–23

In the current study we present interim results of verb use analysis in two aphasic groups based on Russian CliPS (Clinical Pear Stories) data. Russian CliPS is a multimedia corpus of narratives produced by speakers with aphasia and right hemisphere damage, as well as neurologically healthy speakers of Russian. ...

Added: September 21, 2015

THE FORMATION OF A SYSTEM OF OPEN GOVERNMENT IN RUSSIA: EXPERIENCE AND PROSPECTS

Dmitrieva N., Styrin E. M., Public Administration Issues 2014 No. 5 P. 57–75

This paper analyzes the development of forms and methods of interaction between government agencies and the experts’ community, public organizations, and citizens under the influence of a whole host of factors, including a transition to networked forms of administration; the production and exchange of big data; the dynamic development of information and communication technologies; and ...

Added: March 26, 2015

Theoretical semantics and ideographic lexicography: Dictionary. Discourse. Corpus: abstracts of the All-Russian scientific conference with international participation. 17–18 October 2024, Yekaterinburg

Екатеринбург: Кабинетный ученый, 2024.

The volume presents abstracts of talks from different scholarly schools, united by problems of semantics and lexicography. ...

Added: October 21, 2024

Good Intentions Exploited Badly: Contested Metaphors of Russian Patriotism

Skrynnikova I., Permyakova T. M., Pozdeeva E., Journal of Intercultural Communication Research 2022 Vol. 51 No. 4 P. 343–360

Generating and cultivating patriotic sentiments has been universally recognized as being critical for any nation. The originally sacralized Russian patriotism has evolved into an ambiguous concept due to its discrediting in the post-Soviet era. The paper claims that patriotism is an essentially contested concept, frequently employed as a promotional tool in political campaigns, with figurative language serving as a tool for ...

Added: August 27, 2021

Trends in subjective well-being in Russia: 1998–2018

Shirokanova A., Вестник Санкт-Петербургского университета. Серия 12: Социология 2020 № 1 С. 4–24

The paper reviews and analyses Russian surveys on subjective wellbeing, one of the key noneconomic indicators of social development. Two main indicators, the level of happiness and overall life satisfaction, are compared across eight international and Russian research projects, 1998–2018 (European Social Survey, European Values Study, World Values Survey, RLMS-HSE, the “Eurobarometer in Russia” project, surveys by ...

Added: December 28, 2019

Evaluating Public Organizations using Open data: An Assessment Tool and Ecosystems Approach

Styrin E. M., Dmitrieva N., International Journal of Electronic Government Research 2017 Vol. 13 No. 4 P. 1–14

Information openness and stakeholders’ involvement through ICT become the driving factors of public organization change. In this paper an “ecosystem” approach is embraced to study social sphere organizations (SSO), such as hospitals, schools, and libraries. SSO report on their activities by publishing information on the Web which can be used to evaluate the effectiveness and ...

Added: January 23, 2018

An ontological approach to information integration in data-intensive domains

Заякин В. С., Lyadova L. N., Рабчевский Е. А., Информационные технологии 2022 Т. 28 № 10 С. 529–538

The development and support of knowledge-based systems for experts in the field of social network analysis (SNA) is complicated because of the problems of viability maintenance that inevitably emerge in data intensive domains. Largely this is the case due to the properties of semi-structured objects and processes that are analyzed by data specialists using data ...

Added: October 22, 2022

Independent assessment of the quality of social services by their recipients

Dmitrieva N., Styrin E. M., Ястребова Е. В., Вопросы государственного и муниципального управления 2017 № 2 С. 27–56

One of the key goals of modern government’s social policy is to increase the quality of social services provision and, accordingly, the satisfaction level of citizens as social services consumers. The choice of methods and tools in order to solve this challenge is restricted to existing resources of government agencies: budget, personnel, information technologies, ...

Added: June 21, 2017

Language Interference in Heritage Russian: Constructional Violations

Rakhilina E. V., Vyrenkova A. S., / NRU HSE. Series WP BRP "Linguistics". 2014. No. 11.

The problem of incomplete language acquisition and heritage languages is approached from several perspectives: who are heritage speakers, how are they different from native speakers and L2 learners, is heritage language a particular system? This paper aims at answering these and other questions focusing on constructional deviations in the output of heritage speakers and linguistic ...

Added: October 23, 2014

Design Patterns for a Knowledge-Driven Analytical Platform

Zayakin V.S., Lyadova L.N., Rabchevskiy E. A., Proceedings of the Institute for System Programming of the RAS 2022 Vol. 34 No. 2 P. 43–56

Abstract. The development and support of knowledge-based systems for experts in the field of social network analysis (SNA) is complicated because of the problems of viability maintenance that inevitably emerge in data intensive domains. Largely this is the case due to the properties of semi-structured objects and processes that are analyzed by data specialists using ...

Added: July 23, 2022

Digital humanities projects: problems of interdisciplinarity

Северина Е. М., Bonch-Osmolovskaya A. A., Бец Ю. В. et al., Гуманитарные и социальные науки 2021 Т. 88 № 5 С. 121–129

The paper examines interdisciplinary «digital practices» in the humanities that use computer models and digital technologies as research tools and are implemented as digital projects. It describes the work of interdisciplinary teams carrying out digital projects in the context of the core principle of Digital Humanities, the principle of open research data (Open data), whose aim is not only to place information in ...

Added: December 23, 2021

Digital archive of the literary journal in pre-reform orthography «Otechestvennye Zapiski» (1839–1884)

Eugeniya Z., Klyshinskiy E., Voloshina E. et al., Компьютерная лингвистика и интеллектуальные технологии 2021 Т. дополнительный № 20 С. 1239–1244

The paper describes an initial version of the digital archive of the literary magazine in pre-reform orthography «Otechestvennye Zapiski». Today, the corpus contains 10 XML volumes of the literary magazine (~2 million words). The web application of the digital archive allows users to search for words and lemmas in the corpus and to edit the magazine’s texts ...

Added: June 6, 2022

Constructing a composite indicator for assessing the condition of a Russian commercial bank based on structured and unstructured data

Bogdanova T., Zhukova L., В кн.: Системное моделирование социально-экономических процессов: труды 43-ей международной научной школы-семинара. Воронеж: Истоки, 2020. Гл. 9 С. 481–488.

The paper describes an approach to constructing a comprehensive indicator for assessing the state of a bank, other than satisfactory, that includes both homogeneous structured data on the bank's financial condition and unstructured data from "open" data sources. To construct the components of a universal indicator, it is proposed to use the methods of ...

Added: February 16, 2021

What contributes to discourse coherence? Evidence from Russian speakers with and without aphasia

Linnik A., Bastiaanse R., Khudyakova Mariya, Stem-, Spraak- en Taalpathologie 2015 Vol. 20 P. 107–110

Added: September 23, 2015

Grammar of errors and construction grammar: «heritage» («inherited») Russian

Полинская М., Rakhilina E. V., Vyrenkova A. S., Вопросы языкознания 2014 № 3 С. 3–19

The article gives an overview of mistakes made by a particular type of speaker: children of emigrants from Russia who grew up in a foreign linguistic environment and inherited their Russian from their parents. The English-language tradition refers to this variety of Russian as heritage Russian. The study is based on the data from the ...

Added: February 24, 2014