Comparative Study Of Data Clustering Algorithms And Analysis Of The Keywords Extraction Efficiency: Learner Corpus Case

NRU HSE, 2020.

Scherbakova A.

Language: English

Publication based on the results of:

Automated Detection of Writing Inaccuracies for Students of English in Russia (2019)

Design of test-making tools for the learner corpus

Vinogradova Olga, Gerasimenko Ekaterina, in: Corpus Linguistics 2017 Abstracts. [s.n.], 2017. P. 406–410.

The current paper presents RETM (REALEC English Test Maker), a system that automatically generates tests for students on the basis of the errors that experts have marked in student works submitted to REALEC. Using scripts written in Python, RETM extracts the necessary testing questions from ...

Added: June 3, 2017

Automatic dependency parsing of a learner English corpus REALEC

Lyashevskaya O., Panteleeva I. M., NRU HSE. Series WP BRP "Linguistics". 2017.

The paper presents a Universal Dependencies (UD) annotation scheme for a learner English corpus. The REALEC dataset consists of essays written in English by Russian-speaking university students in a general English course. The essays are part of the students' preparation for an independent final examination similar to international English exams. While adjusting existing ...

Added: December 15, 2017

MULTI-LEVEL STUDENT ESSAY FEEDBACK IN A LEARNER CORPUS

Vinogradova O. I., Lyashevskaya O., Irina Panteleeva, in: Computational Linguistics and Intellectual Technologies. International Conference "Dialogue 2017" Proceedings. Vol. 1. Issue 16 (23). Moscow, 2017. P. 373–386.

The paper presents the results of using several computer tools and applications for automated and semi-automated syntactic, lexical, and error analysis of student essays in a learner corpus. The texts in the corpus were written in English by Russian learners of English. The experiment consisted of comparing the ...

Added: May 30, 2017

Three embeddings of the Klein simple group into the Cremona group of rank three

Cheltsov I., Shramov K., Transformation Groups 2012 Vol. 17 No. 2 P. 303–350

We study the action of the Klein simple group PSL₂(F₇), consisting of 168 elements, on two rational threefolds: the three-dimensional projective space and a smooth Fano threefold X of anticanonical degree 22 and index 1. We show that the Cremona group of rank three has at least three non-conjugate subgroups isomorphic to PSL₂(F₇). As a ...

Added: August 30, 2012

Automatic Detection and Correction of Derivational Errors in Written Texts of Learners of Russian as a Foreign Language

Vyrenkova A. S., Smirnov I. Yu., Vestnik of Novosibirsk State University. Series: Linguistics and Intercultural Communication 2021 Vol. 19 No. 3 P. 57–68

Learner corpora serve as one of the most valuable sources of statistical data on learners' errors. For instance, data from foreign-language learner corpora can be used in Second Language Acquisition research. However, the representativeness of a corpus strongly depends on the quality of its error markup, which is most frequently carried out manually and thus presents a ...

Added: September 24, 2021

Some Features of Sentiment Analysis for Russian Language Posts and Comments from Social Networks

Sidorov Nikita, Slastnikov Sergey, Journal of Physics: Conference Series 2021 Vol. 1740 P. 1–6

Sentiment analysis of texts in different languages is a very popular machine learning task. The complexity of its solution depends both on the characteristics of a particular language and on the length of the evaluated texts. In our work, we consider the task of creating a sentiment analysis software tool for Russian posts and ...

Added: February 2, 2021

USE OF LEARNER CORPUS IN GENERAL ENGLISH AND ACADEMIC ENGLISH COURSES AT THE HIGHER SCHOOL OF ECONOMICS

Vinogradova O. I., in: Conference Proceedings. The Future of Education International Conference, 6th edition. Padova: libreriauniversitaria, 2016. P. 310–314.

There have been many reports on advances in the development of learner corpora that have made it possible to use these collections of texts effectively for the benefit of the learning process. This paper lists all possible applications of REALEC, a mid-size learner corpus, in English courses taught to Bachelor's students. REALEC comprises student written ...

Added: March 1, 2017

Keyphrase extraction from the Russian corpus on linguistics by means of KEA and RAKE algorithms

Moskvina Anna, Sokolova E., Mitrofanova O., in: Data Analytics and Management in Data Intensive Domains. Proceedings of the XX International Conference DAMDID/RCDL’2018, October 9-12, 2018, Moscow. Moscow: FRC CSC RAS, 2018. P. 369–372.

This paper is devoted to a comparison of two state-of-the-art keyphrase extraction algorithms, namely KEA, based on machine learning, and RAKE, working with morphosyntactic patterns. The comparative study deals with the peculiarities of KEA and RAKE with regard to particular research tasks. Experiments carried out on the Russian corpus on Linguistics make it possible to work out the best options ...

Added: September 29, 2020

Classification of knotted tori in 2-metastable dimension

Cencelj M., Repovs D., Mikhail Skopenkov, Sbornik: Mathematics 2012 Vol. 203 No. 11 P. 1654–1681

This paper is on the classical Knotting Problem: for a given manifold N and a number m, describe the set of isotopy classes of embeddings N → S^m. We study the specific case of knotted tori, i.e. the embeddings S^p × S^q → S^m. The classification of knotted tori up to isotopy in the metastable dimension ...

Added: September 26, 2014

Opinion Mining for Modeling User Experience of Online Education: Sentiment Analysis and Keywords Extraction of Student Reviews

Moskvina A., Kirina M., Anastasia Gavrilyuk, in: 2022 32nd Conference of Open Innovations Association (FRUCT). IEEE, 2022. P. 187–195.

The paper discusses the possibilities of applying modern natural language processing technologies for opinion mining to investigate and improve the user experience of students in online courses. We analyzed 27,000 student reviews of projects within a Python programming course. First, we applied keyword extraction algorithms as a means of semantic compression to obtain a generalized ...

Added: December 9, 2022

When is the set of embeddings finite up to isotopy?

Skopenkov M., International Journal of Mathematics 2015 Vol. 26 No. 7, Article number 1550051 P. 1–28

Given a manifold N and a number m, we study the following question: is the set of isotopy classes of embeddings N → S^m finite? In the case when the manifold N is a sphere, the answer was given by A. Haefliger in 1966. In the case when the manifold N is a disjoint union of spheres the ...

Added: September 8, 2015

What’s in a comma: Corpus study of punctuation errors and L1 interference

Pospelova K., Viklova A., Vinogradova O. I., in: Learner Corpus Conference. LCR 2019. Book of Abstracts. [s.n.], 2019. P. 0–20.

TBC ...

Added: November 10, 2019

Pseudo-synonymous Russian constructions у X-а Y and у X-а есть Y in the context of learning Russian

Apresyan V., in: XVII April International Academic Conference on Economic and Social Development: in 4 books. Book 4. Moscow: HSE Publishing House, 2017. P. 369–379.

The use of possessive constructions with a zero predicate and with the word form есть is governed by a number of semantic, pragmatic, and communicative rules. The construction у X-а есть Y is marked semantically, communicatively, and in terms of collocation; it implies: a contrast between presence and absence (У меня есть друзья 'I have friends'), a contrast between the presence of objects of one type and objects of another type (У него есть хорошие студенты 'He has some good students'), rhematization of the verb (У меня ЕСТЬ мнение 'I DO have an opinion'); related ...

Added: November 30, 2016

Chapter 8 Building Resilience into the Metadata-Based ETL Process Using Open Source Big Data Technologies

Panfilov P., Suleykin A., in: Resilience in the Digital Age. Vol. 12660: Lecture Notes in Computer Science. Springer, 2021. Ch. 8 P. 139–153.

Extract-transform-load (ETL) processes play a crucial role in data analysis in real-time data warehouse environments, which demand low latency and high availability. In essence, ETL processes are becoming bottlenecks in such environments due to growing complexity, the number of steps in data transformations, the number of machines used for data processing and, finally, the increasing impact of ...

Added: February 5, 2021

Clausal complexity of expert and student writing: a corpus-based analysis of papers in social sciences

Smirnova E. A., Language Learning in Higher Education 2022 Vol. 12 No. 2 P. 453–475

Syntactic complexity has been extensively studied in the fields of corpus linguistics and academic discourse studies. However, works focusing on disciplinary variation in linguistic complexity and on the comparison of professional and novice academic writing are scarce. Addressing these issues is likely to have important implications for EAP/ESP practitioners in terms of selection of target ...

Added: December 7, 2022

Corpus of Russian student texts: design and prospects

Zevakhina N., Dzhakupova S., in: Proceedings of the 21st International Conference on Computational Linguistics "Dialogue". Moscow: RSUH Publishing House, 2015.

The Corpus of Russian Student Texts (CoRST) is a computational and research project started in 2013 at the Linguistic Laboratory for Corpora Research Technologies at HSE. It comprises a collection of Russian texts written by students from various Russian universities. Its main research goal is to examine language deviations viewed as markers of language change. ...

Added: May 20, 2015

Using a learner corpus in teaching the topic "Confusables"

Klimova M., Overnikova D., Smilga V. K., in: The Space of Research Interests: Foreign Languages and Intercultural Communication. Current Vectors of Development and Prospects: Collection of Papers Based on the Results of the VI Interuniversity Online Conference of Young Scientists, April 22, 2021. Moscow: [s.n.], 2021.

The article is devoted to teaching error-prone lexical items with the help of the learner corpus REALEC (Russian Error-Annotated Learner English Corpus). The word groups under consideration included near-synonymous nouns denoting quantity (amount, number, quantity), near-synonymous nouns related to possibility (possibility, opportunity, ability, potential), and the paronym pair note and notice. Due to being the ...

Added: October 31, 2021

Good covers are algorithmically unrecognizable

Dmitry Tonkonog, Tancer M., / Series math "arxiv.org". 2012.

A good cover in R^d is a collection of open contractible sets in R^d such that the intersection of any subcollection is either contractible or empty. Motivated by an analogy with convex sets, intersection patterns of good covers were studied intensively. Our main result is that intersection patterns of good covers are algorithmically unrecognizable. More precisely, ...

Added: February 20, 2013

Architecting open education: the integrated metadata warehouse

Zykov S. V., Isheyemi O., in: Proceedings of the CEE-SECR’17. 13th Central & Eastern European Software Engineering Conference Russia. Association for Computing Machinery (ACM), 2017. Ch. 5 P. 1–8.

This paper proposes an integrated approach to data warehousing of educational metadata in the area of open educational resources (OER). The aim is to design an architecture that integrates automatic metadata extraction and rule-based methods to make better use of OER. This architecture helps synchronize metadata from the resulting repository with diverse web-based OER. Therewith, ...

Added: March 21, 2018

A converse of the theorem on “economic” maps

Bogataya S., Bogatyi S., Kudryavtseva E. A., Matematicheskii Sbornik 2012 Vol. 203 No. 4 P. 103–118

We prove that the bound from the theorem on ‘economic’ maps is best possible. Namely, for m > n + d we construct a map from an n-dimensional simplex to an m-dimensional Euclidean space for which (and for any close map) there exists a d-dimensional plane whose preimage has cardinality not less than the upper ...

Added: October 30, 2012

Russian possessive constructions with a zero and an overt verb: rules and errors

Apresyan V., Russkii yazyk v nauchnom osveshchenii 2017 No. 33 P. 86–116

The article is devoted to the pseudo-synonymous possessive constructions у Х-а есть Y and у Х-а Y. It examines the rules governing their use and their comparative difficulty of acquisition for foreign students. The study was conducted on the RULEC error corpus. The following results were obtained: advanced students made few errors in the use of possessive constructions overall (no more than 10 percent), from which ...

Added: November 30, 2016

USE OF LEARNER CORPUS IN GENERAL ENGLISH AND ACADEMIC ENGLISH COURSES AT THE HIGHER SCHOOL OF ECONOMICS

Vinogradova O. I., in: The Future of Education, edition 6. libreriauniversitaria, 2016. P. 310–314.

Added: May 18, 2016

Widening the scope of learner corpus research

John Benjamins Publishing Company, 2020.

The first volume will focus on the theme of learner corpora in research on the CAF (complexity, accuracy and fluency) triad, as it seems to have been a recurrent theme in many conference presentations. ...

Added: October 28, 2019

Referential coherence of academic texts: a corpus-based analysis of L2 research papers in management

Elizaveta Smirnova, Language Teaching Research 2019

This paper focuses on referential coherence which is seen as a crucial attribute of effective academic writing. I report findings from a corpus study of Russian students' use of anaphoric expressions in their research proposals which is compared to a reference corpus comprising research articles published in peer-reviewed journals. I hypothesise that learners use anaphora ...

Added: December 5, 2017