Comparative Study Of Data Clustering Algorithms And Analysis Of The Keywords Extraction Efficiency: Learner Corpus Case
We prove that the bound from the theorem on ‘economic’ maps is best possible. Namely, for m > n + d we construct a map from an n-dimensional simplex to an m-dimensional Euclidean space for which (and for any close map) there exists a d-dimensional plane whose preimage has cardinality not less than the upper bound ⌈(dn + n + 1)/(m − n − d)⌉ + d from the theorem on ‘economic’ maps. Bibliography: 16 titles.
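As a purely arithmetic illustration of the bound quoted in this abstract, the following minimal sketch evaluates ⌈(dn + n + 1)/(m − n − d)⌉ + d for given integers. The function name `economic_map_bound` is hypothetical and not taken from the paper; the code only evaluates the formula and says nothing about the construction of the map.

```python
def economic_map_bound(n: int, d: int, m: int) -> int:
    """Evaluate ceil((d*n + n + 1) / (m - n - d)) + d.

    n: dimension of the simplex
    d: dimension of the plane
    m: dimension of the Euclidean space (the abstract assumes m > n + d)
    """
    if m <= n + d:
        raise ValueError("the bound is stated only for m > n + d")
    numerator = d * n + n + 1
    denominator = m - n - d
    # Integer ceiling division: -(-a // b) equals ceil(a / b) for b > 0,
    # which avoids floating-point rounding.
    return -(-numerator // denominator) + d


print(economic_map_bound(n=2, d=1, m=4))
```

For n = 2, d = 1, m = 4 the numerator is 5 and the denominator is 1, so the printed value is 6.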
There have been many reports on advances in the development of learner corpora that have made it possible to use these collections of texts effectively for the benefit of the learning process. This paper lists the possible applications, in English courses taught to Bachelor students, of the middle-size learner corpus REALEC, which comprises students’ written works supplied with expert annotation of mistakes, browsing and search options, and an optional automated tagging system. Annotation in the corpus is provided either by experts (mostly EFL instructors) or by learners themselves under the supervision of their EFL instructors. First, the paper argues that when EFL methodology requires students to apply the error classification while annotating their peers’ essays, and gradually their own essays as well, their understanding of subtle areas of grammar, vocabulary and discourse improves and, correspondingly, the number of errors in their written works decreases. The second argument concerns a tool for developing placement and progress tests that makes use of sentences with mistakes made by other learners who contributed to the corpus. In the suggested test design, sentences are automatically extracted from the same corpus, manually divided into three levels according to the complexity of the change required to correct the mistake, and then administered to learners as an automated measurement of their proficiency in English. The submitted test is scored automatically within minutes. The third possibility considered in the research is to supplement the corpus with a platform of training exercises set up automatically or semi-automatically on the basis of frequently marked errors made by a particular group of students. In conclusion, we point out the ease and usefulness of the proposed applications for both EFL instructors and English learners.
Various issues relating to learner corpus research and its use in teaching are presented. These include the issue of a norm in corpora: whether the norm should necessarily be native and what problems a native norm may present. Learners who behave differently from native speakers do not necessarily use language incorrectly. As an alternative to a unique, native norm, a range of norms is available. Some of these norms may be problematic if they are not selected carefully (depending on the learner corpus, the purpose of the comparison, etc.) and handled cautiously. Different choices of norms may produce different results and thus lead to different conclusions with respect to learners’ usage. The pedagogical implications of such choices are examined, with particular emphasis on whether all differences between the learner corpus and the reference corpus should be targeted for teaching intervention. Problems in evaluating agreement between approaches to annotation are considered as well.
This article deals with the problem of translation. It covers the history of translation in linguistics and analyzes the peculiarities and role of translation in logic. Moreover, the article contains typical examples of embedding operations in terms of different logical theories.
The Corpus of Russian Student Texts (CoRST) is a computational and research project started in 2013 at the Linguistic Laboratory for Corpora Research Technologies at HSE. It comprises a collection of Russian texts written by students from various Russian universities. Its main research goal is to examine language deviations viewed as markers of language change. CoRST is supplied with metalinguistic, morphological and error annotation, which makes it possible to customize subcorpora and to search by various error types. Its error annotation is based on a modular classification (lexis, grammar and discourse), within which the most frequent error phenomena are further distinguished. In total, the error classification encompasses 39 error tags (20 higher-level and 19 lower-level). The crucial characteristic of CoRST is that the error annotation is multi-layered: since an erroneous fragment can typically be corrected in several ways, it is annotated with several error tags accordingly. Moreover, the corpus provides search by two possible explanatory factors, typo and construction blending. The prospects for CoRST development have both computational and research aspects, including qualitative and statistical comparative analysis of language phenomena in CoRST and NRC.
This paper is on the classical Knotting Problem: for a given manifold N and a number m, describe the set of isotopy classes of embeddings N -> S^m. We study the specific case of knotted tori, i.e. the embeddings S^p x S^q -> S^m. The classification of knotted tori up to isotopy in the metastable dimension range m > p+3q/2+3/2, p < q+1, was given by A. Haefliger, E. Zeeman and A. Skopenkov. We consider the dimensions below the metastable range and give an explicit criterion for the finiteness of this set of isotopy classes in the 2-metastable dimension:
Theorem. Assume that p+4q/3+2<m<p+3q/2+2 and m>2p+q+2. Then the set of smooth embeddings S^p x S^q -> S^m up to isotopy is infinite if and only if either q+1 or p+q+1 is divisible by 4.
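The criterion in the Theorem is purely arithmetic, so it can be checked mechanically for concrete dimensions. The sketch below is only an illustration of the statement as quoted here; the function names `in_theorem_range` and `isotopy_set_is_infinite` are hypothetical and not from the paper.

```python
def in_theorem_range(p: int, q: int, m: int) -> bool:
    """Check p + 4q/3 + 2 < m < p + 3q/2 + 2 and m > 2p + q + 2."""
    # Both fractional inequalities are cleared of denominators
    # so that only exact integer comparisons are used.
    above_lower = 3 * p + 4 * q + 6 < 3 * m   # p + 4q/3 + 2 < m
    below_upper = 2 * m < 2 * p + 3 * q + 4   # m < p + 3q/2 + 2
    return above_lower and below_upper and m > 2 * p + q + 2


def isotopy_set_is_infinite(p: int, q: int, m: int) -> bool:
    """Apply the divisibility criterion when the hypotheses hold."""
    if not in_theorem_range(p, q, m):
        raise ValueError("(p, q, m) is outside the range covered by the Theorem")
    return (q + 1) % 4 == 0 or (p + q + 1) % 4 == 0


print(isotopy_set_is_infinite(1, 11, 18))  # q + 1 = 12 is divisible by 4
print(isotopy_set_is_infinite(1, 8, 14))   # neither 9 nor 10 is divisible by 4
```

Under the Theorem as stated, the first call reports an infinite set of isotopy classes and the second a finite one; triples outside the stated range raise an error rather than return a guess.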
Our approach to the classification is based on an analogue of the Koschorke exact sequence from the theory of link maps. This sequence involves a new beta-invariant of knotted tori. The exactness is proved using embedded surgery and the Habegger-Kaiser techniques of studying the complement.