Four electronic corpora created in 2011 within the framework of the “Corpus Linguistics: the Albanian, Kalmyk, Lezgian, and Ossetic Languages” Program of Fundamental Research of the RAS are presented. The interface and functionalities of these corpora are described, engineering problems to be solved in their creation are elucidated, and the promises of their development are discussed. A particular emphasis is made on the compilation of dictionaries and automatic grammatical markup of the corpora.
This paper is an overview of the current issues and tendencies in Computational linguistics. The overview is based on the materials of the conference on computational linguistics COLING’2012. The modern approaches to the traditional NLP domains such as pos-tagging, syntactic parsing, machine translation are discussed. The highlights of automated information extraction, such as fact extraction, opinion mining are also in focus. The main tendency of modern technologies in Computational linguistics is to accumulate the higher level of linguistic analysis (discourse analysis, cognitive modeling) in the models and to combine machine learning technologies with the algorithmic methods on the basis of deep expert linguistic knowledge.
The paper discusses sociolinguistic implementations of statistical analysis of the spoken subcorpus of the Russian National Corpus. Given the considerable size of the corpus (about 10 mln tokens), an analysis of co-variation of various linguistic parameters with one of the few sociolinguistic parameters available – the speaker’s gender – may give rich and interesting results. One specific example of co-variation is considered in detail: the mean length of the utterance (in tokens). Comparing this parameter in public communication shows statistically significant difference between the speech of men and women (men talk more), while the same difference is absent in private communication. Another important parameter is the gender of the addressee. Again, co-variation is quite different in public and private discourse. In private communication, the utterances are longer when addressing someone of the same sex, the difference between men and women is not statistically significant. In public communication, the utterances are longer when addressing a woman, whether the speaker herself is a man or woman. These conclusions are consistent with the results of sociolinguistic gender studies obtained elsewhere and by other methods. Linguistic difference between men and women are not absolute but depend on the communicative situation (public vs. private). Public discourse is a playground for linguistic competition in which men are the winning party. In private discourse, competition dissolves.
We consider certain spaces of functions on the circle, which naturally appear in harmonic analysis, and superposition operators on these spaces. We study the following question: which functions have the property that each their superposition with a homeomorphism of the circle belongs to a given space? We also study the multidimensional case.
We consider the spaces of functions on the m-dimensional torus, whose Fourier transform is p -summable. We obtain estimates for the norms of the exponential functions deformed by a C1 -smooth phase. The results generalize to the multidimensional case the one-dimensional results obtained by the author earlier in “Quantitative estimates in the Beurling—Helson theorem”, Sbornik: Mathematics, 201:12 (2010), 1811 – 1836.
We consider the spaces of function on the circle whose Fourier transform is p-summable. We obtain estimates for the norms of exponential functions deformed by a C1 -smooth phase.