Lecture Notes in Computer Science
This book constitutes the refereed proceedings of the 20th International Symposium on String Processing and Information Retrieval, SPIRE 2013, held in Jerusalem, Israel, in October 2013. The 18 full papers, 10 short papers were carefully reviewed and selected from 60 submissions. The program also featured 4 keynote speeches. The following topics are covered: fundamentals algorithms in string processing and information retrieval; SP and IR techniques as applied to areas such as computational biology, DNA sequencing, and Web mining.
We revisit two variants of the problem of computing minimal discriminating words studied in . Given a pattern P and a threshold d, we want to report (i) all shortest extensions of P which occur in less than d documents, and (ii) all shortest extensions of P which occur only in d selected documents. For the rst problem, we give an optimal solution with constant time per output word. For the second problem, we propose an algorithm with running time O(jPj + d (1 + output)) improving the solution of .
This paper discusses approaches to the selection of keywords, used for information extraction of event frames. In particular, the innovation event is associated with different lexical items in different areas of knowledge. The paper evaluated the contribution of general and specific vocabulary in the representation of the frame in a particular subject area.
The algorithm to adapt lexical complexity in the news article which can be used as materials for learning language presented in the paper. We consider words substitution retrieval according to wordnet-based and corpus-based semantic relatedness. Two corpus-based similarity measures empirically tested: Vector Space Model and Distributional Semantic Model. This language processing algorithm has created as a client-server application. It retrieves appropriate text from Web-resource. Next it performs adaptation procedure
The 4th International Conference on Educational Data Mining (EDM 2011) brings together researchers from computer science, education, psychology, psychometrics, and statistics to analyze large datasets to answer educational research questions. The conference, held in Eindhoven, The Netherlands, July 6-9, 2011, follows the three previous editions (Pittsburgh 2010, Cordoba 2009 and Montreal 2008), and a series of workshops within the AAAI, AIED, EC-TEL, ICALT, ITS, and UM conferences. The increase of e-learning resources such as interactive learning environments, learning management systems, intelligent tutoring systems, and hypermedia systems, as well as the establishment of state databases of student test scores, has created large repositories of data that can be explored to understand how students learn. The EDM conference focuses on data mining techniques for using these data to address important educational questions.
Formal Concept Analysis (FCA) is a mathematically well-founded theory aimed at data analysis and classication, introduced and detailed in the book of Bernhard Ganter and Rudolf Wille, \Formal Concept Analysis", Springer 1999. The area came into being in the early 1980s and has since then spawned over 10000 scientic publications and a variety of practically deployed tools. FCA allows one to build from a data table with objects in rows and attributes in columns a taxonomic data structure called concept lattice, which can be used for many purposes, especially for Knowledge Discovery and Information Retrieval. The \Formal Concept Analysis Meets Information Retrieval" (FCAIR) workshop collocated with the 35th European Conference on Information Retrieval (ECIR 2013) was intended, on the one hand, to attract researchers from FCA community to a broad discussion of FCA-based research on information retrieval, and, on the other hand, to promote ideas, models, and methods of FCA in the community of Information Retrieval. This volume contains 11 contributions to FCAIR workshop (including 3 abstracts for invited talks and tutorial) held in Moscow, on March 24, 2013. All submissions were assessed by at least two reviewers from the program committee of the workshop to which we express our gratitude. We would also like to thank the co-organizers and sponsors of the FCAIR workshop: Russian Foundation for Basic Research, National Research University Higher School of Economics, and Yandex.
This paper describes enhanced algorisms of lexical optimization query. These algorisms detect and remove redundant conditions from query restriction to simplify it. The paper also presents results of implementation of these optimization techniques and those effects on query processing speed. The paper includes four sections. The first section (Introduction) provides general context of the paper. The second section describes three proposed algorithms of lexical query optimization. The first one is the algorithm of absorption. This algorithm allows to find and remove a wide set of conditions that are redundant but are not equal textually even after standardization of whole restriction expression. The second algorithm is an adaptation of well-known Quin-McCluskey algorithm initially designed for minimization of Boolean functions. The last algorithm of lexical query optimization is based on techniques for optimization of systems of linear inequalities. The third section of the paper discusses results of efficiency evaluation of the proposed algorithms. The forth section concludes the paper.
Name matching is a key component of systems for entity resolution or record linkage. Alternative spellings of the same names are a com- mon occurrence in many applications. We use the largest collection of genealogy person records in the world together with user search query logs to build name matching models. The procedure for building a crowd-sourced training set is outlined together with the presentation of our method. We cast the problem of learning alternative spellings as a machine translation problem at the character level. We use in- formation retrieval evaluation methodology to show that this method substantially outperforms on our data a number of standard well known phonetic and string similarity methods in terms of precision and re- call. Additionally, we rigorously compare the performance of standard methods when compared with each other. Our result can lead to a significant practical impact in entity resolution applications.
A vast amount of documents in the Web have duplicates, which is a challenge for developing efficient methods that would compute clusters of similar documents. In this paper we use an approach based on computing (closed) sets of attributes having large support (large extent) as clusters of similar documents. The method is tested in a series of computer experiments on large public collections of web documents and compared to other established methods and software, such as biclustering, on same datasets. Practical efficiency of different algorithms for computing frequent closed sets of attributes is compared.
Formal Concept Analysis (FCA) is an unsupervised clustering technique and many scientific papers are devoted to applying FCA in Information Retrieval (IR) research. We collected 103 papers published between 2003-2009 which mention FCA and information retrieval in the abstract, title or keywords. Using a prototype of our FCA-based toolset CORDIET, we converted the pdf-files containing the papers to plain text, indexed them with Lucene using a thesaurus containing terms related to FCA research and then created the concept lattice shown in this paper. We visualized, analyzed and explored the literature with concept lattices and discovered multiple interesting research streams in IR of which we give an extensive overview. The core contributions of this paper are the innovative application of FCA to the text mining of scientific papers and the survey of the FCA-based IR research.
Doctoral students were invited to the Doctoral Consortium held in conjunction with the main conference of ECIR 2013. The Doctoral Consortium aimed to provide a constructive setting for presentations and discussions of doctoral students’ research projects with senior researchers and other participating students. The two main goals of the Doctoral Consortium were: 1) to advise students regarding current critical issues in their research; and 2) to make students aware of the strengths and weakness of their research as viewed from different perspectives. The Doctoral Consortium was aimed for students in the middle of their thesis projects; at minimum, students ought to have formulated their research problem, theoretical framework and suggested methods, and at maximum, students ought to have just initiated data analysis. The Doctoral Consortium took place on Sunday, March 24, 2013, at the ECIR 2013 venue, and participation is by invitation only. The format was designed as follows: The doctoral students presents summaries of their work to other participating doctoral students and the senior researchers. Each presentation was followed by a plenary discussion, and individual discussion with one senior advising researcher. The discussions in the group and with the advisors were intended to help the doctoral student to reflect on and carry on with their thesis work.
We consider certain spaces of functions on the circle, which naturally appear in harmonic analysis, and superposition operators on these spaces. We study the following question: which functions have the property that each their superposition with a homeomorphism of the circle belongs to a given space? We also study the multidimensional case.
We consider the spaces of functions on the m-dimensional torus, whose Fourier transform is p -summable. We obtain estimates for the norms of the exponential functions deformed by a C1 -smooth phase. The results generalize to the multidimensional case the one-dimensional results obtained by the author earlier in “Quantitative estimates in the Beurling—Helson theorem”, Sbornik: Mathematics, 201:12 (2010), 1811 – 1836.
We consider the spaces of function on the circle whose Fourier transform is p-summable. We obtain estimates for the norms of exponential functions deformed by a C1 -smooth phase.
Let k be a field of characteristic zero, let G be a connected reductive algebraic group over k and let g be its Lie algebra. Let k(G), respectively, k(g), be the field of k- rational functions on G, respectively, g. The conjugation action of G on itself induces the adjoint action of G on g. We investigate the question whether or not the field extensions k(G)/k(G)^G and k(g)/k(g)^G are purely transcendental. We show that the answer is the same for k(G)/k(G)^G and k(g)/k(g)^G, and reduce the problem to the case where G is simple. For simple groups we show that the answer is positive if G is split of type A_n or C_n, and negative for groups of other types, except possibly G_2. A key ingredient in the proof of the negative result is a recent formula for the unramified Brauer group of a homogeneous space with connected stabilizers. As a byproduct of our investigation we give an affirmative answer to a question of Grothendieck about the existence of a rational section of the categorical quotient morphism for the conjugating action of G on itself.
Let G be a connected semisimple algebraic group over an algebraically closed field k. In 1965 Steinberg proved that if G is simply connected, then in G there exists a closed irreducible cross-section of the set of closures of regular conjugacy classes. We prove that in arbitrary G such a cross-section exists if and only if the universal covering isogeny Ĝ → G is bijective; this answers Grothendieck's question cited in the epigraph. In particular, for char k = 0, the converse to Steinberg's theorem holds. The existence of a cross-section in G implies, at least for char k = 0, that the algebra k[G]G of class functions on G is generated by rk G elements. We describe, for arbitrary G, a minimal generating set of k[G]G and that of the representation ring of G and answer two Grothendieck's questions on constructing generating sets of k[G]G. We prove the existence of a rational (i.e., local) section of the quotient morphism for arbitrary G and the existence of a rational cross-section in G (for char k = 0, this has been proved earlier); this answers the other question cited in the epigraph. We also prove that the existence of a rational section is equivalent to the existence of a rational W-equivariant map T- - - >G/T where T is a maximal torus of G and W the Weyl group.