Annotated Suffix Tree Method for German Compound Splitting

P. 42–47.
Shishkova A., Artemova E.

The paper presents an unsupervised and knowledge-free approach to compound splitting. Although the research is focused on German compounds, the method is expected to be extensible to other compounding languages. The approach is based on the annotated suffix tree (AST) method proposed and modified by Mirkin et al. To the best of our knowledge, annotated suffix trees have not yet been used for compound splitting. The main idea of the approach is to match all the substrings of a word (suffixes and prefixes separately) against an AST, determining the longest and sufficiently frequent substring to perform a candidate split. A simplification considers only the suffixes (or prefixes) and splits a word at the beginning of the selected suffix (the longest and sufficiently frequent one). The results are evaluated by precision and recall.
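The suffix-only simplification described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it uses an uncompressed suffix trie with frequency annotations in place of a full AST, and the function names, the toy word list, and the `min_freq` and `min_len` thresholds are all assumptions made for illustration.

```python
class Node:
    """Trie node annotated with the frequency of the fragment that ends here."""
    __slots__ = ("children", "freq")

    def __init__(self):
        self.children = {}
        self.freq = 0


def build_ast(words):
    """Insert every suffix of every word; each pass through a node adds 1 to its frequency."""
    root = Node()
    for word in words:
        for start in range(len(word)):
            node = root
            for ch in word[start:]:
                node = node.children.setdefault(ch, Node())
                node.freq += 1
    return root


def fragment_freq(root, fragment):
    """Return how often the fragment occurs in the collection (0 if absent)."""
    node = root
    for ch in fragment:
        node = node.children.get(ch)
        if node is None:
            return 0
    return node.freq


def split_by_suffix(root, word, min_freq=2, min_len=3):
    """Split the word at the start of its longest sufficiently frequent proper suffix.

    min_freq and min_len are hypothetical thresholds, not values from the paper.
    """
    # A smaller start index means a longer suffix, so the first hit is the longest one.
    for start in range(min_len, len(word) - min_len + 1):
        suffix = word[start:]
        if fragment_freq(root, suffix) >= min_freq:
            return [word[:start], suffix]
    return [word]  # no candidate split found


# Toy collection (assumed, lowercase for simplicity).
words = ["haus", "boot", "hausboot", "garten", "gartenhaus", "bootsfahrt"]
root = build_ast(words)
print(split_by_suffix(root, "hausboot"))    # ['haus', 'boot']
print(split_by_suffix(root, "gartenhaus"))  # ['garten', 'haus']
```

In this sketch the choice of thresholds directly trades precision against recall: a low `min_freq` or `min_len` produces more candidate splits, including spurious ones on short frequent fragments.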

Language: English
Keywords: annotated suffix tree, compound splitting
Publication based on the results of:
Explanation-oriented Methods of Data Analysis for Semantically Rich Data and Their Applications (2017)

In book

CLLS 2016. Computational Linguistics and Language Science. Proceedings of the Workshop on Computational Linguistics and Language Science. Moscow, Russia, April 26, 2016
Vol. 1886. Aachen: CEUR Workshop Proceedings, 2017.
Similar publications
A Hybrid Approach to the Analysis of a Collection of Research Papers
Mirkin B., Frolov D., Vlasov A. et al., in: Intelligent Data Engineering and Automated Learning – IDEAL 2020. 21st International Conference, Guimaraes, Portugal, November 4–6, 2020, Proceedings, Part II. Vol. 12490: Lecture Notes in Computer Science. Cham: Springer, 2020. P. 423–433.
We define and find a most specific generalization of a fuzzy set of topics assigned to leaves of the rooted tree of a taxonomy. This generalization lifts the set to a “head subject” in the higher ranks of the taxonomy, that is supposed to “tightly” cover the query set, possibly bringing in some errors, both ...
Added: November 13, 2020
A Hybrid Approach to Interpretable Analysis of Research Paper Collections
Mirkin B., Frolov D., Vlasov A. et al., in: WIMS 2020: Proceedings of the 10th International Conference on Web Intelligence, Mining and Semantics. Association for Computing Machinery (ACM), 2020. P. 184–189.
We define and find a most specific generalization of a fuzzy set of topics assigned to leaves of the rooted tree of a taxonomy. This generalization lifts the set to a “head subject” in the higher ranks of the taxonomy, that is supposed to “tightly” cover the query set, possibly bringing in some errors, both ...
Added: August 28, 2020
Computational Generalization in Taxonomies Applied to: (1) Analyze Tendencies of Research and (2) Extend User Audiences
Frolov D., Mirkin B., Nascimento S. et al., in: Intelligent Data Engineering and Automated Learning – IDEAL 2019. Vol. 2. Springer, 2019. P. 3–11.
We define a most specific generalization of a fuzzy set of topics assigned to leaves of the rooted tree of a domain taxonomy. This generalization lifts the set to its “head subject” node in the higher ranks of the taxonomy tree. The head subject is supposed to “tightly” cover the query set, possibly bringing in some errors referred to ...
Added: December 7, 2019
Intelligent Data Engineering and Automated Learning – IDEAL 2019
Springer, 2019.
We define a most specific generalization of a fuzzy set of topics assigned to leaves of the rooted tree of a domain taxonomy. This generalization lifts the set to its “head subject” node in the higher ranks of the taxonomy tree. The head subject is supposed to “tightly” cover the query set, possibly bringing in some errors referred to ...
Added: December 7, 2019
Using Taxonomy Tree to Generalize a Fuzzy Thematic Cluster
Frolov D., Mirkin B., Nascimento S. et al., in: Fuzzy Systems (FUZZ-IEEE), IEEE International Conference Proceedings. IEEE, 2019. P. 1–6.
This paper presents an algorithm, ParGenFS, for generalizing, or “lifting”, a fuzzy set of topics to higher ranks of a hierarchical taxonomy of a research domain. The algorithm ParGenFS finds a globally optimal generalization of the topic set to minimize a penalty function, by balancing the number of introduced “head subjects” and related errors, the ...
Added: October 30, 2019
Parsimonious Generalization of Fuzzy Thematic Sets in Taxonomies Applied to the Analysis of Tendencies of Research in Data Science
Frolov D., Nascimento S., Fenner T. et al., Information Sciences 2020 Vol. 512 P. 595–615
This paper proposes a novel method, referred to as ParGenFS, for finding a most specific generalization of a query set represented by a fuzzy set of topics assigned to leaves of the rooted tree of a taxonomy. The query set is generalized by “lifting” it to one or more “head subjects” in the higher ranks ...
Added: October 9, 2019
Globally Optimal Parsimoniously Lifting a Fuzzy Query Set Over a Taxonomy Tree
Frolov D., Mirkin B., Nascimento S. et al., in: Optimization of Complex Systems: Theory, Models, Algorithms and Applications. Switzerland: Springer Publishing Company, 2020. P. 779–789.
This paper presents a relatively rare case of an optimization problem in data analysis to admit a globally optimal solution by a recursive algorithm. We are concerned with finding a most specific generalization of a fuzzy set of topics assigned to leaves of domain taxonomy represented by a rooted tree. The idea is to “lift” ...
Added: June 25, 2019
CONTENT 2019, The Eleventh International Conference on Creative Content Technologies
International Academy, Research, and Industry Association (IARIA), 2019.
Added: June 4, 2019
Method for Generalization of Fuzzy Sets
Frolov D., Mirkin B., Nascimento S. et al., in: International Conference on Artificial Intelligence and Soft Computing. 18th International Conference, ICAISC 2019, Zakopane, Poland, June 16–20, 2019, Proceedings. 1. Issue 11508. Cham: Springer, 2019. P. 273–286.
We define and find a most specific generalization of a fuzzy set of topics assigned to leaves of the rooted tree of a taxonomy. This generalization lifts the set to a “head subject” in the higher ranks of the taxonomy, that is supposed to “tightly” cover the query set, possibly bringing in some errors, both ...
Added: June 3, 2019
International Conference on Artificial Intelligence and Soft Computing. 18th International Conference, ICAISC 2019, Zakopane, Poland, June 16–20, 2019, Proceedings
Cham: Springer, 2019.
The series Lecture Notes in Computer Science (LNCS), including its subseries Lecture Notes in Artificial Intelligence (LNAI) and Lecture Notes in Bioinformatics (LNBI), has established itself as a medium for the publication of new developments in computer science and information technology research and teaching - quickly, informally, and at a high level. The two-volume set LNCS ...
Added: June 3, 2019
Comparison of String Similarity Measures for Obscenity Filtering
Artemova E., in: Proceedings of the 6th Workshop on Balto-Slavic Natural Language Processing. Stroudsburg, PA: Association for Computational Linguistics, 2017. P. 97–101.
In this paper we address the problem of filtering obscene lexis in Russian texts. We use string similarity measures to find words similar or identical to words from a stop list and establish both a test collection and a baseline for the task. Our experiments show that a novel string similarity measure based ...
Added: October 10, 2017
Annotated suffix trees for text clustering
Artemova E., Ilvovsky D., in: The 3d International Workshop on Concept Discovery in Unstructured Data (CDUD 2016). Proceedings of the Third Workshop on Concept Discovery in Unstructured Data co-located with the 13th International Conference on Concept Lattices and Their Applications (CLA 2016), Moscow, Russia, July 18, 2016. CEUR Workshop Proceedings. Vol. 1625. Aachen: CEUR Workshop Proceedings, 2016. P. 25–31.
In this paper an extension of tf-idf weighting on the annotated suffix tree (AST) structure is described. The new weighting scheme can be used for computing similarity between texts, which can further serve as an input to a clustering algorithm. We present preliminary tests of using AST for computing similarity of Russian texts and show slight improvement ...
Added: October 26, 2016
Some thoughts on using annotated suffix trees for Natural Language Processing
Artemova E., in: 2nd Workshop on Interactions Between Data Mining and Natural Language Processing, DMNLP 2015; Porto; Portugal; 7 September 2015. Issue 1410. Aachen: CEUR-WS, 2015. P. 5–18.
The paper defines an annotated suffix tree (AST) - a data structure used to calculate and store the frequencies of all the fragments of the given string or a collection of strings. The AST is associated with a string to text scoring, which takes all fuzzy matches into account. We show how the AST and ...
Added: October 8, 2015
Refining a Taxonomy by Using Annotated Suffix Trees and Wikipedia Resources
Artemova E., Mirkin B., Annals of Data Science 2015 Vol. 2 No. 1 P. 61–82
A step-by-step approach to taxonomy construction is presented. On the first step, the upper layer frame of taxonomy is built manually according to educational materials. On the next steps, the frame is refined at a chosen topic using the Wikipedia category tree and articles, both cleaned of noise. Our main tool in this is a ...
Added: May 27, 2015
An approach to the problem of annotation of research publications
Artemova E., in: Proceedings of The Eighth International Conference on Web Search and Data Mining. NY, United States of America: ACM, 2014. Ch. 58 P. 429–434.
An approach to multiple labeling research papers is explored. We develop techniques for annotating/labeling research papers in informatics and computer sciences with key phrases taken from the ACM Computing Classification System. The techniques utilize a phrase-to-text relevance measure so that only those phrases that are most relevant go to the annotation. Three phrase-to-text ...
Added: December 8, 2014
Conceptual maps: construction over a text collection and analysis
E. Morenko, Artemova E., Mirkin B., in: Analysis of Images, Social Networks and Texts. Third International Conference, AIST 2014, Yekaterinburg, Russia, April 10-12, 2014, Revised Selected Papers. Vol. 439. Berlin: Springer, 2014. P. 163–169.
A method for conceptual maps construction is presented and applied to three different domains. A conceptual map is a graph, where nodes stand for domain specific concepts and edges connect associated concepts. The conceptual map reveals and visualizes the logical associations between concepts, which exist in the collection of texts, used to construct the conceptual ...
Added: November 28, 2014