A Joint Approach to Compound Splitting and Idiomatic Compound Detection

Krotova I., Aksenov S., Artemova E.






pp. 4410-4417.


Applications such as machine translation, speech recognition, and information retrieval require efficient handling of noun compounds, because compounds are one source of out-of-vocabulary words. In-depth processing of noun compounds requires not only splitting them into smaller components (or even roots) but also identifying instances that should remain unsplit because they are idiomatic. We develop a two-fold deep learning-based approach to noun compound splitting and idiomatic compound detection for German, trained on a newly collected corpus of annotated German compounds. Our neural noun compound splitter operates at the sub-word level and outperforms the current state of the art by about 5%.
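To make the two-fold idea concrete, here is a minimal sketch of one common way to set such a pipeline up. It is not taken from the paper: it assumes that splitting is cast as per-character boundary labeling (label 1 where a new component starts) and that a compound flagged as idiomatic is returned unsplit. The functions `labels_from_split`, `split_from_labels`, and `process_compound`, as well as the toy lexicons, are hypothetical; the two lambdas merely stand in for a trained character-level splitter and an idiomaticity classifier.

```python
from typing import Callable, Dict, List, Sequence, Set


def labels_from_split(parts: Sequence[str]) -> List[int]:
    """Encode a gold split as one label per character.

    Hypothetical scheme: 1 marks the first character of every component
    except the first one, 0 marks all other characters.
    """
    labels: List[int] = []
    for i, part in enumerate(parts):
        if not part:
            raise ValueError("components must be non-empty")
        labels.extend([1 if i > 0 else 0] + [0] * (len(part) - 1))
    return labels


def split_from_labels(word: str, labels: Sequence[int]) -> List[str]:
    """Decode per-character boundary labels back into surface components."""
    if len(word) != len(labels):
        raise ValueError("need exactly one label per character")
    parts: List[str] = []
    start = 0
    for i, label in enumerate(labels):
        # A boundary at position 0 would create an empty component; ignore it.
        if label == 1 and i > start:
            parts.append(word[start:i])
            start = i
    parts.append(word[start:])
    return parts


def process_compound(
    word: str,
    is_idiomatic: Callable[[str], bool],
    predict_labels: Callable[[str], Sequence[int]],
) -> List[str]:
    """Keep idiomatic compounds whole; otherwise split via predicted labels."""
    if is_idiomatic(word):
        return [word]
    return split_from_labels(word, predict_labels(word))


# Toy stand-ins (assumed data, not the paper's corpus or models).
# Linking elements such as the "s" in "Arbeitsmarkt" stay attached here.
GOLD: Dict[str, List[str]] = {
    "Haustür": ["Haus", "tür"],
    "Arbeitsmarkt": ["Arbeits", "markt"],
}
IDIOMATIC: Set[str] = {"Löwenzahn"}  # literally "lion's tooth", i.e. dandelion

if __name__ == "__main__":
    for w in ["Haustür", "Arbeitsmarkt", "Löwenzahn"]:
        print(w, process_compound(
            w,
            is_idiomatic=lambda x: x in IDIOMATIC,
            predict_labels=lambda x: labels_from_split(GOLD[x]),
        ))
```

Framing splitting as labeling keeps the output aligned with the input characters, so any sequence model over characters or sub-word units can produce the labels; a real system would also need to decide how to normalize linking elements, which this sketch deliberately leaves out.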

Language: English


Keywords: German compound, char-rnn, character-level models, compound splitting

Publication based on the results of:

Development of Mathematical Models and Methods for Recommender Systems and Natural Language Processing (2020)

In book

Proceedings of The 12th Language Resources and Evaluation Conference

Vol. 12, European Language Resources Association (ELRA), 2020.