Multiple features for multiword extraction: A learning-to-rank approach

Tutubalina E., Braslavski P.

P. 782–791.

This paper describes the extraction of multiword expressions (MWEs) from corpora for inclusion in a large online lexical resource for Russian. The novelty of the proposed approach is twofold: 1) we use two corpora, the Russian National Corpus and Russian Wikipedia, in parallel, and 2) we employ an extended set of features derived from both data sources. To combine syntactic and statistical features from the two corpora, we experiment with several learning-to-rank (LETOR) methods that have proven highly effective in information retrieval (IR) scenarios. We use bigrams from existing dictionaries as training data, which keeps manual annotation effort to a minimum. Evaluation shows that machine-learned rankings with rich features significantly outperform traditional corpus-based association measures and their combinations. Analysis of the resulting lists supports the claim that multiple features and diverse data sources improve the quality of extracted MWEs. The proposed method is language-independent.
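For context, a "traditional corpus-based association measure" scores each candidate bigram from corpus counts alone. The sketch below uses pointwise mutual information (PMI) as one well-known example; the abstract does not say which measures the authors compared against. It also shows how a ranked candidate list can be scored with average precision against bigrams taken from an existing dictionary. The function names and toy data are hypothetical, and the paper's own feature set and LETOR models are not reproduced here.

```python
import math
from collections import Counter


def bigram_pmi(tokens):
    """Score every adjacent bigram by pointwise mutual information (PMI)."""
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    n_uni = len(tokens)
    n_bi = max(len(tokens) - 1, 1)
    scores = {}
    for (w1, w2), c12 in bigrams.items():
        p12 = c12 / n_bi                  # joint probability of the bigram
        p1 = unigrams[w1] / n_uni         # marginal probability of each word
        p2 = unigrams[w2] / n_uni
        scores[(w1, w2)] = math.log2(p12 / (p1 * p2))
    return scores


def average_precision(ranked, relevant):
    """Average precision of a ranked list against a set of known MWEs."""
    hits, total = 0, 0.0
    for rank, candidate in enumerate(ranked, start=1):
        if candidate in relevant:
            hits += 1
            total += hits / rank          # precision at this relevant hit
    return total / len(relevant) if relevant else 0.0


# Hypothetical toy data: rank corpus bigrams by PMI, then score the ranking
# against bigrams assumed to come from an existing dictionary.
tokens = "a b a b c".split()
scores = bigram_pmi(tokens)
ranked = sorted(scores, key=scores.get, reverse=True)
print(average_precision(ranked, {("a", "b")}))
```

A learning-to-rank model would replace the single PMI score with a learned function over many such features. The dictionary bigrams would play the role of the relevance labels.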

Language: English

Keywords: multiword expressions

In book

Computational Linguistics and Intellectual Technologies: Papers from the Annual International Conference “Dialogue” (Moscow, June 1–4, 2016)

Issue 15. Moscow: RGGU Publishing House (Russian State University for the Humanities), 2016.

Do Compound Constructions Obey Zipf’s Law?

Kochetkova N. A., Klyshinskiy E., Ermakov P. D., System Administrator (Sistemnyi administrator), 2016, no. 11, P. 89–95

A dictionary of multiword expressions is a useful resource for natural language processing. Automatic processing of natural language texts shortens the time needed to construct such dictionaries and increases the number of expressions they include. In this paper we show that n-grams of natural language text follow Zipf’s ...
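Zipf's law states that the frequency of an item is roughly inversely proportional to its frequency rank. The sketch below shows one common way to check this for n-grams: count them, rank them by frequency, and fit a straight line to log-frequency against log-rank. A slope close to -1 is the classic Zipfian signature. The abstract is truncated and does not describe the authors' procedure, so the function names and the least-squares fit here are illustrative assumptions.

```python
import math
from collections import Counter


def ngram_counts(tokens, n):
    """Count contiguous n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def rank_frequency(counts):
    """(rank, frequency) pairs, most frequent first; ranks start at 1."""
    freqs = sorted(counts.values(), reverse=True)
    return list(enumerate(freqs, start=1))


def zipf_slope(pairs):
    """Least-squares slope of log(frequency) against log(rank).

    If frequency is proportional to 1 / rank (Zipf's law), the slope is -1.
    """
    if len(pairs) < 2:
        raise ValueError("need at least two ranked items")
    xs = [math.log(rank) for rank, _ in pairs]
    ys = [math.log(freq) for _, freq in pairs]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    return cov / var
```

On a tokenized corpus, `zipf_slope(rank_frequency(ngram_counts(tokens, 2)))` gives the fitted exponent for bigrams. Real corpora usually deviate from a single straight line at the very top and in the long tail, so the fit range is a modelling choice.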

Added: February 27, 2017

NAACL HLT 2015: 11th Workshop on Multiword Expressions (MWE 2015)

NY: Association for Computational Linguistics, 2015.

Added: March 16, 2016

Automatic Detection of Stable Grammatical Features in N-Grams

Kopotev M., Pivovarova L., Kochetkova N. A. et al., in: NAACL HLT 2013 9th Workshop on Multiword Expressions MWE 2013 Proceedings of the Workshop. Atlanta: The Association for Computational Linguistics, 2013. Ch. 12 P. 73–81.

This paper presents an algorithm that allows the user to issue a query pattern, collects multi-word expressions (MWEs) that match the pattern, and then ranks them in a uniform fashion. This is achieved by quantifying the strength of all possible relations between the tokens and their features in the MWEs. The algorithm collects the frequency ...
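The abstract describes the query step only at a high level, and the description of the ranking (based on "the strength of all possible relations between the tokens and their features") is cut off. The sketch below covers only the first step: matching a part-of-speech query pattern against tagged text and collecting the matching expressions. It uses raw frequency as a stand-in score. The tag names, the `"*"` wildcard and both function names are hypothetical, and the authors' relation-strength ranking is not reproduced.

```python
from collections import Counter


def match_pattern(tagged, pattern):
    """Collect word sequences whose POS tags match the query pattern.

    tagged  -- list of (word, pos) pairs
    pattern -- tuple of POS tags, e.g. ("ADJ", "NOUN"); "*" matches any tag
    """
    n = len(pattern)
    hits = Counter()
    for i in range(len(tagged) - n + 1):
        window = tagged[i:i + n]
        if all(p == "*" or tag == p for (_, tag), p in zip(window, pattern)):
            hits[tuple(word for word, _ in window)] += 1
    return hits


def rank_matches(hits):
    """Rank matched expressions by raw frequency (placeholder score)."""
    return sorted(hits.items(), key=lambda kv: (-kv[1], kv[0]))
```

A uniform ranking as described in the abstract would replace the frequency key in `rank_matches` with a score computed from token and feature co-occurrence statistics.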

Added: June 28, 2013

NAACL HLT 2013 9th Workshop on Multiword Expressions MWE 2013 Proceedings of the Workshop

Atlanta: The Association for Computational Linguistics, 2013.

This workshop addresses major challenges in the overall process of MWE treatment, from both theoretical and computational viewpoints, focusing on original research related to the following topics: manually and automatically constructed resources; representation of MWEs in dictionaries and ontologies; MWEs in linguistic theories such as HPSG, LFG and minimalism; MWEs and user interaction; multilingual acquisition; multilingualism and MWE processing; models ...

Added: June 28, 2013