Fine-Tuning Transformers: Vocabulary Transfer
Cornell University, 2021.
Samenko I., Tikhonov A., Kozlovskii B., Yamshchikov I. P.
Transformers are responsible for the vast majority of recent advances in natural language processing. The majority of practical natural language processing applications of these models is typically enabled through transfer learning. This paper studies if corpus-specific tokenization used for fine-tuning improves the resulting performance of the model. Through a series of experiments, we demonstrate that such tokenization combined with the initialization and fine-tuning strategy for the vocabulary tokens speeds up the transfer and boosts the performance of the fine-tuned model. We call this aspect of transfer facilitation vocabulary transfer.
Moscow : Russian State University for the Humanities, 2019
The book includes 61 reports of the International conference on computer and intellectual technology "Dialogue-2019", representing a wide range of theoretical and applied research in the field of natural language description, modeling of language processes, creating practically applicable computer linguistic technologies. For specialists in the field of theoretical and applied linguistics and intellectual technologies. ...
Added: June 12, 2019
Cham : Springer, 2019
Intelligent Systems Conference (IntelliSys) 2018 is the fourth research conference in the series. This conference is a part of SAI conferences being held since 2013. The conference series has featured keynote talks, special sessions, poster presentation, tutorials, workshops, and contributed papers each year.
The conference focuses on areas of intelligent systems and artificial intelligence (AI) and ...
Added: August 29, 2018
Krylov V., Krylov S., Жигалов Г. М., Journal of Physics: Conference Series 2019 Vol. 1405(1) DOI: 10.1088/1742-6596/1405/1/012011
The paper studies the case when semiotic signs can be represented as language constructs in the same language as the text being interpreted. The goal is to obtain estimates of the depth of interpretability with respect to each of the signs by finding the projections of the narrative on these language ...
Added: June 28, 2021
S.D. Kuznetsov, D.Yu. Turdakov, Астраханцев Н. А. et al., Programming and Computer Software 2014 Vol. 40 No. 5 P. 288-295
A framework for fast text analysis, which is developed as a part of the Texterra project, is described. Texterra provides a scalable solution for the fast text processing on the basis of novel methods that exploit knowledge extracted from the Web and text documents. For the developed tools, details of the project, use cases, and ...
Added: November 26, 2017
Lanin V., Научно-технический вестник Поволжья 2014 No. 6 P. 197-199
The paper describes an approach to implementing a system that automates the processing of design documentation. Documentation analysis is based on natural language processing, a specially developed object-oriented language, and ontological resources. As a result, the system highlights linkages between the documents and their semantic indexing. Users get the opportunity to easily navigate through ...
Added: December 13, 2014
Ekaterinburg : CEUR Workshop Proceedings, 2014
AIST'2014 is an international data science conference on Analysis of Images, Social Networks, and Texts. Traditionally, the conference is held annually in Yekaterinburg, Russia. The conference is intended for computer scientists and practitioners whose research interests involve Internet mathematics and other related fields of data science.
LIST OF TOPICS (NON EXHAUSTIVE)
Applications of Data Mining and Machine ...
Added: August 28, 2014
Switzerland : Springer, 2015
This book constitutes the refereed proceedings of the 6th Conference on Knowledge Engineering and the Semantic Web, KESW 2015, held in Moscow, Russia, in September/October 2015. The 17 revised full papers presented together with 6 short system descriptions were carefully reviewed and selected from 35 submissions. The papers address research issues related to semantic web, ...
Added: September 16, 2015
Chepovskiy A., Moscow : Национальный открытый университет «ИНТУИТ», 2015
The monograph examines various mathematical models for solving practical problems of natural language text processing. It proposes solutions to problems that arise in organizing indexing and subsequent data retrieval. Methods of computational linguistics are applied to applied research. The book is intended for developers of information systems and specialists in computational linguistics. ...
Added: May 23, 2015
Klyshinskiy E., Логачёва В. К., Карпик О. В. et al., Вестник Новосибирского государственного университета. Серия: Лингвистика и межкультурная коммуникация 2020 Vol. 18 No. 1 P. 5-21
Grammatical ambiguity (multiple sets of grammatical features for one word form, or coinciding surface forms of different words) can be of different types. We describe six classes of grammatical ambiguity: unambiguous, ambiguous by grammatical features, by part of speech, by lemma, by lemma and part of speech, and out-of-vocabulary words. These classes are presented ...
Added: December 11, 2019
Chistyakova K., Kazakova Tatiana, / НИУ ВШЭ. Series WP BRP "Linguistics". 2023. No. 115.
The problem of language models’ interpretation is extensively inspected, but no universal answers have been found. Our study offers to combine widely accepted probing methods with a novel approach to a neural network under investigation. We propose to break grammatical forms on the pre-training step in order to get two "sibling" models, as it casts ...
Added: November 29, 2023
Smetanin S., IEEE Access 2020 Vol. 8 P. 110693-110719
Sentiment analysis has become a powerful tool for processing and analysing expressed opinions on a large scale. While the application of sentiment analysis to English-language content has been widely examined, its application to the Russian language remains less well studied. In this survey, we comprehensively reviewed the applications of sentiment analysis of Russian-language content and ...
Added: June 24, 2020
Bartunov S., Кондрашкин Д. А., Osokin A. et al., / Arxiv.org. Series arXiv:1502.07257 "Computation and language". 2015.
The recently proposed Skip-gram model is a powerful method for learning high-dimensional word representations that capture rich semantic relationships between words. However, Skip-gram, like most prior work on learning word representations, does not take word ambiguity into account and maintains only a single representation per word. Although a number of Skip-gram modifications were proposed to ...
Added: November 5, 2015
Stroudsburg, PA : Association for Computational Linguistics, 2016
Many NLP researchers, especially those not working in the area of discourse processing, tend to equate coreference resolution with the sort of coreference that people did in MUC, ACE, and OntoNotes, having the impression that coreference is a well-worn task owing in part to the large number of papers reporting results on the MUC/ACE/OntoNotes corpora. Given ...
Added: December 7, 2016
Cham : Springer, 2020
This book focuses on the core areas of computing and their applications in the real world. Presenting papers from the Computing Conference 2020, it covers a diverse range of research areas and describes various detailed techniques that have been developed and implemented.
The Computing Conference 2020, which provided a venue for academic and industry practitioners to share new ...
Added: July 7, 2020
Moscow : Издательский центр «Российский государственный гуманитарный университет», 2019
The book includes 64 papers submitted to the International conference in computer linguistics and intellectual technologies Dialogue 2019 and presents a broad spectrum of theoretical and applied research of natural language description, language simulation, and creation of applied computer technologies. ...
Added: October 16, 2019
Berlin : Springer, 2014
This book constitutes the proceedings of the Third International Conference on Analysis of Images, Social Networks and Texts, AIST 2014, held in Yekaterinburg, Russia, in April 2014. The 11 full and 10 short papers were carefully reviewed and selected from 74 submissions. They are presented together with 3 short industrial papers, 4 invited papers and ...
Added: November 13, 2014
Tikhonov A., Yamshchikov I. P., / Cornell University. Series Computer Science "arxiv.org". 2021.
Chekhov's gun is a dramatic principle stating that every element in a story must be necessary, and irrelevant elements should be removed. This paper presents a new natural language processing task, Chekhov's gun recognition (CGR): the recognition of entities that are pivotal for the development of the plot. Though similar to classical Named Entity Recognition ...
Added: December 3, 2021
Springer, 2022
“Data Analytics and Management in Data Intensive Domains” conference (DAMDID) is planned as a multidisciplinary forum of researchers and practitioners from various domains of science and research promoting cooperation and exchange of ideas in the area of data analysis and management in data intensive domains. Approaches to data analysis and management being developed in specific data intensive domains of X-informatics (such as X = astro, bio, chemo, geo, medicine, neuro, physics, ...
Added: August 30, 2021
Denis Turdakov, Astrakhantsev N., Fedorenko D., Programming and Computer Software 2015 Vol. 41 No. 6 P. 336-349
Applications related to domain specific text processing often use glossaries and ontologies, and the main step of such resource construction is term recognition. This paper presents a survey of existing definitions of the term and its linguistic features, formulates the task definition for term recognition, and analyzes presently-available methods for automatic term recognition, such as ...
Added: August 26, 2016
Springer, 2021
This book constitutes revised selected papers from the 9th International Conference on Analysis of Images, Social Networks and Texts, AIST 2020, held during October 15-16, 2020. The conference was planned to take place in Moscow, Russia, but changed to an online format due to the COVID-19 pandemic.
The 27 full papers and 4 short papers presented ...
Added: October 7, 2020
Денис Турдаков, Астраханцев Н. А., Недумов Я. Р. et al., Труды Института системного программирования РАН 2014 Vol. 26 P. 421-438
The paper presents a framework for fast text analytics developed during the Texterra project. Texterra is a technology for multilingual text mining based on novel text processing methods that exploit knowledge extracted from user-generated content. It delivers a fast, scalable solution for text mining without expensive customization. Depending on use-cases Texterra could be utilized ...
Added: November 6, 2017
Association for Computational Linguistics, 2021
Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies ...
Added: August 31, 2021
Springer, 2015
16th International Conference, CICLing 2015, Cairo, Egypt, April 14-20, 2015, Proceedings, Part I
ISBN: 978-3-319-18110-3 (Print) 978-3-319-18111-0 (Online) ...
Added: April 23, 2015
Aachen : CEUR Workshop Proceedings, 2017
As the number of digital texts increases rapidly, there is a pressing need for more advanced and diverse tools of natural language processing. While purely statistical approaches proved powerful and efficient for many NLP tasks, there are many applications that would benefit from the formal models and approaches traditional language science has to offer. With ...
Added: June 25, 2017