The Role of General and Specific Vocabulary in Information Extraction from Text: The Case of the Event "Introduction of New Technologies"

В. П. Клинцов; А. А. Бонч-Осмоловская; И. О. Кузнецов; Ю. С. Акинина; С. Ю. Толдова


Vestnik of Novosibirsk State University. Series: Information Technologies. 2012. Vol. 10. No. 4. P. 74–80.

Klintsov V., Bonch-Osmolovskaya A. A., Kuznetsov I., Akinina Y., Toldova S.

This paper discusses approaches to selecting the keywords used to extract event frames from text. In particular, the innovation event is associated with different lexical items in different areas of knowledge. The paper evaluates the contribution of general and specific vocabulary to the representation of the frame in a particular subject area.

Language: Russian


Keywords: computational linguistics, information extraction, information retrieval, automatic text analysis, frame model of the event
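To illustrate the distinction the abstract draws, the sketch below flags candidate "introduction of new technologies" sentences with two keyword lists: general triggers that could apply in any subject area, and domain-specific triggers for one area, then counts how many sentences each list catches. It is a minimal illustration, not the authors' method: the trigger words, the domain names `medicine` and `telecom`, and the helper functions are hypothetical English stand-ins, not the paper's Russian keyword lists.

```python
import re
from collections import Counter

# Hypothetical general-vocabulary triggers: words that may signal an
# "introduction of new technologies" event in any subject area.
GENERAL_TRIGGERS = {"introduce", "introduced", "launch", "launched", "adopt", "adopted"}

# Hypothetical domain-specific triggers for two illustrative subject areas.
SPECIFIC_TRIGGERS = {
    "medicine": {"clinical", "trial", "therapy", "approved"},
    "telecom": {"5g", "rollout", "network", "deployed"},
}


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"\w+", text.lower())


def match_triggers(text: str, domain: str) -> dict[str, set[str]]:
    """Return the general and domain-specific trigger words found in text."""
    tokens = set(tokenize(text))
    return {
        "general": tokens & GENERAL_TRIGGERS,
        "specific": tokens & SPECIFIC_TRIGGERS.get(domain, set()),
    }


def contribution(sentences: list[str], domain: str) -> Counter:
    """Count how many sentences are flagged by each vocabulary type."""
    counts: Counter = Counter()
    for sentence in sentences:
        for kind, words in match_triggers(sentence, domain).items():
            if words:
                counts[kind] += 1
    return counts


if __name__ == "__main__":
    corpus = [
        "The hospital introduced a new therapy after a clinical trial.",
        "Operators launched 5G service in three cities.",
        "The board discussed quarterly results.",
    ]
    print(contribution(corpus, "medicine"))
```

Comparing the two counts per domain gives a crude view of how much each vocabulary type contributes; a real system would also need lemmatization, which matters for a highly inflected language such as Russian.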

SMMR: Sampling-Based MMR Reranking for Faster, More Diverse, and Balanced Recommendations and Retrieval

Liakhnovich K., Lashinin O., Babkin A. et al., Proceedings of the 48th International ACM SIGIR Conference on Research and Development in Information Retrieval 2025 P. 2754–2758

Relevance and diversity are critical objectives in modern information retrieval (IR), particularly in recommender systems. Achieving a balance between relevance (exploitation) and diversity (exploration) optimizes user satisfaction and business goals such as catalog coverage and novelty. While existing post-processing reranking methods address this trade-off, they usually rely on greedy strategies, leading to suboptimal outcomes for ...

Added: February 3, 2026
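The abstract contrasts SMMR with existing reranking methods that rely on greedy strategies. For reference, the sketch below shows the classic greedy Maximal Marginal Relevance (MMR) baseline, not the sampling-based SMMR procedure itself. Each step picks the candidate maximizing `lam * relevance - (1 - lam) * max similarity to already-selected items`. The function name, the parameter `lam`, and the toy scores are illustrative assumptions.

```python
from typing import Sequence


def mmr_rerank(
    relevance: Sequence[float],
    similarity: Sequence[Sequence[float]],
    k: int,
    lam: float = 0.5,
) -> list[int]:
    """Greedy MMR reranking (illustrative sketch, not SMMR).

    relevance[i]     -- relevance score of candidate i
    similarity[i][j] -- similarity between candidates i and j
    lam              -- 1.0 = pure relevance, 0.0 = pure diversity
    Returns indices of up to k selected candidates, in selection order.
    """
    remaining = list(range(len(relevance)))
    selected: list[int] = []
    while remaining and len(selected) < k:
        best, best_score = None, float("-inf")
        for i in remaining:
            # Redundancy = similarity to the closest already-selected item.
            redundancy = max((similarity[i][j] for j in selected), default=0.0)
            score = lam * relevance[i] - (1 - lam) * redundancy
            if score > best_score:  # strict ">" keeps the lowest index on ties
                best, best_score = i, score
        selected.append(best)
        remaining.remove(best)
    return selected


if __name__ == "__main__":
    rel = [0.9, 0.85, 0.5]
    # Items 0 and 1 are near-duplicates; item 2 is dissimilar to both.
    sim = [[1.0, 0.95, 0.1], [0.95, 1.0, 0.1], [0.1, 0.1, 1.0]]
    print(mmr_rerank(rel, sim, k=2, lam=0.5))  # prints [0, 2]
```

With `lam=1.0` the same call returns `[0, 1]`, the pure relevance ranking. Each step commits to a locally best item and never revisits earlier choices; this is the greedy behaviour the SMMR abstract identifies as a source of suboptimal outcomes.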

Automatic Detection of Inducements in Text: Applying Computational Linguistics Methods in the Work of a Forensic Linguistic Expert

П.Е. Белова, А.К. Сафарян, In: Scientific and Practical Conference with International Participation "National and International Trends and Prospects in the Development of Forensic Examination". Collected papers. Nizhny Novgorod: Publishing House of Lobachevsky State University of Nizhny Novgorod (UNN), 2024.

This article describes FindImper, a system for automatically finding and extracting inducements from Russian-language texts, based on searching for verb forms and syntactic relations. The algorithm is implemented in Python using libraries for morphological and syntactic analysis together with a set of rules. The tool is intended to streamline the work of a linguistic expert and is available for use via a website ...

Added: January 30, 2026

Discursive Capabilities of Large Language Models in Generating New Texts

Mylnikova A., Гасимов А. Р., Scientific and Technical Information. Series 2: Information Processes and Systems 2025 No. 9 P. 33–38

Based on a study of how large language models (LLMs) function and of the specific characteristics of machine discourse processing, the paper demonstrates an experimental method of computational and linguistic analysis for the statistical study and interpretation of the linguistic characteristics of texts. The research material comprises the Brown corpus and corpora of texts artificially generated with Claude Sonnet 3.7 and Grok-3. In the processing mechanisms ...

Added: November 19, 2025

Employing computational linguistic technologies and oculography to develop diagnostic tool for detecting autoaggressive tendencies in young people: a riveted gaze into “get rid of the shackles of this world”

Khomenko A., Kasimova L., Sychugov E. et al., Psychiatria Danubina 2025 Vol. 37 No. Suppl. 1 P. 213–223

Background: Early recognition of autoaggressive tendencies in young people is essential for diagnostic screening and reducing suicidality risks. This can be achieved through psycholinguistic approaches such as corpus analysis and eye-tracking studies. Corpus research helps to develop generalized speech patterns of those at risk of suicide, while oculographic methods examine perceptual cues linked to suicidal ...

Added: October 19, 2025

Computational linguistics and intellectual technologies. Papers from the Annual International Conference "Dialogue" (2025)

[s.n.], 2025.

This collection includes 39 papers from the Dialogue 2025 International Conference on Computational Linguistics and Intelligent Technologies, representing a wide range of theoretical and applied research in the fields of natural language description, modeling language processes, and the development of practical computational linguistic technologies. This publication is intended for specialists in theoretical and applied linguistics and ...

Added: October 19, 2025

Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics

Wien: Association for Computational Linguistics, 2025.

Added: August 26, 2025

On Developing an Approach to Automated Data Collection and Intelligent Data Processing Using Web Scraping and Large Language Models (the Case of Extracting Technology Readiness Level Assessments)

Grozovskiy F., Loginova I., Scientific and Technical Information. Series 2: Information Processes and Systems 2025 No. 8 P. 27–36

An approach is proposed for the automated extraction and structuring of information from text, combining web scraping to collect data from online sources with a large language model for subsequent intelligent processing. News publications about technology readiness levels from the CNews website were chosen as the object of study to test the developed methodology within a specific subject area. The accuracy with which the model identifies technology ...

Added: August 11, 2025

Advances in Information Retrieval: 47th European Conference on Information Retrieval, ECIR 2025, Lucca, Italy, April 6–10, 2025, Proceedings, Part I

Springer, 2025.

The five-volume set LNCS 15572, 15573, 15574, 15575 and 15576 constitutes the refereed conference proceedings of the 47th European Conference on Information Retrieval, ECIR 2025, held in Lucca, Italy, during April 6–10, 2025. The 52 full papers, 11 findings, 42 short papers and 76 papers of other types presented in these proceedings were carefully reviewed and selected from 530 submissions. The accepted papers ...

Added: April 17, 2025

Advances in Information Retrieval: 47th European Conference on Information Retrieval, ECIR 2025, Lucca, Italy, April 6–10, 2025, Proceedings, Part IV

Springer, 2025.

Added: April 10, 2025

Linking sequence patterns and functionality of alpha-helical antimicrobial peptides

Eliseev I., Terterov I., Yudenko A. et al., Bioinformatics 2019 Vol. 35 No. 16 P. 2713–2717

Motivation: The rational design of antimicrobial peptides (AMPs) with increased therapeutic potential requires deep understanding of the determinants of their activities. Inspired by the computational linguistic approach, we hypothesized that sequence patterns may encode the functional features of AMPs. Results: We found that α-helical and β-sheet peptides have non-intersecting pattern sets and therefore constructed new sequence ...

Added: January 20, 2025

Thematic Annotation of an Anthropological Corpus: A Methodology for Classifying Miners' Narratives

Мазитова Л. Л., Panteleeva L., Vestnik of Samara University. History, Pedagogy, Philology 2024 Vol. 30 No. 4 P. 156–164

The article describes the methodology for creating an anthropological corpus of texts united by their connection to the mining profession. The work addresses three research tasks: (1) developing a thematic classification, (2) introducing conventions for marking narratives in the text, and (3) determining principles for organizing the corpus according to the themes of ...

Added: January 18, 2025

Findings of the Association for Computational Linguistics: NAACL 2024

Association for Computational Linguistics, 2024.

Added: November 24, 2024

Machine Learning and Knowledge Discovery in Databases. Applied Data Science Track. European Conference, ECML PKDD 2024, Vilnius, Lithuania, September 9–13, 2024, Proceedings, Part X. LNCS, volume 14950

Cham: Springer, 2024.

This multi-volume set, LNAI 14941 to LNAI 14950, constitutes the refereed proceedings of the European Conference on Machine Learning and Knowledge Discovery in Databases, ECML PKDD 2024, held in Vilnius, Lithuania, in September 2024. ...

Added: November 22, 2024

Subjective Difficulty of the Texts of a Virtual Tour of the Hermitage: A Pilot Study

Колмогорова П. А., Куликова Е. Р., Human Being: Image and Essence. Humanitarian Aspects 2025 No. 2(62) P. 139–155

The article discusses how to assess the difficulty of the texts accompanying the virtual tour of the Main Museum Complex of the State Hermitage. Unlike complexity, which is a more objective text characteristic that can be parameterized, difficulty lacks an established assessment methodology. The article describes the results of a pilot experiment in which informants rated texts by highlighting and commenting on fragments that caused difficulty. The analysis showed that the most frequent ...

Added: November 8, 2024

Linguistic Complexity of Texts in the "Virtual Museum Tour" Genre (Based on the Virtual Visit to the State Hermitage)

Kolmogorova A., Куликова Е. Р., Колмогорова П. А., Text. Book. Publishing 2025 No. 38 P. 29–54

The article is devoted to the linguistic profiling of the texts of the Virtual Visit to the State Hermitage Museum, available on its official website. The purpose of the study is to analyze a set of lexical, morphological, syntactic, and discursive metrics of the linguistic complexity of these texts in comparison with the same ...

Added: November 8, 2024

Teasing apart time reference-related encoding and retrieval deficits in aphasia: evidence from Greek, Russian, Italian and English

Fyndanis V., Burgio F., Buivolova O. et al., Aphasiology 2025 Vol. 39 No. 9 P. 1242–1276

Background: Persons with aphasia (PWAs) are often impaired in time reference/tense production. It has been suggested that this impairment is due to encoding and/or retrieval deficits. However, to the best of our knowledge, no experimental design that enables teasing apart selective encoding and retrieval deficits has been proposed thus far. Aims: This study aims at disentangling time reference-related ...

Added: November 2, 2024

Computational Linguistics and Intellectual Technologies: Papers from the Annual International Conference "Dialogue". Issue 21, supplementary volume. Moscow: RSUH, 2022. P. 1001–1190.

RSUH, 2022.

The collection includes 17 papers from the International Conference on Computational Linguistics and Intellectual Technologies "Dialogue 2022", representing a wide range of theoretical and applied research in computer modeling of natural language and the development of new linguistic technologies. Intended for specialists in theoretical and applied linguistics and intellectual technologies. ...

Added: May 24, 2024

The Eleventh International Conference on Computer Processing of Turkic Languages "TurkLang 2023"

Kazan: Publishing House of the Academy of Sciences of the Republic of Tatarstan, 2023.

This year, the eleventh International Conference on Computer Processing of Turkic Languages, TurkLang-2023, was held at Bukhara State University. The previous 10 conferences took place in Astana (2013, 2022), Istanbul (2014), Kazan (2015, 2017), Bishkek (2016), Tashkent (2018), Simferopol (2019), Ufa (2020), and Kyzyl (2021). The geography of the conference, the number of papers presented, and the composition of participants confirm that at present ...

Added: March 6, 2024

Linguistic mechanisms of colour term evolution: A diachronic investigation of “Russian browns” buryj and koričnevyj

Bochkarev V. V., Shevlyakova A., Solovyev V. et al., Diachronica 2023 Vol. 40 No. 4 P. 492–531

We investigated diachrony of distributional semantics of two competing Russian colour terms (CTs) for ‘brown’, buryj (11th century) and koričnevyj (17th century), using the Russian subcorpus of Google Books Ngram (2020). Time-series analysis (1800–2019) of bigrams gauged each term’s frequencies of occurrence and changes in combinability with nouns for natural objects, artefacts, abstract concepts and figurative expressions. In frequency, koričnevyj overtook buryj in the ...

Added: February 19, 2024