Toxic Comments Detection in Russian

This paper surveys current issues and trends in computational linguistics, based on the materials of the COLING’2012 conference on computational linguistics. It discusses modern approaches to traditional NLP tasks such as POS tagging, syntactic parsing, and machine translation. The highlights of automated information extraction, such as fact extraction, ...

Added: October 15, 2013

Texterra: An Infrastructure for Text Analysis

Денис Турдаков, Астраханцев Н. А., Недумов Я. Р. et al., Труды Института системного программирования РАН 2014 Vol. 26 P. 421–438

The paper presents a framework for fast text analytics developed during the Texterra project. Texterra is a technology for multilingual text mining based on novel text processing methods that exploit knowledge extracted from user-generated content. It delivers a fast, scalable solution for text mining without expensive customization. Depending on the use case, Texterra could be utilized ...

Added: November 6, 2017

Social Network Analysis: Methods and Applications

Сергей Кузнецов, Денис Турдаков, Коршунов А. В. et al., Труды Института системного программирования РАН 2014 Vol. 26 No. 1 P. 439–456

The article describes the main components of the technology stack developed at ISP RAS for analyzing user data from social networks. Particular attention is paid to the tasks, methods, and applications of analyzing network data (social links between users) and text data (user messages and profiles): determining users' demographic attributes, finding event descriptions in message corpora, identifying users across different networks, finding user communities ...

Added: November 25, 2017

Predictions, Big Data, and New Measures: On the Potential of Computational Linguistics Technologies in Theoretical Linguistic Research

Bonch-Osmolovskaya A. A., Вопросы языкознания 2016 No. 2 P. 100–120

The article reviews recent studies in which a theoretical research problem is solved with methods or tools used in computational linguistics. The review analyzes in detail exactly how applying a given tool or method can yield new knowledge about the nature of language. In particular, it identifies two main directions whose development within ...

Added: April 14, 2015

Pulse of the Nation: Observable Subjective Well-Being in Russia Inferred from Social Network Odnoklassniki

Sergey Smetanin, Mathematics 2022 Vol. 10 No. 16 Article 2947

Policymakers and researchers worldwide are interested in measuring the subjective well-being (SWB) of populations. In recent years, new approaches to measuring SWB have begun to appear, using digital traces as the main source of information, and show potential to overcome the shortcomings of traditional survey-based methods. In this paper, we propose the formal model for ...

Added: August 15, 2022

A Deep Learning Method Study of User Interest Classification

Malafeev A., Nikolaev K., in: Analysis of Images, Social Networks and Texts. 8th International Conference, AIST 2019, Kazan, Russia, July 17–19, 2019, Revised Selected Papers. Communications in Computer and Information Science, Vol. 1086. Springer, 2020. P. 154–159.

In this paper, a deep learning method study is conducted to solve a new multiclass text classification problem, identifying user interests by text messages. We used an original dataset of almost 90 thousand forum text messages, labeled for ten interests. We experimented with different modern neural network architectures: recurrent and convolutional, as well as simpler ...

Added: November 7, 2019

Automatic Arabic Dialect Classification

Durandin O. V., Strebkov D. Y., Hilal N. R., in: Computational Linguistics and Intellectual Technologies: Proceedings of the Annual International Conference “Dialogue” (2016). Moscow: RGGU Publishing House, 2016. P. 1–13.

The paper presents work on automatic Arabic dialect classification and proposes a machine learning classification method in which the training dataset consists of two corpora. The first is a small corpus of manually dialect-annotated instances. The second contains a large number of instances collected from the Web automatically using word-marks, the most unique and frequent dialectal words ...

Added: January 18, 2017

Single-sentence Readability Prediction in Russian

Karpov N., Vitugin F., Baranova J., in: Analysis of Images, Social Networks and Texts. Vol. 436: 3rd International Conference on Analysis of Images, Social networks, and Texts. NY: Springer, 2014. Ch. 436 P. 91–100.

In an effort to make reading more accessible, an automated readability formula can help students to retrieve appropriate material for their language level. This study attempts to discover and analyze a set of possible features that can be used for single-sentence readability prediction in Russian. We test the influence of syntactic features on predictability of ...

Added: November 28, 2014

Methods for Resolving Homonymy

Рысаков С. В., Системный администратор 2015 No. 10(155) P. 92–95

The article provides a review of modern methods of morphological ambiguity resolution. We considered such methods as statistical disambiguation, Brill’s automatically generated rules, decision trees and their modifications. For the comparison, the article provides numerical results obtained on two open corpora: OpenCorpora and SynTagRus. ...

Added: November 25, 2015

Text Classification by Genre Using Machine Learning Algorithms

Builova N., Научно-техническая информация. Серия 2: Информационные процессы и системы 2018 No. 8 P. 34–38

This review examines the problem of classifying documents by genre. It highlights the main text characteristics used to recognize genre and describes the most widely used machine learning algorithms. The methods considered serve to classify scientific, technical, journalistic, and literary texts. ...

Added: March 28, 2018

Applying Machine Learning Methods to the Automatic UDC Classification of Articles

Romanov A., Ломотин К. Е., Козлова Е. С., Информационные технологии 2017 Vol. 23 No. 6 P. 418–423

The paper examines the applicability of modern machine learning methods to the problem of automatically generating UDC codes for scientific articles. The classifiers considered are artificial neural networks, logistic regression, and boosting. Graph algorithms and a prototype software module for generating UDC codes are designed. ...

Added: July 30, 2017

Connectome Classification Based on Local Metrics on Stochastic Matrices

Иванов А. Р., Petrov D., in: 40th Interdisciplinary School-Conference "Information Technologies and Systems". [s.n.], 2016. P. 509–516.

Graph metrics are a popular approach to classifying structural connectomes, graphs that describe the structural connections between different brain regions. In our work, we propose computing these metrics on the stochastic random-walk matrices of these graphs. We propose computing some of these metrics on the logarithms of the matrix elements, in order to preserve the physical meaning of the transition probabilities between ...

Added: December 9, 2016

A Study of the Accuracy of Gradient Boosting with Random Rotations

Kitov V. V., Экономика, статистика и информатика. Вестник УМО 2016 No. 4 P. 22–26

The paper considers gradient boosting with random rotations, in which a random rotation is applied to the feature space before each base learner is trained. The accuracy of the method is estimated on a broad range of generated binary classification problems. The results are evaluated and recommendations are given for applying the method. ...

Added: August 23, 2016

Machine Learning in Studies of Biomedical and Socio-Economic Data

Buzmakov A. V., in: Machine Learning in Studies of Biomedical and Socio-Economic Data. St. Petersburg: Federal State Autonomous Educational Institution of Higher Education "Peter the Great St. Petersburg Polytechnic University", 2020. P. 284–333.

In many practical tasks, the effect of a treatment must be estimated at the individual level. For example, in medicine it is essential to identify the patients who would benefit from a certain medication. In marketing, knowing which people are likely to buy a new product would reduce the amount of spam. In this ...

Added: December 7, 2021

Analysis of Images, Social Networks and Texts: Third International Conference, AIST 2014, Yekaterinburg, Russia, April 10-12, 2014, Revised Selected Papers

Berlin: Springer, 2014.

This book constitutes the proceedings of the Third International Conference on Analysis of Images, Social Networks and Texts, AIST 2014, held in Yekaterinburg, Russia, in April 2014. The 11 full and 10 short papers were carefully reviewed and selected from 74 submissions. They are presented together with 3 short industrial papers, 4 invited papers and ...

Added: November 13, 2014

Pupillometry and autonomic nervous system responses to cognitive load and false feedback: an unsupervised machine learning approach

Evgeniia I. Alshanskaia, Portnova G., Liaukovich K. et al., Frontiers in Neuroscience 2024 Vol. 18 Article 1445697

Objectives: Pupil dilation is controlled both by sympathetic and parasympathetic nervous system branches. We hypothesized that the dynamic of pupil size changes under cognitive load with additional false feedback can predict individual behavior along with heart rate variability (HRV) patterns and eye movements reflecting specific adaptability to cognitive stress. To test this, we employed an ...

Added: September 2, 2024

Prediction of Drug-like Compounds Solubility in Supercritical Carbon Dioxide: A Comparative Study between Classical Density Functional Theory and Machine Learning Approaches

Makarov D., Nikolai N. Kalikin, Yury A. Budkov, Industrial & Engineering Chemistry Research 2024 Vol. 63 No. 3 P. 1589–1603

Supercritical carbon dioxide (scCO2) plays an essential role in various technological procedures, making the solubility of drugs in scCO2 a crucial aspect of the drug formulation process. This study focuses on utilizing theoretical approaches to predict the solubility of drug-like compounds in scCO2 in order to select the optimum parameters for subsequent experimental procedures. Several machine ...

Added: January 16, 2024

The bimodal corpus of Russian-Turkic bilinguals' speech (RuTuBic)

Artemenko E., Резанова З. И., Темникова И. Г. et al., Компьютерная лингвистика и интеллектуальные технологии 2019 Vol. Suppl No. 18 P. 200–210

The paper presents the design of the Russian-Turkic Bilingual Corpus (RuTuBiC) and its basic identifying features: the aim of producing the corpus, the types of texts it contains, metatextual markup and error annotation principles, and technological (IT, digital) concepts. The current state and development trends of the corpus are discussed. The corpus started as an integral part of a research project ...

Added: May 4, 2022

Formal Concept Analysis: 16th International Conference, ICFCA 2021, Strasbourg, France, June 29 – July 2, 2021, Proceedings

Springer, 2021.

This book constitutes the proceedings of the 16th International Conference on Formal Concept Analysis, ICFCA 2021, held in Strasbourg, France, in June/July 2021. The 14 full papers and 5 short papers presented in this volume were carefully reviewed and selected from 32 submissions. The book also contains four invited contributions in full paper length. The research part ...

Added: July 10, 2021

Project Proposal: An Automated Approach to Recommender Systems

Сендерович М. А., in: E. V. Armensky Interuniversity Scientific and Technical Conference of Students, Postgraduates, and Young Specialists. Moscow: MIEM NRU HSE, 2019. P. 223–224.

This work addresses the timely topic of automation in machine learning, using the creation of a universal recommender system as an example. It examines various types of recommender systems, with an emphasis on collaborative filtering approaches. It studies the methods for automating machine learning on which this recommender system will be built. ...

Added: October 31, 2020

Connectome Classification Based on Local Metrics on Stochastic Matrices

Ivanov A., Petrov D., in: Proceedings of the Conference "Information Technologies and Systems" (ИТиС'16). Moscow: IITP RAS, 2016. P. 509–516.

Many graph metrics assume that the weights of a graph represent distances between vertices, which can be added. If these metrics are computed on the stochastic random-walk matrices of a graph, the physical meaning of the transition probabilities between vertices is lost (because transition probabilities multiply rather than add). We propose solving this problem by using the negative logarithms of the edge weights. Using this ...

Added: December 15, 2016
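
The negative-logarithm idea in the entry above can be illustrated with a minimal sketch. This is not the authors' implementation: the 3-node weight matrix and the helper names are hypothetical, and shortest-path length is used here only as one example of an additive graph metric, since the abstract does not name specific metrics. The sketch row-normalizes a weight matrix into random-walk transition probabilities, maps each probability p to the length -log(p), and runs an ordinary shortest-path computation, so that adding lengths corresponds to multiplying probabilities.

```python
import math

def transition_matrix(weights):
    """Row-normalize a non-negative weight matrix into random-walk probabilities."""
    result = []
    for row in weights:
        total = sum(row)
        result.append([w / total if total > 0 else 0.0 for w in row])
    return result

def neg_log_lengths(probs):
    """Map each probability p to the length -log(p); missing edges get infinite length."""
    return [[-math.log(p) if p > 0 else math.inf for p in row] for row in probs]

def shortest_paths(lengths):
    """Floyd-Warshall all-pairs shortest paths over additive edge lengths."""
    n = len(lengths)
    dist = [row[:] for row in lengths]
    for i in range(n):
        dist[i][i] = 0.0
    for k in range(n):
        for i in range(n):
            for j in range(n):
                if dist[i][k] + dist[k][j] < dist[i][j]:
                    dist[i][j] = dist[i][k] + dist[k][j]
    return dist

# Hypothetical 3-node weighted graph (symmetric weights, no self-loops).
weights = [
    [0.0, 9.0, 1.0],
    [9.0, 0.0, 9.0],
    [1.0, 9.0, 0.0],
]

probs = transition_matrix(weights)
dist = shortest_paths(neg_log_lengths(probs))

# exp(-distance) recovers the probability of the most probable path.
best_probability = math.exp(-dist[0][2])
print(best_probability, probs[0][2])
```

In this toy graph the most probable route from node 0 to node 2 passes through node 1 (0.9 × 0.5 = 0.45) rather than along the direct edge (0.1). Adding the raw probabilities as if they were distances would not capture this, whereas adding their negative logarithms does.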

Predictive Analytics Approach for Steel Billets Quality Control System

Belov A. V., Ekaterina A. Melekhova, Vorontsova T., in: 2022 International Conference on Quality Management, Transport and Information Security, Information Technologies (IT&QM&IS). St. Petersburg: IEEE, 2022. P. 219–223.

The paper deals with the problem of improving the quality of metal products. Destructive methods of quality control for steel billets currently prevail at metallurgical enterprises. This approach to assessing billet quality is wasteful and increases the cost of the billets. One of the ways to reduce the cost of production of metal ...

Added: January 28, 2023

Epileptogenic high-frequency oscillations present larger amplitude both in mesial temporal and neocortical regions

Karpychev V., Balatskaya A., Utyashev N. et al., Frontiers in Human Neuroscience 2022 No. 16 Article 984306

High-frequency oscillations (HFO) are a promising biomarker for the identification of epileptogenic tissue. While HFO rates have been shown to predict seizure outcome, it is not yet clear whether their morphological features might improve this prediction. We validated HFO rates against seizure outcome and delineated the distribution of HFO morphological features. We collected stereo-EEG recordings ...

Added: October 1, 2022

Supernova search with active learning in ZTF DR3

Pruzhinskaya M., Ishida E. O., Novinskaya A. et al., Astronomy and Astrophysics 2023 Vol. 672 Article A111

Context. We provide the first results from the complete SNAD adaptive learning pipeline in the context of a broad scope of data from large-scale astronomical surveys. Aims. The main goal of this work is to explore the potential of adaptive learning techniques in application to big data sets. Methods. Our SNAD team used Active Anomaly Discovery (AAD) as ...

Added: June 6, 2023