Data Representation in Machine Learning-Based Sentiment Analysis of Customer Reviews

I. Shamshurin



P. 254–260.


In this paper, we consider the problem of extracting opinions from natural language texts, which is one of the tasks of sentiment analysis. We provide an overview of existing approaches to sentiment analysis, including supervised (Naive Bayes, maximum entropy, and SVM) and unsupervised machine learning methods. We apply three supervised learning methods (Naive Bayes, KNN, and a method based on the Jaccard index) to a dataset of Internet user reviews of cars and report the results. When learning a user's opinion about a specific feature of a car, such as speed or comfort, training on full, unprocessed reviews turns out to decrease classification accuracy. We experiment with different approaches to preprocessing reviews in order to obtain representations that are relevant to the feature being learned, and show the effect of each representation on classification accuracy.
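The abstract names a Jaccard-index method, KNN, and feature-specific preprocessing without giving their details. The Python sketch below shows one plausible way these ideas can fit together, under stated assumptions: each review is reduced to the sentences that mention a target feature (here, speed), represented as a set of words, and labelled by a nearest-neighbour vote under Jaccard similarity. The keyword list, the sentence filter, the function names (`feature_view`, `knn_jaccard`) and the toy reviews are all hypothetical and are not taken from the paper.

```python
import re
from collections import Counter


def tokenize(text: str) -> set[str]:
    """Lower-cased set of word tokens (a simple set-of-words representation)."""
    return set(re.findall(r"[a-z']+", text.lower()))


def jaccard(a: set[str], b: set[str]) -> float:
    """Jaccard index |A ∩ B| / |A ∪ B|, taken as 0.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def feature_view(review: str, keywords: set[str]) -> str:
    """Hypothetical preprocessing: keep only sentences mentioning a feature keyword."""
    sentences = re.split(r"[.!?]+", review)
    kept = [s.strip() for s in sentences if tokenize(s) & keywords]
    return " ".join(kept)


def knn_jaccard(query: str, train: list[tuple[str, str]], k: int = 3) -> str:
    """Majority label among the k training texts most Jaccard-similar to the query."""
    q = tokenize(query)
    ranked = sorted(train, key=lambda item: jaccard(q, tokenize(item[0])), reverse=True)
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Toy data for the "speed" feature (invented for illustration, not from the paper).
speed = {"fast", "slow", "acceleration", "speed", "sluggish"}
train_reviews = [
    ("Acceleration is fast. The seats are awful.", "pos"),
    ("Very fast on the highway. Interior is cheap.", "pos"),
    ("Sluggish and slow. Comfortable seats though.", "neg"),
    ("The engine feels slow. Great stereo.", "neg"),
]
train = [(feature_view(text, speed), label) for text, label in train_reviews]
query = feature_view("Surprisingly fast acceleration. Seats are hard.", speed)
print(knn_jaccard(query, train, k=1))  # → pos
```

Restricting each review to feature-bearing sentences is only one simple way to obtain a feature-relevant representation; the paper compares several preprocessing approaches, and this sketch does not reproduce any specific one of them.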

Language: English


Keywords: supervised learning; unsupervised learning; opinion mining; k-nearest neighbors method; naive Bayes method; Jaccard index

In book

Pattern Recognition and Machine Intelligence. 4th International Conference, PReMI 2011, Moscow, Russia, June/July 2011. Proceedings

Issue 6744. Berlin, Heidelberg: Springer, 2011.

Transfer Machine Learning of an Anisotropic Model

D. D. Sukhoverkhova, L. N. Shchur, Lobachevskii Journal of Mathematics 2025 Vol. 46 No. 1 P. 528–534

We investigate the possibility of extracting features of second-order phase transitions using transfer machine learning. We have performed supervised machine learning for binary classification of snapshots of the spin distribution of the isotropic Ising model. The binary classification is performed in ferromagnetic and paramagnetic phases using a known critical temperature. The trained network is used ...

Added: January 13, 2025

Latent heat estimation with machine learning

Sukhoverkhova D., Mozolenko V., Shchur L., Series arXiv "math". 2024. No. 2411.00733.

We set out to explore the possibility of investigating the critical behavior of systems with first-order phase transition using deep machine learning. We propose a machine learning protocol with ternary classification of instantaneous spin configurations using known values of disordered phase energy and ordered phase energy. The trained neural network is used to predict whether ...

Added: November 4, 2024

Finite-size analysis in neural network classification of critical phenomena

Chertenkov V., Burovskiy E., Shchur L., Physical Review E - Statistical, Nonlinear, and Soft Matter Physics 2023 Vol. 108 No. 3 Article L032102

We analyze the problem of supervised learning of ferromagnetic phase transitions from the statistical physics perspective. We consider two systems in two universality classes, the two-dimensional Ising model and the two-dimensional Baxter-Wu model, and perform careful finite-size analysis of the results of the supervised learning of the phases of each model. We find that the variance ...

Added: September 19, 2023

Towards polynomial subgroup discovery by means of FCA?

Buzmakov A. V., in: Eighth International Workshop “What can FCA do for Artificial Intelligence?”. [s.n.], 2020. P. 57–68.

The goal of subgroup discovery is to find groups of objects that are significantly different from the “average” object w.r.t. some supervised information. It is a computationally intensive procedure that traverses a large search space corresponding to the set of formal concepts. It was recently found that a part of formal concepts, called stable concepts, can be found ...

Added: July 10, 2021

Data Science Methods in Political Research: Analysis of Protest Activity in Social Networks

Stukal D., Беленков В. Е., Philippov I., Политическая наука [Political Science] 2021 No. 1 P. 46–75

The emergence and growing popularity of social networks, as well as the increasing digitalization penetrating various spheres of the economy and society, have had a significant impact on politics in general and, in particular, on the processes of political mobilization and communication. The methodological toolkit of political science has also been affected by these transformations and has begun to incorporate new approaches and methods proposed within the recently ...

Added: March 2, 2021

Advances in Intelligent Data Analysis XVIII (IDA 2020)

Cham: Springer, 2020.

This open access book constitutes the proceedings of the 18th International Conference on Intelligent Data Analysis, IDA 2020, held in Konstanz, Germany, in April 2020. The 45 full papers presented in this volume were carefully reviewed and selected from 114 submissions. Advancing Intelligent Data Analysis requires novel, potentially game-changing ideas. IDA’s mission is to promote ideas over performance: a ...

Added: May 17, 2020

A Simple Method to Evaluate Support Size and Non-uniformity of a Decoder-Based Generative Model

Struminsky K., Vetrov D., Lecture Notes in Computer Science 2019 Vol. 11832 P. 81–93

Theoretical analysis in [1] suggested that adversarially trained generative models are naturally inclined to learn distributions with low support. In particular, this effect is caused by the limited capacity of the discriminator network. To verify this claim, [2] proposed a statistical test based on the birthday paradox that partially confirmed the analysis. In this paper, ...

Added: April 23, 2020

Variational Autoencoder with Arbitrary Conditioning

Vetrov D., Ivanov O., in: Proceedings of the 7th International Conference on Learning Representations (ICLR 2019). ICLR, 2019. P. 1–25.

We propose a single neural probabilistic model based on variational autoencoder that can be conditioned on an arbitrary subset of observed features and then sample the remaining features in "one shot". The features may be both real-valued and categorical. Training of the model is performed by stochastic variational Bayes. The experimental evaluation on synthetic data, ...

Added: March 13, 2020

Using Principal Component Analysis to Analyze Supply Chain Reliability

Kuznetsov V. O., Логистика и управление цепями поставок [Logistics and Supply Chain Management] 2018 No. 4 (87) P. 27–33

One option for a more flexible approach to analyzing supply chain reliability is principal component analysis (PCA). With a large number of variables describing a supply chain, analyzing the structure of the variables in two-dimensional space is difficult. When analyzing dependencies between variables, PCA makes it possible to ...

Added: November 29, 2018

A Noise-Robust Method for Training a Variational Autoencoder

Figurnov M., Struminsky K., Vetrov D., Интеллектуальные системы. Теория и приложения [Intelligent Systems. Theory and Applications] 2017 Vol. 21 No. 2 P. 90–109

Variational autoencoder (VAE) is a probabilistic unsupervised method that uses deep learning. We propose a robust approach to the training of VAE using a modified likelihood function. We propose and analyze two variational lower bound objectives. The effectiveness of the method is experimentally shown by artificially introducing noise objects. ...

Added: October 18, 2017