Description of an algorithm for determining text sentiment based on restaurant reviews
With the popularization of social media, a vast amount of textual content with additional geo-located and time-stamped information is generated directly by people every day. Both the meaning of a tweet and its extended message information can be analyzed to explore variations in public mood over given time periods. This paper describes the development of a program for public mood monitoring based on sentiment analysis of Twitter content in Russian. Machine learning (a naive Bayes classifier) and natural language processing techniques were used to implement the program. The result is a client-server program in which the server-side application collects tweets via the Twitter API and analyzes them with the naive Bayes classifier, and the client-side web application visualizes the public mood using the Google Charts libraries. The mood visualization consists of a geo chart of mood across Russia, a plot of mood changes through the day, and a plot of mood changes through the week. Cloud computing services are used in the program in two ways. First, the program is deployed on Google App Engine, which completely abstracts away the infrastructure, so no server administration is required. Second, the data is stored in Google Cloud Datastore, a highly scalable NoSQL document database that is fully integrated with Google App Engine.
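The naive Bayes classification step mentioned above can be illustrated with a minimal sketch. The abstract does not specify the classifier variant, the tokenization, or the training data, so everything here is an assumption: a multinomial model with Laplace smoothing, a whitespace tokenizer, and a tiny hypothetical training set written in English for readability. The names `tokenize`, `NaiveBayes`, `fit`, `log_scores`, and `predict` are illustrative only and are not taken from the described program.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    # Hypothetical tokenizer: lowercase and split on whitespace.
    return text.lower().split()


class NaiveBayes:
    """Multinomial naive Bayes with Laplace (add-alpha) smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha
        self.class_counts = Counter()            # documents per class
        self.word_counts = defaultdict(Counter)  # token counts per class
        self.vocab = set()

    def fit(self, texts, labels):
        for text, label in zip(texts, labels):
            self.class_counts[label] += 1
            for tok in tokenize(text):
                self.word_counts[label][tok] += 1
                self.vocab.add(tok)
        return self

    def log_scores(self, text):
        # Unnormalized log posterior for each class: log prior + sum of log likelihoods.
        total_docs = sum(self.class_counts.values())
        vocab_size = len(self.vocab)
        scores = {}
        for label, n_docs in self.class_counts.items():
            score = math.log(n_docs / total_docs)
            total_words = sum(self.word_counts[label].values())
            for tok in tokenize(text):
                if tok not in self.vocab:
                    continue  # tokens never seen in training are ignored
                count = self.word_counts[label][tok]
                score += math.log(
                    (count + self.alpha) / (total_words + self.alpha * vocab_size)
                )
            scores[label] = score
        return scores

    def predict(self, text):
        scores = self.log_scores(text)
        return max(scores, key=scores.get)


# Toy, made-up training data (an assumption for illustration only).
train_texts = [
    "great food and friendly service",
    "lovely evening great mood",
    "terrible service and cold food",
    "awful day bad mood",
]
train_labels = ["positive", "positive", "negative", "negative"]

model = NaiveBayes().fit(train_texts, train_labels)
```

With this toy data, `model.predict("great service")` selects the positive class and `model.predict("awful cold food")` selects the negative class. A real system would need a much larger labeled corpus and language-specific preprocessing for Russian.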
Sentiment analysis has become a powerful tool for processing and analysing expressed opinions on a large scale. While the application of sentiment analysis to English-language content has been widely examined, its application to Russian-language content remains less well studied. In this survey, we comprehensively reviewed the applications of sentiment analysis to Russian-language content and identified current challenges and future research directions. In contrast with previous surveys, we targeted the applications of sentiment analysis rather than existing sentiment analysis approaches and their classification quality. We synthesised and systematically characterised existing applied sentiment analysis studies by their source of analysed data, purpose, employed sentiment analysis approach, and primary outcomes and limitations. We presented a research agenda to improve the quality of applied sentiment analysis studies and to expand the existing research base in new directions. Additionally, to help scholars select an appropriate training dataset, we performed a further literature review and identified publicly available sentiment datasets of Russian-language texts.
In this paper, we introduce experimental economics to the field of data mining and vice versa. The paper continues related work on mining deterministic behavior rules of human subjects from data gathered in experiments. Game-theoretic predictions partially fail on this data. Equilibria, also known as game-theoretic predictions, succeed only with experienced subjects in specific games, conditions that are rarely met. Contemporary experimental economics offers a number of alternative models apart from game theory. In the relevant literature, these models are always biased by philosophical plausibility considerations and are claimed to fit the data. This paper introduces an agnostic data mining approach to the problem: philosophical plausibility is considered only after the correlations have been found. No bias other than determinism is assumed. The dataset of the paper “Social Learning in Networks” by Choi et al. (2012) is used for evaluation. As a result, we arrive at new findings. As future work, the design of a new infrastructure is discussed.
This study aimed to analyze the advantages of deep learning methods over baseline machine learning methods on a Twitter sentiment analysis task. All techniques were evaluated on a set of English tweets, provided by the SemEval-2017 organizers, with classification on a five-point ordinal scale. For the implementation, we used two open source Python libraries. The results and conclusions of the study are discussed.
In an effort to make reading more accessible, an automated readability formula can help students find material appropriate for their language level. This study attempts to discover and analyze a set of possible features that can be used for single-sentence readability prediction in Russian. We test how syntactic features influence the predictability of structural complexity. The readability of sentences from the SynTagRus corpus was annotated manually and used for evaluation.