Emotion Recognition in Sound
In this paper we consider the problem of automatic emotion recognition, in particular for digital audio signals. We consider and verify a straightforward approach in which the classification of a sound fragment is reduced to an image recognition problem: the waveform and the spectrogram serve as visual representations of the audio signal. The computational experiment was carried out on the open RAVDESS dataset, which includes 8 different emotions: "neutral", "calm", "happy", "sad", "angry", "scared", "disgust" and "surprised". Our best accuracy of 71% was achieved by the combination "mel-spectrogram + VGG-16 convolutional neural network".
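The "mel-spectrogram + CNN" pipeline begins by turning each audio clip into a 2-D mel-scaled spectrogram image. A minimal sketch of that front end using only NumPy is given below; all parameters (sample rate, FFT size, hop length, number of mel bands) are illustrative choices, not the settings used in the paper:

```python
import numpy as np

def mel_filterbank(sr, n_fft, n_mels):
    # triangular filters spaced evenly on the mel scale
    hz_to_mel = lambda f: 2595.0 * np.log10(1.0 + f / 700.0)
    mel_to_hz = lambda m: 700.0 * (10.0 ** (m / 2595.0) - 1.0)
    hz_pts = mel_to_hz(np.linspace(0.0, hz_to_mel(sr / 2), n_mels + 2))
    bins = np.floor((n_fft + 1) * hz_pts / sr).astype(int)
    fb = np.zeros((n_mels, n_fft // 2 + 1))
    for i in range(1, n_mels + 1):
        left, center, right = bins[i - 1], bins[i], bins[i + 1]
        for k in range(left, center):
            fb[i - 1, k] = (k - left) / (center - left)
        for k in range(center, right):
            fb[i - 1, k] = (right - k) / (right - center)
    return fb

def melspectrogram(signal, sr=16000, n_fft=512, hop=256, n_mels=64):
    # frame the signal, window it, and take the power spectrum of each frame
    starts = range(0, len(signal) - n_fft + 1, hop)
    frames = np.stack([signal[s:s + n_fft] for s in starts]) * np.hanning(n_fft)
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2
    # project onto the mel filterbank and convert to decibels
    return 10.0 * np.log10(power @ mel_filterbank(sr, n_fft, n_mels).T + 1e-10)

# one second of a synthetic 440 Hz tone as a stand-in for a speech fragment
t = np.linspace(0.0, 1.0, 16000, endpoint=False)
spec = melspectrogram(np.sin(2 * np.pi * 440.0 * t))
print(spec.shape)  # one row per frame, one column per mel band
```

The resulting 2-D array can then be rendered as an image and fed to an ImageNet-style convolutional network such as VGG-16.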
The paper gives a brief introduction to multiple classifier systems and describes a particular algorithm that improves classification accuracy by recommending a classifier for each object. The recommendation is made under the hypothesis that a classifier is likely to predict the label of an object correctly if it has correctly classified the object's neighbors. The process of assigning a classifier to each object relies on the apparatus of Formal Concept Analysis. We explain the principle of the algorithm on a toy example and describe experiments with real-world datasets.
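The underlying neighbor-based recommendation idea can be illustrated with a toy sketch. The data, the two threshold classifiers, and the neighborhood size k below are invented for illustration, and the Formal Concept Analysis machinery of the actual algorithm is omitted:

```python
import numpy as np

# a toy ensemble of two threshold "classifiers" on a 1-D feature
classifiers = [
    lambda x: (x > 0.5).astype(int),  # classifier A
    lambda x: (x > 0.3).astype(int),  # classifier B
]

X_train = np.array([0.1, 0.2, 0.35, 0.4, 0.6, 0.9])
y_train = np.array([0, 0, 1, 1, 1, 1])

def recommend_and_predict(x, k=3):
    # find the k training objects nearest to x
    idx = np.argsort(np.abs(X_train - x))[:k]
    # recommend the classifier that labels these neighbors best ...
    scores = [np.mean(clf(X_train[idx]) == y_train[idx]) for clf in classifiers]
    best = classifiers[int(np.argmax(scores))]
    # ... and let the recommended classifier label x itself
    return int(best(np.array([x]))[0])

print(recommend_and_predict(0.33))
```

For the query 0.33, classifier B is recommended because it labels all three nearest neighbors correctly, while classifier A misses two of them.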
Symbolic classifiers allow for solving the classification task while providing the reasons behind the classifier's decisions. Such classifiers have been studied by many researchers and are known under a variety of names, including tests, JSM-hypotheses, version spaces, emerging patterns, proper predictors of a target class, representative sets, etc. Here we consider such classifiers with restrictions on counter-examples and discuss them in terms of pattern structures. We show how such classifiers are related; in particular, we discuss the equivalence between good maximally redundant tests and minimal JSM-hypotheses, and between minimal representations of version spaces and good irredundant tests.
This book constitutes the refereed proceedings of the 6th IAPR TC3 International Workshop on Artificial Neural Networks in Pattern Recognition, ANNPR 2014, held in Montreal, QC, Canada, in October 2014. The 24 revised full papers presented were carefully reviewed and selected from 37 submissions for inclusion in this volume. They cover a broad range of topics in the field of learning algorithms and architectures and discuss the latest research, results, and ideas in these areas.
We propose a definition of a phoneme as a fuzzy set of minimal speech units from a model database. On the basis of this definition and the Kullback-Leibler minimum information discrimination principle, a novel phoneme recognition algorithm is developed as an enhancement of the phonetic decoding method. Experimental results for isolated vowel recognition and word recognition in Russian are presented. The proposed method is shown to improve recognition accuracy and reliability in comparison with the phonetic decoding method.
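The minimum-discrimination decision rule itself is simple to sketch: each phoneme model is a discrete distribution, and an input is assigned to the model with the smallest Kullback-Leibler divergence. The 4-bin "spectra" for the three vowels below are invented for illustration, not models from the paper:

```python
import numpy as np

def kl(p, q):
    # Kullback-Leibler divergence D(p || q) between discrete distributions
    return float(np.sum(p * np.log(p / q)))

# hypothetical normalized spectral models for three vowel phonemes
models = {
    "a": np.array([0.6, 0.2, 0.1, 0.1]),
    "o": np.array([0.2, 0.5, 0.2, 0.1]),
    "u": np.array([0.1, 0.2, 0.3, 0.4]),
}

def recognize(x):
    # minimum information discrimination: pick the closest model in KL sense
    return min(models, key=lambda ph: kl(x, models[ph]))

print(recognize(np.array([0.55, 0.25, 0.1, 0.1])))
```

An observed distribution close to the model of "a" is assigned to "a", since its divergence from that model is smallest.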
In this paper, we use robust optimization models to formulate support vector machines (SVMs) with polyhedral uncertainties in the input data points. The resulting formulations are nonlinear; we use Lagrange multipliers to derive the first-order optimality conditions and reformulation methods to solve these problems. In addition, we propose models for transductive SVMs with input uncertainties.
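For a polyhedral uncertainty set of the form $\mathcal{X}_i = \{x_i + \delta : P\delta \le q\}$, LP duality applied to the worst-case margin constraint yields a soft-margin reformulation of the following generic shape (the symbols $P$, $q$, and the multipliers $\lambda_i$ are standard robust-optimization notation, not necessarily the paper's exact formulation):

```latex
\min_{w,\,b,\,\xi,\,\lambda}\ \frac{1}{2}\|w\|^{2} + C\sum_{i}\xi_{i}
\quad\text{s.t.}\quad
y_{i}\,(w^{\top}x_{i} + b) - q^{\top}\lambda_{i} \ge 1 - \xi_{i},
\qquad
P^{\top}\lambda_{i} = -\,y_{i}\,w,
\qquad
\lambda_{i} \ge 0,\quad \xi_{i} \ge 0.
```

Each $\lambda_i$ certifies that the margin constraint holds for every perturbation $\delta$ in the polyhedron, since at the dual optimum $\min_{P\delta \le q} y_i\, w^{\top}\delta = -q^{\top}\lambda_i$.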
The performance of machine learning methods is heavily dependent on the choice of data representation (or features) on which they are applied. The rapidly developing field of representation learning is concerned with how we can best learn meaningful and useful representations of data. We take a broad view of the field and include topics such as deep learning and feature learning, metric learning, compositional modeling, structured prediction, reinforcement learning, and issues regarding large-scale learning and non-convex optimization. The range of domains to which these techniques apply is also very broad, spanning vision, speech recognition, text understanding, gaming, music, and more.