Clustering and maximum likelihood search for efficient statistical classification with medium-sized databases
This paper addresses the problem of insufficient performance of statistical classification with the medium-sized database (thousands of classes). Each object is represented as a sequence of independent segments. Each segment is defined as a random sample of independent features with the distribution of multivariate exponential type. To increase the speed of the optimal Kullback-Leibler minimum information discrimination principle, we apply the clustering of the training set and an approximate nearest neighbor search of the input object in a set of cluster medoids. By using the asymptotic properties of the Kullback-Leibler divergence, we propose the maximal likelihood search procedure. In this method the medoid to check is selected from the cluster with the maximal joint density (likelihood) of the distances to the previously checked medoids. Experimental results in image recognition with artificially generated dataset and Essex facial database prove that the proposed approach is much more effective, than an exhaustive search and the known approximate nearest neighbor methods from FLANN and NonMetricSpace libraries.