Three-way classification for sequences of observations
This article introduces a novel technique for reducing the computation time of classifying a sequence of observations (frames), such as a video stream, where each observation is described by a high-dimensional embedding extracted by a deep neural network. Using the methodology of granular computing, an observed sequence is represented at several scales corresponding to different frame rates. A coarse-grained granule is described by an aggregation (mean pooling) of the deep embeddings of an object from a few frames extracted at a low frame rate. The descriptor of a fine-grained granule is computed using the embeddings of most frames. A separate classifier is trained for each granularity level. In the classification phase, the coarse-grained descriptor of the input sequence is fed into the first classifier, and the classes with high confidence scores form the positive set in the sense of three-way decisions. The decision-making procedure terminates at the first granularity level whose positive set contains only one category, or when the finest-grained granule is reached. Experiments on video-based facial expression recognition show that our technique is up to 30 times faster than conventional processing of all frames, without significant degradation in accuracy.
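The coarse-to-fine procedure summarized above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the frame-sampling steps, and in particular the rule that defines the positive set (classes whose score is at least a fixed fraction `ratio` of the top score) are assumptions made only for illustration. Each classifier is assumed to be a callable that maps a descriptor to a vector of class confidence scores.

```python
import numpy as np

def granule_descriptor(embeddings, step):
    """Mean-pool the embeddings of every `step`-th frame (larger step = coarser granule)."""
    return embeddings[::step].mean(axis=0)

def positive_set(scores, ratio):
    """Assumed rule: keep classes whose score is at least `ratio` times the top score."""
    return np.flatnonzero(scores >= ratio * scores.max())

def classify_sequence(embeddings, classifiers, steps, ratio=0.5):
    """Return (predicted class, index of the granularity level at which we stopped).

    embeddings  : array of shape (num_frames, dim), one deep embedding per frame
    classifiers : one callable per granularity level, ordered coarse to fine
    steps       : frame-sampling step per level, e.g. [6, 1] (hypothetical values)
    """
    for level, (clf, step) in enumerate(zip(classifiers, steps)):
        scores = clf(granule_descriptor(embeddings, step))
        candidates = positive_set(scores, ratio)
        is_last_level = level == len(steps) - 1
        # Stop as soon as a single class remains, or when the finest level is reached.
        if len(candidates) == 1 or is_last_level:
            return int(np.argmax(scores)), level
```

In this sketch the computational saving comes from the early exit: when the coarse-level classifier is already confident, the embeddings of the remaining frames are never pooled or classified. In a real pipeline the embeddings of the skipped frames would also not need to be extracted, which this simplified version does not model.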