Neural network model for video-based facial expression recognition in-the-wild on mobile devices
In this paper, we propose to solve the problem of facial expression recognition in videos with a two-stage procedure. First, facial features are extracted from every frame using an EfficientNet-based model that is pre-trained to identify facial attributes and then fine-tuned on an external dataset for the emotion classification task. Second, multiple statistical functions are computed over the frame-level features and aggregated into a single video representation. Furthermore, we propose a new technique for sequence models, such as frame-level attention models and 1D convolutions, in which the output of a statistical function is concatenated with the facial features. It is experimentally shown that the proposed approach leads to state-of-the-art results on the AFEW 8.0 dataset.
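The statistical aggregation step described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the particular set of statistics (mean, standard deviation, minimum, maximum), the embedding dimensionality, and the function name are assumptions chosen for clarity.

```python
import numpy as np

def aggregate_video_features(frame_features: np.ndarray) -> np.ndarray:
    """Aggregate per-frame embeddings of shape (T, D) into a single
    video descriptor by concatenating several statistical functions
    computed over the time axis. The exact set of statistics here is
    an assumption for illustration, not the paper's definitive choice."""
    stats = [
        frame_features.mean(axis=0),
        frame_features.std(axis=0),
        frame_features.min(axis=0),
        frame_features.max(axis=0),
    ]
    # Concatenation yields a fixed-length descriptor of size 4 * D,
    # regardless of the number of frames T.
    return np.concatenate(stats)

# Hypothetical usage: 16 frames, 1280-dimensional per-frame embeddings
# (1280 is the feature size of EfficientNet-B0; assumed for the example).
emb = np.random.rand(16, 1280).astype(np.float32)
video_descriptor = aggregate_video_features(emb)
```

The resulting fixed-length vector can then be fed to a classifier, or, as proposed above, concatenated with the per-frame features before a sequence model.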