Classifying emotions and engagement in online learning based on a single facial expression recognition neural network
In this paper, behaviour of students in the e-learning environment is analyzed. The novel pipeline is proposed based on video facial processing. At first, face detection, tracking and clustering techniques are applied to extract the sequences of faces of each student. Next, a single efficient neural network is used to extract emotional features in each frame. This network is pre-trained on face identification and fine-tuned for facial expression recognition on static images from AffectNet using a specially developed robust optimization technique. It is shown that the resulting facial features can be used for fast simultaneous prediction of students’ engagement levels (from disengaged to highly engaged), individual emotions (happy, sad, etc.,) and group-level affect (positive, neutral or negative). This model can be used for real-time video processing even on a mobile device of each student without the need for sending their facial video to the remote server or teacher’s PC. In addition, the possibility to prepare a summary of a lesson is demonstrated by saving short clips of different emotions and engagement of all students. The experimental study on the datasets from EmotiW (Emotion Recognition in the Wild) challenges showed that the proposed network significantly outperforms existing single models.