Abstract
Human-Machine Interaction aims for natural interactions, and recognizing the user’s emotional state is crucial to achieving this. The field of emotion recognition and modelling has predominantly employed static machine learning approaches, even though emotions are dynamic processes that evolve over time. Appraisal models highlight the dynamic character of emotions, and combining them with dynamic modelling approaches enhances interpretability. After reviewing the related literature on emotion recognition from peripheral physiological signals, this work aims to model two emotion dimensions, intensity and quality, as polar coordinates derived from the Geneva Emotion Wheel (GEW). To do so, we used galvanic skin response, heart rate, and respiration to feed dynamic models that account for the signals’ time history. More specifically, we chose nonlinear autoregressive exogenous (NARX) models over end-to-end deep learning approaches, which are challenging to interpret. In our experimental setup, different emotions were elicited by images while physiological signals were recorded. Unlike conventional studies, which assess the participants’ subjective feeling after the stimulus, we assessed it in real time with a polar device based on the GEW. The reported subjective feeling served as ground truth. In addition, we computed distinct physiological features from the recorded signals. We used these features, together with a genetic algorithm, to train intrasubject intensity models (for distinct qualities) and quality models (for distinct intensity levels), obtaining the optimized NARX parameters and, thus, the optimal physiological features. Additionally, intersubject models were derived from the top-ranked intrasubject physiological features, yielding consistent features and parameters for intersubject intensity estimation models.
Our results showed that both intrasubject and intersubject NARX models outperformed traditional sliding-window linear regressions, emphasizing the benefits of considering emotion dynamics. Nonetheless, some NARX parameters were highly dependent on the training and test sets. In the prediction of emotion intensity using NARX models, we observed the importance of heart rate time history. Given this consistency across emotion qualities, we further investigated the potential of estimating intensity independently of quality. Heart rate consistently proved meaningful in such predictions. In the case of intersubject NARX quality estimation models, the results were not stable enough to draw strong conclusions, but both galvanic skin response and respiration-based features seem to play a role in predicting the evolution of emotion quality at different levels of intensity. Moreover, these models revealed a long temporal delay between changes in respiration rate and the perception of emotion quality during emotional states of medium intensity. Ultimately, we would like to highlight that our approach enabled interpretation via an appraisal model (Scherer’s Component Process Model), providing insights into the relationship between the dynamics of different physiological features and the perceived emotional state.
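
For readers unfamiliar with the NARX structure referred to in this abstract, the following is a minimal illustrative sketch, not the implementation used in this work. It assumes a single physiological feature `u` as the exogenous input, a reported emotion dimension `y` as the output, a polynomial (squared-term) nonlinearity, and an ordinary least-squares fit. The function names, the lag orders `n_y` and `n_u`, and the input `delay` are all hypothetical; in this work such parameters are selected by a genetic algorithm rather than fixed by hand.

```python
import numpy as np


def build_narx_regressors(y, u, n_y, n_u, delay):
    """Stack past outputs and delayed inputs into a regression matrix.

    The row for time t holds y[t-1], ..., y[t-n_y] followed by
    u[t-delay], ..., u[t-delay-n_u+1].
    """
    start = max(n_y, delay + n_u - 1)  # first t with a full history
    rows = []
    for t in range(start, len(y)):
        past_y = [y[t - i] for i in range(1, n_y + 1)]
        past_u = [u[t - delay - j] for j in range(n_u)]
        rows.append(past_y + past_u)
    return np.asarray(rows), np.asarray(y[start:])


def expand_features(X):
    """Assumed nonlinearity: bias, linear terms, and squared terms."""
    return np.hstack([np.ones((len(X), 1)), X, X ** 2])


def fit_narx(y, u, n_y=2, n_u=2, delay=1):
    """Fit the polynomial NARX coefficients by least squares."""
    X, target = build_narx_regressors(y, u, n_y, n_u, delay)
    theta, *_ = np.linalg.lstsq(expand_features(X), target, rcond=None)
    return theta


def predict_one_step(theta, y, u, n_y=2, n_u=2, delay=1):
    """One-step-ahead predictions using the measured output history."""
    X, _ = build_narx_regressors(y, u, n_y, n_u, delay)
    return expand_features(X) @ theta
```

In this sketch, the `delay` parameter plays the role of the temporal lag between a change in a physiological feature and the reported emotional state, and the coefficients on the lagged `u` terms indicate how much that feature's time history contributes to the prediction.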