SimFLE: Simple Facial Landmark Encoding for Self-Supervised Facial Expression Recognition in the Wild

Jiyong Moon, Hyeryung Jang, Seongsik Park

Research output: Contribution to journalArticlepeer-review

Abstract

Facial expression recognition in the wild (FER-W) entails classifying facial emotions in natural environments. The major challenges in FER-W stem from the complexity and ambiguity of facial images, making it difficult to curate a large-scale labeled dataset for training. Additionally, the subtle differences in emotions often reside in the fine-grained details of local facial landmarks, demanding innovative solutions to capture these crucial features efficiently. To address these issues, we employ two distinct self-supervised methods. First, we adopt a contrastive learning method to capture generalized global representations, enabling the model to understand the semantic context of facial expressions without relying on labeled data. Simultaneously, we leverage masked image modeling to focus on embedding fine-grained, local facial landmark information at the patch-level. We introduce a novel module called FaceMAE, which aims to reconstruct the masked facial patches. The semantic masking scheme is designed to preserve highly activated feature activations, allowing the encoding of crucial details of unmasked facial landmarks and their relationships within the broader facial context at the patch-level. It finally guides the backbone network to calibrate the learned global features to be attentive to facial landmarks. Our proposed method, called Simple Facial Landmark Encoding (SimFLE), significantly outperforms supervised baseline and other self-supervised methods in terms of facial landmark localization and overall performance, as demonstrated through extensive experiments across several FER-W benchmarks.

Original languageEnglish
JournalIEEE Transactions on Affective Computing
DOIs
StateAccepted/In press - 2024

Keywords

  • Contrastive learning
  • facial expression recognition
  • masked image modeling
  • self-supervised learning

Fingerprint

Dive into the research topics of 'SimFLE: Simple Facial Landmark Encoding for Self-Supervised Facial Expression Recognition in the Wild'. Together they form a unique fingerprint.

Cite this