Advanced Facial Analysis in Multi-Modal Data with Cascaded Cross-Attention based Transformer

Jun Hwa Kim, Namho Kim, Minsoo Hong, Chee Sun Won

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

Facial expressions are among the most important cues for understanding humans at a psychological level. Because they inform the analysis of human behavior, indicators such as expression (EXPR), valence-arousal (VA), and action units (AU) are essential. In this paper, we introduce the method we proposed for the Challenge of the 6th Workshop and Competition on Affective Behavior Analysis in-the-wild (ABAW) at CVPR 2024. Our method uses the multi-modal Aff-Wild2 dataset, which is split into visual and audio modalities. For the visual data, we extract features using a SimMIM model pre-trained on a diverse set of facial expression data. For the audio data, we extract features using the Wav2Vec model. To fuse the extracted visual and audio features, we propose a cascaded cross-attention mechanism in a transformer. Our approach achieved average F1 scores of 0.4652 and 0.3005 on the AU and EXPR tracks, respectively, and an average Concordance Correlation Coefficient (CCC) of 0.5077 on the VA track, outperforming the baseline on all tracks of the ABAW6 competition. It placed 5th, 6th, and 7th on the AU, EXPR, and VA tracks, respectively. The code used in the 6th ABAW competition is available at https://github.com/namho-96/ABAW2024.
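The Concordance Correlation Coefficient reported above can be illustrated with a minimal sketch. It uses the standard definition, CCC = 2·cov(x, y) / (var(x) + var(y) + (mean(x) − mean(y))²), with population (divide-by-n) statistics. The function names and sample values below are hypothetical and are not taken from the authors' repository. Treating the "average CCC" as the mean of the valence and arousal CCCs is an assumption; the abstract does not spell out the averaging.

```python
from statistics import fmean


def concordance_correlation_coefficient(y_true, y_pred):
    """Standard CCC between two equal-length sequences of floats.

    Uses population (divide-by-n) variance and covariance.
    """
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")

    n = len(y_true)
    mean_true = fmean(y_true)
    mean_pred = fmean(y_pred)

    var_true = sum((t - mean_true) ** 2 for t in y_true) / n
    var_pred = sum((p - mean_pred) ** 2 for p in y_pred) / n
    cov = sum((t - mean_true) * (p - mean_pred)
              for t, p in zip(y_true, y_pred)) / n

    denom = var_true + var_pred + (mean_true - mean_pred) ** 2
    if denom == 0:
        # Both sequences are constant and identical; CCC is undefined here.
        raise ValueError("CCC is undefined for identical constant sequences")
    return 2.0 * cov / denom


def average_va_ccc(valence_true, valence_pred, arousal_true, arousal_pred):
    """Assumed averaging: mean of the valence CCC and the arousal CCC."""
    ccc_v = concordance_correlation_coefficient(valence_true, valence_pred)
    ccc_a = concordance_correlation_coefficient(arousal_true, arousal_pred)
    return 0.5 * (ccc_v + ccc_a)


if __name__ == "__main__":
    # Hypothetical per-frame annotations and predictions in [-1, 1].
    v_true, v_pred = [0.1, 0.4, -0.2, 0.3], [0.0, 0.5, -0.1, 0.2]
    a_true, a_pred = [0.6, 0.2, 0.1, -0.3], [0.5, 0.3, 0.0, -0.2]
    print(average_va_ccc(v_true, v_pred, a_true, a_pred))
```

Unlike Pearson correlation, CCC penalizes a constant offset or scale mismatch between predictions and labels, so a prediction that tracks the trend but is systematically shifted scores below 1.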

Original language: English
Title of host publication: Proceedings - 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, CVPRW 2024
Publisher: IEEE Computer Society
Pages: 7870-7877
Number of pages: 8
ISBN (Electronic): 9798350365474
State: Published - 2024
Event: 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, CVPRW 2024 - Seattle, United States
Duration: 16 Jun 2024 → 22 Jun 2024

Publication series

Name: IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops
ISSN (Print): 2160-7508
ISSN (Electronic): 2160-7516

Conference

Conference: 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, CVPRW 2024
Country/Territory: United States
City: Seattle
Period: 16/06/24 → 22/06/24

Keywords

  • ABAW
  • Cross-attention
  • Facial Analysis
  • Transformer
