TY - GEN
T1 - CMSBERT-CLR
T2 - 2022 International Joint Conference on Neural Networks, IJCNN 2022
AU - Kim, Junghun
AU - Kim, Jihie
N1 - Publisher Copyright:
© 2022 IEEE.
PY - 2022
Y1 - 2022
N2 - Multimodal sentiment analysis has become an increasingly popular research area as the demand for multimodal online content is growing. For multimodal sentiment analysis, words can have different meanings depending on the linguistic context and non-verbal information, so it is crucial to understand the meaning of the words accordingly. In addition, the word meanings should be interpreted within the whole utterance context that includes nonverbal information. In this paper, we present a Context-driven Modality Shifting BERT with Contrastive Learning for linguistic, visual, acoustic Representations (CMSBERT-CLR), which incorporates the whole context's non-verbal and verbal information and aligns modalities more effectively through contrastive learning. First, we introduce a Context-driven Modality Shifting (CMS) to incorporate the non-verbal and verbal information within the whole context of the sentence utterance. Then, for improving the alignment of different modalities within a common embedding space, we apply contrastive learning. Furthermore, we use an exponential moving average parameter and label smoothing as optimization strategies, which can make the convergence of the network more stable and increase the flexibility of the alignment. In our experiments, we demonstrate that our approach achieves state-of-the-art results.
AB - Multimodal sentiment analysis has become an increasingly popular research area as the demand for multimodal online content is growing. For multimodal sentiment analysis, words can have different meanings depending on the linguistic context and non-verbal information, so it is crucial to understand the meaning of the words accordingly. In addition, the word meanings should be interpreted within the whole utterance context that includes nonverbal information. In this paper, we present a Context-driven Modality Shifting BERT with Contrastive Learning for linguistic, visual, acoustic Representations (CMSBERT-CLR), which incorporates the whole context's non-verbal and verbal information and aligns modalities more effectively through contrastive learning. First, we introduce a Context-driven Modality Shifting (CMS) to incorporate the non-verbal and verbal information within the whole context of the sentence utterance. Then, for improving the alignment of different modalities within a common embedding space, we apply contrastive learning. Furthermore, we use an exponential moving average parameter and label smoothing as optimization strategies, which can make the convergence of the network more stable and increase the flexibility of the alignment. In our experiments, we demonstrate that our approach achieves state-of-the-art results.
KW - contrastive learning
KW - fusion
KW - multimodal
KW - sentiment
KW - transforemr
UR - http://www.scopus.com/inward/record.url?scp=85140723627&partnerID=8YFLogxK
U2 - 10.1109/IJCNN55064.2022.9892785
DO - 10.1109/IJCNN55064.2022.9892785
M3 - Conference contribution
AN - SCOPUS:85140723627
T3 - Proceedings of the International Joint Conference on Neural Networks
BT - 2022 International Joint Conference on Neural Networks, IJCNN 2022 - Proceedings
PB - Institute of Electrical and Electronics Engineers Inc.
Y2 - 18 July 2022 through 23 July 2022
ER -