TY - JOUR
T1 - BDA: Bi-directional attention for zero-shot learning
T2 - Computational Visual Media
AU - Lee, Junseok
AU - Cao, Jinming
AU - Yin, Yifang
AU - Kim, Jihie
AU - Zimmermann, Roger
AU - Park, Seongsik
N1 - Publisher Copyright:
© 2024 Tsinghua University Press.
PY - 2025
Y1 - 2025
N2 - Zero-shot learning (ZSL) is an important and rapidly growing area of machine learning that aims to recognize new classes without prior training data. Despite its significance, ZSL has faced challenges with overfitting in embedding-based methods and limitations in traditional one-directional attention (ODA) based approaches. To bridge these gaps, this paper proposes the use of bi-directional attention (BDA) to integrate insights from both embedding- and attention-based approaches. The proposed BDA system consists of a bi-directional attention network (BDAN) and a synthesized visual embedding network (SVEN), which together facilitate visual-semantic interaction for ZSL classification. More specifically, the BDAN employs region self-attention (RSA), semantic synthesis attention (SSA), and visual synthesis attention (VSA) to overcome the overfitting issue in embedding methods and enhance transferability, to associate visual features with semantic property information, and to learn locally improved visual features, respectively. Extensive testing on the CUB, SUN, and AWA2 datasets confirms the superiority of the proposed method over traditional approaches.
AB - Zero-shot learning (ZSL) is an important and rapidly growing area of machine learning that aims to recognize new classes without prior training data. Despite its significance, ZSL has faced challenges with overfitting in embedding-based methods and limitations in traditional one-directional attention (ODA) based approaches. To bridge these gaps, this paper proposes the use of bi-directional attention (BDA) to integrate insights from both embedding- and attention-based approaches. The proposed BDA system consists of a bi-directional attention network (BDAN) and a synthesized visual embedding network (SVEN), which together facilitate visual-semantic interaction for ZSL classification. More specifically, the BDAN employs region self-attention (RSA), semantic synthesis attention (SSA), and visual synthesis attention (VSA) to overcome the overfitting issue in embedding methods and enhance transferability, to associate visual features with semantic property information, and to learn locally improved visual features, respectively. Extensive testing on the CUB, SUN, and AWA2 datasets confirms the superiority of the proposed method over traditional approaches.
KW - bi-directional attention (BDA)
KW - interaction
KW - transferability
KW - zero-shot learning (ZSL)
UR - https://www.scopus.com/pages/publications/105021021968
U2 - 10.26599/CVM.2025.9450401
DO - 10.26599/CVM.2025.9450401
M3 - Article
AN - SCOPUS:105021021968
SN - 2096-0433
VL - 11
SP - 983
EP - 1003
JO - Computational Visual Media
JF - Computational Visual Media
IS - 5
ER -