TY - JOUR
T1 - A maximum mean discrepancy-based autoencoder approach for dimension reduction with binary responses
AU - Moon, Youngho
AU - Lee, Yung Seop
N1 - Publisher Copyright:
© The Author(s), under exclusive license to Korean Statistical Society 2025.
PY - 2025/12
Y1 - 2025/12
N2 - The advent of the big data era has led to a significant increase in the use of high-dimensional data. The challenges arising from increased data dimensionality are often referred to as the “curse of dimensionality.” Consequently, various dimension reduction (DR) techniques are being actively researched to address these challenges. In particular, applying DR techniques to high-dimensional datasets, such as those encountered in digital imaging, natural language processing, and genomics, which often comprise hundreds or thousands of variables, has gained considerable attention as a means to overcome the curse of dimensionality. This study proposes a novel DR technique that uses a supervised autoencoder model for binary classification, that is, for settings in which the response variable is binary. The proposed method maps high-dimensional data into a lower-dimensional latent space learned by the autoencoder. It further employs a Maximum Mean Discrepancy (MMD) loss function to enhance the linear separability between the two classes within this latent representation. During the autoencoder’s training, the MMD loss encourages samples from the same class to cluster closely together while simultaneously maximizing the distance between samples from different classes. Because classification performance after dimension reduction (e.g., distinguishing between defective and non-defective items) is critical for high-dimensional data with binary response variables, we conducted a comparative evaluation on seven distinct high-dimensional datasets. Experimental results demonstrate that the proposed model outperforms other established DR techniques in terms of classification accuracy and F1-score.
AB - The advent of the big data era has led to a significant increase in the use of high-dimensional data. The challenges arising from increased data dimensionality are often referred to as the “curse of dimensionality.” Consequently, various dimension reduction (DR) techniques are being actively researched to address these challenges. In particular, applying DR techniques to high-dimensional datasets, such as those encountered in digital imaging, natural language processing, and genomics, which often comprise hundreds or thousands of variables, has gained considerable attention as a means to overcome the curse of dimensionality. This study proposes a novel DR technique that uses a supervised autoencoder model for binary classification, that is, for settings in which the response variable is binary. The proposed method maps high-dimensional data into a lower-dimensional latent space learned by the autoencoder. It further employs a Maximum Mean Discrepancy (MMD) loss function to enhance the linear separability between the two classes within this latent representation. During the autoencoder’s training, the MMD loss encourages samples from the same class to cluster closely together while simultaneously maximizing the distance between samples from different classes. Because classification performance after dimension reduction (e.g., distinguishing between defective and non-defective items) is critical for high-dimensional data with binary response variables, we conducted a comparative evaluation on seven distinct high-dimensional datasets. Experimental results demonstrate that the proposed model outperforms other established DR techniques in terms of classification accuracy and F1-score.
KW - Autoencoder
KW - Dimension reduction
KW - High dimension
KW - Maximum mean discrepancy loss function
UR - https://www.scopus.com/pages/publications/105012184202
U2 - 10.1007/s42952-025-00338-y
DO - 10.1007/s42952-025-00338-y
M3 - Article
AN - SCOPUS:105012184202
SN - 1226-3192
VL - 54
SP - 1269
EP - 1295
JO - Journal of the Korean Statistical Society
JF - Journal of the Korean Statistical Society
IS - 4
ER -