TY - JOUR
T1 - Gaussian kernel with correlated variables for incomplete data
AU - Choi, Jeongsub
AU - Son, Youngdoo
AU - Jeong, Myong K.
N1 - Publisher Copyright: © The Author(s), under exclusive licence to Springer Science+Business Media, LLC, part of Springer Nature 2023.
PY - 2024/10
Y1 - 2024/10
AB - The presence of missing components in incomplete instances precludes a kernel-based model from incorporating partially observed components of incomplete instances and computing kernels, including Gaussian kernels that are extensively used in machine learning modeling and applications. Existing methods with Gaussian kernels to handle incomplete data, however, are based on independence among variables. In this study, we propose a new method, the expected Gaussian kernel with correlated variables, that estimates the Gaussian kernel with incomplete data, by considering the correlation among variables. In the proposed method, the squared distance between two instance vectors is modeled with the sum of the correlated squared unit-dimensional distances between the instances, and the Gaussian kernel with missing values is obtained by estimating the expected Gaussian kernel function under the probability distribution for the squared distance between the vectors. The proposed method is evaluated on synthetic data and real-life data from benchmarks and a case from a multi-pattern photolithographic process for wafer fabrication in semiconductor manufacturing. The experimental results show the improvement by the proposed method in the estimation of Gaussian kernels with incomplete data of correlated variables.
KW - Gamma approximation
KW - Gaussian kernel
KW - Incomplete data
KW - Semiconductor manufacturing
UR - http://www.scopus.com/inward/record.url?scp=85177828290&partnerID=8YFLogxK
U2 - 10.1007/s10479-023-05656-0
DO - 10.1007/s10479-023-05656-0
M3 - Article
AN - SCOPUS:85177828290
SN - 0254-5330
VL - 341
SP - 223
EP - 244
JO - Annals of Operations Research
JF - Annals of Operations Research
IS - 1
ER -