TY - JOUR
T1 - Exploring Kolmogorov–Arnold Network Expansions in Vision Transformers for Mitigation of Catastrophic Forgetting in Continual Learning
AU - Ullah, Zahid
AU - Kim, Jihie
N1 - Publisher Copyright:
© 2025 by the authors.
PY - 2025/9
Y1 - 2025/9
N2 - Continual Learning (CL), the ability of a model to learn new tasks without forgetting previously acquired knowledge, remains a critical challenge in artificial intelligence. This is particularly true for Vision Transformers (ViTs) that utilize Multilayer Perceptrons (MLPs) for global representation learning. Catastrophic forgetting, where new information overwrites prior knowledge, is especially problematic in these models. This research proposes the replacement of MLPs in ViTs with Kolmogorov–Arnold Networks (KANs) to address this issue. KANs leverage local plasticity through spline-based activations, ensuring that only a subset of parameters is updated per sample, thereby preserving previously learned knowledge. This study investigates the efficacy of KAN-based ViTs in CL scenarios across various benchmark datasets (MNIST, CIFAR100, and TinyImageNet-200), focusing on this approach’s ability to retain accuracy on earlier tasks while adapting to new ones. Our experimental results demonstrate that KAN-based ViTs significantly mitigate catastrophic forgetting, outperforming traditional MLP-based ViTs in both knowledge retention and task adaptation. This novel integration of KANs into ViTs represents a promising step toward more robust and adaptable models for dynamic environments.
KW - catastrophic forgetting
KW - continual learning
KW - deep learning
KW - Kolmogorov–Arnold network
KW - Vision Transformers
UR - https://www.scopus.com/pages/publications/105017254710
U2 - 10.3390/math13182988
DO - 10.3390/math13182988
M3 - Article
AN - SCOPUS:105017254710
SN - 2227-7390
VL - 13
JO - Mathematics
JF - Mathematics
IS - 18
M1 - 2988
ER -