Exploring Kolmogorov–Arnold Network Expansions in Vision Transformers for Mitigation of Catastrophic Forgetting in Continual Learning

Research output: Contribution to journal › Article › peer-review

Abstract

Continual Learning (CL), the ability of a model to learn new tasks without forgetting previously acquired knowledge, remains a critical challenge in artificial intelligence. This is particularly true for Vision Transformers (ViTs) that utilize Multilayer Perceptrons (MLPs) for global representation learning. Catastrophic forgetting, where new information overwrites prior knowledge, is especially problematic in these models. This research proposes the replacement of MLPs in ViTs with Kolmogorov–Arnold Networks (KANs) to address this issue. KANs leverage local plasticity through spline-based activations, ensuring that only a subset of parameters is updated per sample, thereby preserving previously learned knowledge. This study investigates the efficacy of KAN-based ViTs in CL scenarios across various benchmark datasets (MNIST, CIFAR100, and TinyImageNet-200), focusing on this approach’s ability to retain accuracy on earlier tasks while adapting to new ones. Our experimental results demonstrate that KAN-based ViTs significantly mitigate catastrophic forgetting, outperforming traditional MLP-based ViTs in both knowledge retention and task adaptation. This novel integration of KANs into ViTs represents a promising step toward more robust and adaptable models for dynamic environments.
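To make the architectural change concrete, below is a minimal sketch in PyTorch of the idea the abstract describes: replacing the two-layer MLP inside a ViT encoder block with a KAN-style layer whose per-edge activations respond only locally. This is not the paper's implementation; the class names are hypothetical, and as a simplification the learnable activations use a fixed Gaussian radial-basis grid rather than true B-splines. The local-plasticity intuition carries over: only the basis functions near a given input value respond strongly, so each sample updates a small subset of the spline coefficients.

```python
# A minimal sketch (PyTorch, hypothetical names) of swapping the MLP block in a
# ViT encoder layer for a KAN-style layer. Assumption: Gaussian RBF bases stand
# in for the B-spline bases used by KANs proper.
import torch
import torch.nn as nn

class KANLayer(nn.Module):
    """Maps in_dim -> out_dim with learnable univariate activations per edge.

    Each input coordinate is expanded onto `num_basis` fixed basis functions;
    only bases near the input value fire, so updates are local."""
    def __init__(self, in_dim, out_dim, num_basis=8, grid_range=(-2.0, 2.0)):
        super().__init__()
        grid = torch.linspace(*grid_range, num_basis)            # basis centers
        self.register_buffer("grid", grid)
        self.inv_width = num_basis / (grid_range[1] - grid_range[0])
        # spline coefficients: one set per (input, output) edge
        self.coeff = nn.Parameter(torch.randn(in_dim * num_basis, out_dim) * 0.1)
        self.base = nn.Linear(in_dim, out_dim)                   # residual base path

    def forward(self, x):                                        # x: (..., in_dim)
        # local basis response: Gaussian bumps around each grid point
        phi = torch.exp(-((x.unsqueeze(-1) - self.grid) * self.inv_width) ** 2)
        phi = phi.flatten(-2)                                    # (..., in_dim * num_basis)
        return self.base(torch.tanh(x)) + phi @ self.coeff

class KANBlock(nn.Module):
    """Drop-in replacement for the two-layer MLP inside a ViT encoder block."""
    def __init__(self, dim, hidden_dim):
        super().__init__()
        self.net = nn.Sequential(KANLayer(dim, hidden_dim), KANLayer(hidden_dim, dim))

    def forward(self, x):
        return self.net(x)

# Usage: replace the MLP sub-module of each ViT encoder block with KANBlock(dim, 4 * dim).
tokens = torch.randn(2, 197, 192)            # (batch, patches + cls token, embed dim)
out = KANBlock(192, 768)(tokens)
print(out.shape)                             # torch.Size([2, 197, 192])
```

The design point this sketch illustrates is why such a swap can help with forgetting: in an MLP, every weight contributes to every input, so gradients from a new task touch all parameters, whereas here a sample's gradient is concentrated on the coefficients of the few basis functions its activations fall near, leaving coefficients tied to other input regions largely untouched.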

Original language: English
Article number: 2988
Journal: Mathematics
Volume: 13
Issue number: 18
DOIs
State: Published - Sep 2025

Keywords

  • catastrophic forgetting
  • continual learning
  • deep learning
  • Kolmogorov–Arnold network
  • Vision Transformers
