Using genetic algorithms to optimize nearest neighbors for data mining

Hyunchul Ahn, Kyoung Jae Kim

Research output: Contribution to journal › Article › peer-review

20 Scopus citations

Abstract

Case-based reasoning (CBR) is widely used in data mining for managerial applications because it often shows significant promise for improving the effectiveness of complex and unstructured decision making. There are, however, some limitations in designing appropriate case indexing and retrieval mechanisms, including feature selection and feature weighting. Some prior studies have pointed out that finding the optimal k parameter for the k-nearest neighbor (k-NN) algorithm is also one of the most important factors in designing an effective CBR system. Nonetheless, there have been few attempts to optimize the number of neighbors, especially using artificial intelligence (AI) techniques. This study proposes a genetic algorithm (GA) approach to optimize the number of neighbors to combine. We apply this novel model to two real-world cases involving stock market and online purchase prediction problems. Experimental results show that a GA-optimized k-NN approach may outperform traditional k-NN. In addition, the results show that the proposed method performs as well as, or sometimes better than, other AI techniques in a performance comparison.
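
The core idea in the abstract, a GA searching for the number of neighbors k that a k-NN retriever should combine, can be illustrated with a minimal sketch. This is not the authors' implementation. The synthetic two-class data, the leave-one-out accuracy used as fitness, the averaging crossover, and every name below (`make_data`, `knn_predict`, `loo_accuracy`, `ga_optimize_k`) are assumptions chosen for illustration only.

```python
import random


def make_data(n=60, seed=1):
    """Hypothetical toy data: two overlapping 2-D Gaussian classes."""
    rng = random.Random(seed)
    data = []
    for i in range(n):
        label = i % 2
        center = 0.0 if label == 0 else 1.5
        data.append(((rng.gauss(center, 1.0), rng.gauss(center, 1.0)), label))
    return data


def knn_predict(train, query, k):
    """Majority vote among the k training cases closest to the query."""
    nearest = sorted(
        train,
        key=lambda case: sum((a - b) ** 2 for a, b in zip(case[0], query)),
    )[:k]
    votes = [label for _, label in nearest]
    return max(set(votes), key=votes.count)


def loo_accuracy(data, k):
    """Leave-one-out accuracy of k-NN; used here as the GA fitness (an assumption)."""
    hits = 0
    for i, (x, y) in enumerate(data):
        rest = data[:i] + data[i + 1:]
        hits += knn_predict(rest, x, k) == y
    return hits / len(data)


def ga_optimize_k(data, k_max=15, pop_size=8, generations=10,
                  mutation_rate=0.2, seed=0):
    """Evolve a population of candidate k values and return (best_k, fitness)."""
    rng = random.Random(seed)
    cache = {}

    def fitness(k):
        # Cache evaluations, since the same k recurs across generations.
        if k not in cache:
            cache[k] = loo_accuracy(data, k)
        return cache[k]

    population = [rng.randint(1, k_max) for _ in range(pop_size)]
    for _ in range(generations):
        # Selection: keep the fitter half unchanged (elitism).
        parents = sorted(population, key=fitness, reverse=True)[:pop_size // 2]
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            child = (a + b) // 2  # crossover: average of two parents
            if rng.random() < mutation_rate:
                child += rng.choice([-2, -1, 1, 2])  # mutation: small shift
            children.append(min(max(child, 1), k_max))
        population = parents + children

    best = max(population, key=fitness)
    return best, fitness(best)


data = make_data()
best_k, best_acc = ga_optimize_k(data)
print(f"best k = {best_k}, leave-one-out accuracy = {best_acc:.3f}")
```

With a single integer to tune, an exhaustive scan over k would be just as feasible as a GA. The evolutionary search becomes more useful when k is encoded alongside other parameters, such as the feature weights mentioned in the abstract.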

Original language: English
Pages (from-to): 5-18
Number of pages: 14
Journal: Annals of Operations Research
Volume: 163
Issue number: 1
State: Published - Oct 2008

Keywords

  • Case-based reasoning
  • Genetic algorithms
  • Number of neighbors to combine
  • Purchase prediction
  • Stock market prediction
