ADSTREAM: Anomaly detection in large-scale data streams using local outlier factor based on micro-cluster

Sanghyun Seo, Seongchul Park, Injea Hwang, Juntae Kim

Research output: Contribution to journal, Article, peer-review

4 Scopus citations

Abstract

Micro-cluster based clustering methods cluster large-scale data streams efficiently by splitting the work between two components: an online phase and an offline phase. The online component creates micro-clusters from the incoming data stream, and the offline component performs the final clustering on the micro-clusters formed by the online component. However, because these methods handle anomaly detection only passively, they do not explicitly identify outliers. Most existing methodologies first cluster all of the data and then label whatever was not clustered as outliers. Although typical micro-cluster based data stream clustering methods achieve excellent clustering quality, they are not suitable for anomaly detection, which must state explicitly which data are outliers. In this paper, we propose ADSTREAM, which applies the Local Outlier Factor (LOF) to the centers of micro-clusters in the offline component in order to detect and identify outliers. In the experiments, we visualize the anomaly detection results of ADSTREAM, perform micro-cluster based anomaly detection on large-scale streams from the KDDCUP1999 dataset, and show that ADSTREAM improves anomaly detection performance dramatically compared to existing micro-cluster based clustering methods. As a result, ADSTREAM can perform anomaly detection efficiently while preserving the advantages of existing data stream clustering algorithms for real-time, large-scale streams.
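The abstract does not spell out ADSTREAM's algorithmic details, so the following Python code is only a minimal sketch of the general two-phase idea it describes, under stated assumptions. The online phase here is a simplified stand-in: each point joins its nearest micro-cluster if it lies within a fixed, hypothetical `max_radius`, otherwise a new micro-cluster is opened (real micro-cluster algorithms such as CluStream also keep squared sums and timestamps, omitted here). The offline phase scores the micro-cluster centers with the standard LOF definition (k-distance, reachability distance, local reachability density) and flags centers whose score exceeds a hypothetical `threshold`. All function and parameter names are illustrative, not the authors' implementation.

```python
import math


class MicroCluster:
    """Simplified micro-cluster summary: point count and per-dimension linear sum.

    Assumption: squared sums and timestamps used by real micro-cluster
    algorithms (e.g. CluStream) are omitted in this sketch.
    """

    def __init__(self, x):
        self.n = 1
        self.ls = list(x)

    def add(self, x):
        self.n += 1
        for i, v in enumerate(x):
            self.ls[i] += v

    def center(self):
        return [s / self.n for s in self.ls]


def dist(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def online_phase(stream, max_radius):
    """Absorb each point into its nearest micro-cluster if it lies within
    `max_radius` (hypothetical fixed threshold) of that centre; otherwise
    open a new micro-cluster."""
    clusters = []
    for x in stream:
        if clusters:
            nearest = min(clusters, key=lambda mc: dist(mc.center(), x))
            if dist(nearest.center(), x) <= max_radius:
                nearest.add(x)
                continue
        clusters.append(MicroCluster(x))
    return clusters


def lof_scores(points, k):
    """Local Outlier Factor of each point from its k nearest neighbours.

    Simplification: exactly k neighbours are used; ties at the k-th
    distance are not expanded as in the full LOF definition."""
    m = len(points)
    d = [[dist(points[i], points[j]) for j in range(m)] for i in range(m)]
    knn = [sorted((j for j in range(m) if j != i), key=lambda j: d[i][j])[:k]
           for i in range(m)]
    k_dist = [d[i][knn[i][-1]] for i in range(m)]  # distance to k-th neighbour

    def lrd(i):
        # local reachability density: inverse of mean reachability distance
        reach = [max(k_dist[o], d[i][o]) for o in knn[i]]
        return 1.0 / max(sum(reach) / len(reach), 1e-12)

    lrds = [lrd(i) for i in range(m)]
    # LOF: mean density of the neighbours divided by the point's own density
    return [sum(lrds[o] for o in knn[i]) / (len(knn[i]) * lrds[i])
            for i in range(m)]


def offline_phase(clusters, k, threshold):
    """Score micro-cluster centres with LOF; flag those above `threshold`."""
    scores = lof_scores([mc.center() for mc in clusters], k)
    return [(mc, s, s > threshold) for mc, s in zip(clusters, scores)]


if __name__ == "__main__":
    # two tight unit-square groups, one repeated point, one isolated point
    stream = [(0, 0), (1, 0), (0, 1), (1, 1), (0, 0),
              (10, 10), (11, 10), (10, 11), (11, 11), (5, 30)]
    clusters = online_phase(stream, max_radius=0.5)
    for mc, score, is_outlier in offline_phase(clusters, k=3, threshold=1.5):
        if is_outlier:
            print(mc.center(), round(score, 1))
```

In this toy stream the repeated `(0, 0)` is absorbed into an existing micro-cluster, the centers inside each unit square get LOF scores of about 1, and only the micro-cluster for the isolated point `(5, 30)` exceeds the threshold. Scoring centers rather than raw points keeps the offline LOF computation proportional to the number of micro-clusters, which is the kind of saving the abstract attributes to working on micro-cluster centers.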

Original language: English
Pages (from-to): 10204-10209
Number of pages: 6
Journal: Advanced Science Letters
Volume: 23
Issue number: 10
State: Published - Oct 2017

Keywords

  • Anomaly detection
  • Large-scale data stream
  • Local outlier factor
  • Micro cluster
