Abstract
Micro-cluster based clustering methods cluster large-scale data streams efficiently using two components: an online phase and an offline phase. The online component creates micro-clusters from the input data stream, and the offline component performs the final clustering on the micro-clusters formed by the online component. However, these methods treat anomaly detection passively, so outliers are never explicitly specified: most existing methods first cluster all the data and then label whatever was not clustered as outliers. Although typical micro-cluster based data stream clustering methods achieve excellent clustering quality, they are not suitable for anomaly detection, which must identify which data points are outliers. In this paper, we propose ADSTREAM, which applies the Local Outlier Factor to the centers of micro-clusters in the offline component to detect and specify outliers. In the experiments, we visualize the anomaly detection results of ADSTREAM, perform micro-cluster based anomaly detection on the large-scale streams of the KDDCUP1999 dataset, and show that the anomaly detection performance of ADSTREAM improves dramatically compared with existing micro-cluster based clustering methods. As a result, ADSTREAM performs anomaly detection efficiently while preserving the advantages of existing data stream clustering algorithms for real-time large-scale streams.
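The two-phase idea described above can be illustrated with a minimal sketch. This is not the authors' ADSTREAM implementation: the radius-based absorption rule, the simplified Local Outlier Factor (exactly `k` neighbours, no tie handling), the function names, and the values of `radius`, `k`, and the 1.5 threshold are all assumptions made for illustration.

```python
import math

def update_micro_clusters(clusters, point, radius):
    """Online step (assumed rule): absorb the point into the nearest
    micro-cluster if its center lies within `radius`, else open a new one.
    Each micro-cluster stores a count `n` and a linear sum `ls`."""
    best, best_d = None, float("inf")
    for mc in clusters:
        center = [s / mc["n"] for s in mc["ls"]]
        d = math.dist(center, point)
        if d < best_d:
            best, best_d = mc, d
    if best is not None and best_d <= radius:
        best["n"] += 1
        best["ls"] = [s + x for s, x in zip(best["ls"], point)]
    else:
        clusters.append({"n": 1, "ls": list(point)})

def centers_of(clusters):
    """Center of each micro-cluster: linear sum divided by count."""
    return [[s / mc["n"] for s in mc["ls"]] for mc in clusters]

def lof_scores(points, k):
    """Offline step: simplified Local Outlier Factor over the given points.
    Assumes len(points) > k and no duplicate points (to avoid zero distances)."""
    n = len(points)
    # k nearest neighbours of each point as (distance, index), self excluded
    neigh = []
    for i in range(n):
        dists = sorted((math.dist(points[i], points[j]), j)
                       for j in range(n) if j != i)
        neigh.append(dists[:k])
    k_dist = [nb[-1][0] for nb in neigh]
    # local reachability density: inverse of the mean reachability distance
    lrd = []
    for i in range(n):
        reach = [max(k_dist[j], d_ij) for d_ij, j in neigh[i]]
        lrd.append(len(reach) / sum(reach))
    # LOF: mean density of the neighbours relative to the point's own density
    return [sum(lrd[j] for _, j in neigh[i]) / (len(neigh[i]) * lrd[i])
            for i in range(n)]

# Toy stream: a 3x3 grid of dense regions plus one isolated point.
stream = []
for gx in range(3):
    for gy in range(3):
        stream.append((float(gx), float(gy)))
        stream.append((gx + 0.1, gy + 0.1))
stream.append((10.0, 10.0))

micro_clusters = []
for p in stream:
    update_micro_clusters(micro_clusters, p, radius=0.3)

scores = lof_scores(centers_of(micro_clusters), k=3)
# Threshold of 1.5 is an illustrative assumption, not a value from the paper.
outlier_ids = [i for i, s in enumerate(scores) if s > 1.5]
print(outlier_ids)
```

Scoring the micro-cluster centers rather than the raw points keeps the offline cost tied to the number of micro-clusters, which is the efficiency argument the abstract makes; the sketch makes no claim about how ADSTREAM chooses its parameters.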
Original language | English |
---|---|
Pages (from-to) | 10204-10209 |
Number of pages | 6 |
Journal | Advanced Science Letters |
Volume | 23 |
Issue number | 10 |
State | Published - Oct 2017 |
Keywords
- Anomaly detection
- Large-scale data stream
- Local outlier factor
- Micro cluster