Abstract
This paper addresses the problem of identifying groups whose feature-variable means satisfy specified conditions. We refer to the identified groups as "target clusters" (TCs). To identify TCs, we propose a method based on the normal mixture model (NMM) restricted by a linear combination of the means. We provide an expectation-maximization (EM) algorithm that fits the restricted NMM by maximum likelihood, and we present the convergence property of the algorithm together with a reasonable set of initial estimates. We demonstrate the method's usefulness and validity through a simulation study and two well-known data sets. The proposed method yields several types of useful clusters that would be difficult to obtain with conventional clustering or exploratory data analysis methods based on the ordinary NMM. A simple comparison with another target clustering approach suggests that the proposed method is promising for identifying TCs.
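To make the idea of an NMM restricted by a linear combination of means concrete, the following is a minimal sketch and not the paper's algorithm. It assumes a univariate mixture with a common variance and a single restriction `a @ mu == c`, in which case the constrained M-step has a closed form via a Lagrange multiplier. The function name `fit_restricted_nmm`, its parameters, and the simulated data are all hypothetical choices made for illustration.

```python
import numpy as np

def fit_restricted_nmm(x, a, c, mu0, n_iter=200, tol=1e-10):
    """EM sketch for a univariate K-component normal mixture with a common
    variance, subject to the linear restriction a @ mu == c.
    Illustrative assumption only; the paper's model may be more general."""
    x = np.asarray(x, dtype=float)
    a = np.asarray(a, dtype=float)
    mu = np.asarray(mu0, dtype=float).copy()  # should satisfy a @ mu0 == c
    K = mu.size
    pi = np.full(K, 1.0 / K)   # mixing proportions
    var = x.var()              # common variance
    loglik_trace = []
    for _ in range(n_iter):
        # E-step: log(pi_k * N(x_i | mu_k, var)) for every i, k -> shape (n, K)
        log_p = (np.log(pi) - 0.5 * np.log(2 * np.pi * var)
                 - (x[:, None] - mu) ** 2 / (2 * var))
        log_norm = np.logaddexp.reduce(log_p, axis=1)
        loglik_trace.append(log_norm.sum())
        r = np.exp(log_p - log_norm[:, None])  # responsibilities

        # M-step: proportions and weighted component means
        n_k = r.sum(axis=0)
        pi = n_k / x.size
        xbar = (r * x[:, None]).sum(axis=0) / n_k
        # Lagrange-multiplier correction that enforces a @ mu == c
        lam = (a @ xbar - c) / np.sum(a ** 2 / n_k)
        mu = xbar - lam * a / n_k
        # Common variance given the restricted means
        var = (r * (x[:, None] - mu) ** 2).sum() / x.size

        if len(loglik_trace) > 1 and loglik_trace[-1] - loglik_trace[-2] < tol:
            break
    return pi, mu, var, loglik_trace

# Hypothetical usage: two groups whose means are restricted to differ by 3.
rng = np.random.default_rng(0)
x = np.concatenate([rng.normal(-1.0, 1.0, 300), rng.normal(2.0, 1.0, 200)])
a = np.array([-1.0, 1.0])
c = 3.0
pi, mu, var, trace = fit_restricted_nmm(x, a, c, mu0=[-1.5, 1.5])
```

Because the common-variance assumption makes the restricted mean update independent of the variance, each M-step here is an exact maximizer over the restricted parameter set, so the log-likelihood trace should not decrease when `mu0` already satisfies the restriction. With component-specific variances the joint update is no longer this simple, and a conditional-maximization variant would be one possible substitute.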
Original language | English |
---|---|
Pages (from-to) | 941-960 |
Number of pages | 20 |
Journal | Journal of Applied Statistics |
Volume | 40 |
Issue number | 5 |
State | Published - May 2013 |
Keywords
- EM algorithm
- maximum-likelihood method
- mean restrictions
- microarray gene expression data
- restricted normal mixture model
- target clustering