Abstract
In molecular biology, microarray data are often analyzed to cluster genes with similar expression profiles and so identify genes that are differentially expressed under multiple biological conditions. A notable characteristic of a gene expression profile is that it follows a cyclic curve over time. To group sequences with similar molecular functions, we propose a Bayesian Dirichlet process mixture of linear regression models built on a Fourier series, with a spike and slab prior assumed for each regression coefficient. A full Gibbs sampling algorithm is developed for efficient Markov chain Monte Carlo (MCMC) posterior computation. Because of the so-called “label-switching” problem and the varying number of clusters during the MCMC computation, the post-processing approach of Fritsch and Ickstadt (2009) is also applied to the MCMC samples. It yields an optimal single clustering estimate by maximizing the posterior expected adjusted Rand index, computed from the posterior probabilities that two observations are clustered together. The proposed method is illustrated with two simulated data sets and one real data set on the physiological response of fibroblasts to serum from Iyer et al. (1999).
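The post-processing step can be illustrated with a small sketch. It assumes the MCMC output is a list of cluster-label vectors, one per iteration. Each pairwise probability π_ij is estimated as the fraction of iterations in which observations i and j share a label. This is unaffected by label switching, because only co-membership is counted. Each candidate clustering is then scored by the posterior expected adjusted Rand index (PEAR) in the form given by Fritsch and Ickstadt (2009). The function names are hypothetical, and the candidate set is restricted here to the distinct sampled clusterings, which is a simplification.

```python
from itertools import combinations
from typing import List, Sequence, Tuple


def similarity_matrix(samples: Sequence[Sequence[int]]) -> List[List[float]]:
    """Estimate pi[i][j] (i < j): share of MCMC draws in which i and j share a label."""
    n = len(samples[0])
    counts = [[0] * n for _ in range(n)]
    for labels in samples:
        for i, j in combinations(range(n), 2):
            if labels[i] == labels[j]:  # co-membership is invariant to relabeling
                counts[i][j] += 1
    m = len(samples)
    return [[counts[i][j] / m for j in range(n)] for i in range(n)]


def pear(labels: Sequence[int], pi: List[List[float]]) -> float:
    """Posterior expected adjusted Rand index of one clustering, given pi."""
    pairs = list(combinations(range(len(labels)), 2))
    n_pairs = len(pairs)
    sum_ind = sum(1 for i, j in pairs if labels[i] == labels[j])
    sum_pi = sum(pi[i][j] for i, j in pairs)
    sum_both = sum(pi[i][j] for i, j in pairs if labels[i] == labels[j])
    expected = sum_ind * sum_pi / n_pairs
    denom = 0.5 * (sum_ind + sum_pi) - expected
    # Degenerate case (e.g. every draw puts all points in one cluster): the
    # index is undefined; returning 0.0 is an arbitrary convention here.
    return (sum_both - expected) / denom if denom != 0 else 0.0


def best_sampled_clustering(samples: Sequence[Sequence[int]]) -> Tuple[int, ...]:
    """Return the distinct sampled clustering with the highest PEAR."""
    pi = similarity_matrix(samples)
    candidates = list(dict.fromkeys(tuple(s) for s in samples))  # ordered unique
    return max(candidates, key=lambda c: pear(c, pi))


if __name__ == "__main__":
    # Toy draws for 4 observations; the third is a label-switched copy of the first.
    draws = [[0, 0, 1, 1], [0, 0, 1, 1], [1, 1, 0, 0], [0, 0, 0, 1]]
    print(best_sampled_clustering(draws))
```

The toy draws show why co-membership probabilities are used instead of raw labels. The draws `[0, 0, 1, 1]` and `[1, 1, 0, 0]` describe the same partition and get the same score. Fritsch and Ickstadt also consider candidates beyond the sampled clusterings, so this sketch is only a minimal version of the approach.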
Original language | English |
---|---|
Pages (from-to) | 207-220 |
Number of pages | 14 |
Journal | Journal of the Korean Statistical Society |
Volume | 48 |
Issue number | 2 |
DOIs | |
State | Published - Jun 2019 |
Keywords
- Adjusted Rand index
- Dirichlet process mixture
- Fourier series
- Label-switching
- Temporal cyclic gene expression profiles
- Variable selection