Bayesian curve fitting and clustering with Dirichlet process mixture models for microarray data

Ju Hyun Park, Minjung Kyung

Research output: Contribution to journal › Article › peer-review

2 Scopus citations

Abstract

In molecular biology, it is often of interest to cluster genes in microarray data by the similarity of their expression profiles, in order to identify genes that are differentially expressed under multiple biological conditions. A notable characteristic of a gene expression profile is that it follows a cyclic curve over time. To group sequences with similar molecular functions, we propose a Bayesian Dirichlet process mixture of linear regression models with a Fourier series for the regression coefficients, each of which is assigned a spike and slab prior. A full Gibbs-sampling algorithm is developed for efficient Markov chain Monte Carlo (MCMC) posterior computation. Because of the so-called “label-switching” problem and the varying number of clusters across MCMC iterations, the post-processing approach of Fritsch and Ickstadt (2009) is additionally applied to the MCMC samples to obtain a single optimal clustering estimate by maximizing the posterior expected adjusted Rand index, using the posterior probabilities of two observations being clustered together. The proposed method is illustrated with two simulated data sets and one real data set on the physiological response of fibroblasts to serum from Iyer et al. (1999).
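
The post-processing step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names (`posterior_similarity`, `pear`, `max_pear_from_draws`) are hypothetical, the toy label draws are made up for illustration, and the search is restricted to partitions visited by the sampler, which is a simplifying assumption. The sketch estimates the pairwise co-clustering probabilities from the MCMC label draws (these are unaffected by label switching) and then scores each sampled partition with the pairwise form of the posterior expected adjusted Rand index.

```python
import numpy as np

def posterior_similarity(labels):
    """Estimate P(items i and j share a cluster) from MCMC label draws.

    labels: (S, n) integer array, one row of cluster labels per draw.
    The result depends only on which items co-cluster, so it is
    invariant to relabeling of clusters between draws.
    """
    labels = np.asarray(labels)
    # same[s, i, j] is True when items i and j share a cluster in draw s
    same = labels[:, :, None] == labels[:, None, :]
    return same.mean(axis=0)

def pear(candidate, psm):
    """Posterior expected adjusted Rand index of one candidate partition,
    computed from the pairwise co-clustering probabilities in psm.
    Assumes the denominator is nonzero (not guarded in this sketch)."""
    candidate = np.asarray(candidate)
    n = candidate.shape[0]
    upper = np.triu_indices(n, k=1)  # each unordered pair i < j once
    together = (candidate[:, None] == candidate[None, :])[upper].astype(float)
    probs = psm[upper]
    n_pairs = n * (n - 1) / 2
    chance = together.sum() * probs.sum() / n_pairs
    numerator = (together * probs).sum() - chance
    denominator = 0.5 * (together.sum() + probs.sum()) - chance
    return numerator / denominator

def max_pear_from_draws(labels):
    """Return the sampled partition with the highest PEAR score.
    Simplification: only partitions visited by the sampler are candidates."""
    labels = np.asarray(labels)
    psm = posterior_similarity(labels)
    scores = [pear(draw, psm) for draw in labels]
    best = int(np.argmax(scores))
    return labels[best], scores[best]

# Toy draws for 4 items; draws 0 and 1 are the same partition with
# the cluster labels swapped (label switching).
draws = np.array([
    [0, 0, 1, 1],
    [1, 1, 0, 0],
    [0, 0, 1, 1],
    [0, 0, 0, 1],
])
best_partition, best_score = max_pear_from_draws(draws)
print(best_partition, round(best_score, 3))
```

In this toy example the partition that pairs the first two items and the last two items wins, because three of the four draws agree on it once labels are ignored.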

Original language: English
Pages (from-to): 207-220
Number of pages: 14
Journal: Journal of the Korean Statistical Society
Volume: 48
Issue number: 2
State: Published - Jun 2019

Keywords

  • Adjusted Rand index
  • Dirichlet process mixture
  • Fourier series
  • Label-switching
  • Temporal cyclic gene expression profiles
  • Variable selection
