Identifying gene clusters within localized regions in multiple genomes

Qingwu Yang, Gangman Yi, Fenghui Zhang, Michael R. Thon, Sing Hoi Sze

Research output: Contribution to journal › Article › peer-review

3 Scopus citations

Abstract

An important strategy for studying genome evolution is to investigate the clustering of orthologous genes among multiple genomes. The most popular approaches require that the distance between adjacent genes in a cluster be small. We investigate a different formulation based on constraining the overall size of a cluster and develop statistical significance estimates that allow direct comparison of clusters of different sizes. We first consider a restricted version, which requires that orthologous genes be strictly ordered within each cluster, and show that it can be solved in polynomial time. We then develop practical exact algorithms for the unrestricted problem, which allows paralogous genes within a genome and clusters that may not appear in every genome, while considering a general model in which a gene is allowed to appear in more than one orthologous group. We show that our algorithm can identify biologically relevant gene clusters on four bacterial genomes: Bacillus subtilis, Streptococcus pyogenes, Streptococcus pneumoniae, and Clostridium acetobutylicum. We also show that our algorithm can identify significantly more functionally enriched gene clusters than previous algorithms on four yeast genomes: Saccharomyces cerevisiae, Saccharomyces paradoxus, Saccharomyces mikatae, and Saccharomyces bayanus. A software program (GCFinder) and a list of gene clusters found on the bacterial and the yeast genomes are available at http://faculty.cse.tamu.edu/shsze/gcfinder.
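
To illustrate the difference between the two formulations mentioned in the abstract, the following is a minimal sketch and not the GCFinder algorithm. It assumes a single genome represented as (position, orthologous group) pairs; the function names, the toy data, and the thresholds are all hypothetical. It contrasts an adjacent-gap constraint with an overall-size constraint and uses a sliding window to find the largest set of groups within a bounded span. Multiple genomes, paralog handling, and the significance estimates are not modeled.

```python
from typing import List, Set, Tuple

# Hypothetical representation: (position along the genome, orthologous group label)
Gene = Tuple[int, str]


def max_gap_ok(genes: List[Gene], max_gap: int) -> bool:
    """Adjacent-gap formulation: every pair of neighbouring genes is close."""
    positions = sorted(p for p, _ in genes)
    return all(b - a <= max_gap for a, b in zip(positions, positions[1:]))


def overall_size_ok(genes: List[Gene], max_size: int) -> bool:
    """Overall-size formulation: the whole cluster fits in a bounded span."""
    positions = [p for p, _ in genes]
    if not positions:
        return True
    return max(positions) - min(positions) <= max_size


def best_window(genome: List[Gene], groups: Set[str], max_size: int) -> Set[str]:
    """Largest set of target groups found in any window of span <= max_size."""
    genes = sorted(g for g in genome if g[1] in groups)
    best: Set[str] = set()
    left = 0
    for right in range(len(genes)):
        # Shrink the window from the left until its span is within the bound.
        while genes[right][0] - genes[left][0] > max_size:
            left += 1
        found = {grp for _, grp in genes[left:right + 1]}
        if len(found) > len(best):
            best = found
    return best


# Toy data (made up for illustration only).
genome = [(1, "A"), (2, "B"), (9, "C"), (10, "D"), (30, "A"), (50, "E")]
candidate = genome[:4]

gap_result = max_gap_ok(candidate, max_gap=3)          # fails: gap of 7 between B and C
size_result = overall_size_ok(candidate, max_size=10)  # passes: span is 9
window_groups = best_window(genome, {"A", "B", "C", "D", "E"}, max_size=10)
```

In this toy example the same four genes are rejected by the adjacent-gap test yet accepted by the overall-size test, which is the kind of case the size-based formulation is meant to capture.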

Original language: English
Pages (from-to): 657-668
Number of pages: 12
Journal: Journal of Computational Biology
Volume: 17
Issue number: 5
State: Published - 1 May 2010

Keywords

  • Gene clusters
  • NP-hardness
  • Ordered clusters
  • Unordered clusters
