Efficient and accurate multiple-phenotype regression method for high dimensional data considering population structure

Jong Wha J. Joo, Eun Yong Kang, Elin Org, Nick Furlotte, Brian Parks, Farhad Hormozdiari, Aldons J. Lusis, Eleazar Eskin

Research output: Contribution to journal > Article > peer-review

19 Scopus citations

Abstract

A typical genome-wide association study tests the correlation between a single phenotype and each genotype, one at a time. However, single-phenotype analysis might miss unmeasured aspects of complex biological networks. Analyzing many phenotypes simultaneously may increase the power to capture these unmeasured aspects and detect more variants. Several multivariate approaches aim to detect variants related to more than one phenotype, but these approaches do not account for the effects of population structure and, as a result, may produce a substantial number of false-positive identifications. Here, we introduce a new methodology, referred to as GAMMA (generalized analysis of molecular variance for mixed-model analysis), which is capable of simultaneously analyzing many phenotypes while correcting for population structure. In a simulation study using data implanted with true genetic effects, GAMMA accurately identifies these true effects without producing false positives induced by population structure. In these simulations, GAMMA improves on other methods, which either fail to detect the true effects or produce many false-positive identifications. We further apply our method to genetic studies of yeast and of the mouse gut microbiome and show that GAMMA identifies several variants that are likely to have true biological mechanisms.
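To make the idea of testing one variant against many phenotypes while correcting for population structure more concrete, the following is a minimal sketch and not the authors' implementation. It assumes that a kinship matrix `K` and the variance components `sigma_g2` and `sigma_e2` are already known, decorrelates individuals by multiplying with the inverse square root of the assumed covariance, and then computes a distance-based pseudo-F statistic using Euclidean distances. The function names `whiten` and `pseudo_f` are hypothetical, and p-value computation and variance-component estimation are deliberately left out.

```python
import numpy as np


def whiten(K, sigma_g2, sigma_e2):
    """Return W = V^(-1/2) for the assumed covariance V = sigma_g2*K + sigma_e2*I."""
    n = K.shape[0]
    V = sigma_g2 * K + sigma_e2 * np.eye(n)
    evals, evecs = np.linalg.eigh(V)          # V is symmetric positive definite
    return (evecs / np.sqrt(evals)) @ evecs.T


def pseudo_f(x, Y, W=None):
    """Pseudo-F for one variant x (length n) against phenotypes Y (n x m).

    If W is given, genotype, intercept, and phenotypes are all transformed
    by W first, so that individuals are (approximately) uncorrelated.
    """
    x = np.asarray(x, dtype=float)
    Y = np.asarray(Y, dtype=float)
    n = x.shape[0]
    if W is None:
        W = np.eye(n)

    ones = W @ np.ones((n, 1))                        # null model: intercept only
    X = W @ np.column_stack([np.ones(n), x])          # full model: intercept + variant
    Yw = W @ Y

    H0 = ones @ np.linalg.pinv(ones)                  # projection onto null model
    H1 = X @ np.linalg.pinv(X)                        # projection onto full model
    G = Yw @ Yw.T                                     # outer-product (Gram) matrix

    explained = np.trace((H1 - H0) @ G)               # variation attributed to x
    residual = np.trace((np.eye(n) - H1) @ G)         # variation left unexplained
    return explained / (residual / (n - 2))           # 1 and n-2 degrees of freedom


# Toy usage: six individuals, one phenotype, no structure correction.
x_demo = [0, 0, 0, 1, 1, 1]
Y_demo = np.array([[0.0], [1.0], [2.0], [3.0], [4.0], [5.0]])
print(pseudo_f(x_demo, Y_demo))
```

With a single phenotype and no whitening, this statistic reduces to the ordinary regression F statistic. Passing `W = whiten(K, sigma_g2, sigma_e2)` applies the assumed structure correction before the statistic is computed.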

Original language: English
Pages (from-to): 1379-1390
Number of pages: 12
Journal: Genetics
Volume: 204
Issue number: 4
State: Published - Dec 2016

Keywords

  • Mixed models
  • Multivariate analysis
  • Population structure
