On Friday, April 28, 2017, in the CNSI Auditorium, Eleazar Eskin presented ZarLab’s research on fine mapping causal variants and allelic heterogeneity at the 2nd Annual Institute for Quantitative and Computational Biosciences (QCBio) Symposium.
Geneticists use genome-wide association studies (GWAS) to identify genetic variants that contribute to an individual exhibiting a particular trait or disease. Typically, a GWAS identifies an association signal, which suggests that genetic variants within a region of the genome, known as a locus, are associated with the condition. The process of identifying the actual variant in the region that has an effect on the disease is referred to as “fine mapping.”
In addition to finding the actual variants affecting a disease, fine mapping also seeks to address broader questions about the genetic basis of disease. First, how many causal variants does a locus contain? A disease could be caused by a single variant or by multiple variants that independently affect disease status. We refer to the latter phenomenon as allelic heterogeneity (AH).
Second, when analyzing results from multiple GWASs, can a causal variant identified in one study be assumed causal in other studies? A GWAS can identify many variants that are associated with two or more traits; however, this apparent sharing can be induced by a confounding factor known as linkage disequilibrium (LD), the correlation between nearby variants. Colocalization methods seek to distinguish causal variants that are shared between studies from those that are distinct.
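To make the colocalization question concrete, here is a minimal sketch in Python. It assumes that each study has already produced a per-variant posterior probability of being causal, and that the two studies are independent, so the probability that a variant is causal in both is the product of the two posteriors. The function name `colocalization_scores` and all numbers are hypothetical and chosen only for illustration; this is not the code of any published tool.

```python
import numpy as np


def colocalization_scores(post_gwas, post_eqtl):
    """Per-variant probability of being causal in both studies.

    Assumes the two studies are independent, so the joint probability
    is the product of the per-study posterior probabilities.
    """
    post_gwas = np.asarray(post_gwas, dtype=float)
    post_eqtl = np.asarray(post_eqtl, dtype=float)
    if post_gwas.shape != post_eqtl.shape:
        raise ValueError("both studies must list the same variants in the same order")
    return post_gwas * post_eqtl


# Hypothetical posteriors for three variants at one locus.
gwas = [0.05, 0.85, 0.10]

# Case 1: the eQTL study points to the same variant (index 1).
shared = colocalization_scores(gwas, [0.02, 0.90, 0.08])

# Case 2: the eQTL study points to a different variant (index 0).
distinct = colocalization_scores(gwas, [0.90, 0.05, 0.05])

print("shared signal:  ", np.round(shared, 4))
print("distinct signal:", np.round(distinct, 4))
```

In the first case one variant keeps a high score in both studies, which is what a shared causal variant would look like. In the second case every product is small even though both studies show a strong signal at the locus, which is consistent with distinct causal variants whose association signals overlap because of LD.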
Farhad Hormozdiari, a recent alumnus of our group and a post-doc at Harvard University, developed several novel approaches for improving the accuracy and efficiency of fine mapping despite the presence of AH in the study population. Hormozdiari’s software tools, CAVIAR, CAVIAR-Gene, and eCAVIAR, quantify the probability that a variant is causal in GWAS and eQTL studies while allowing for an arbitrary number of causal variants.
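The idea of scoring variants while allowing more than one causal variant can be sketched in a few lines of Python. The sketch below is a simplified illustration, not the CAVIAR implementation. It assumes that the association z-scores at a locus follow a zero-mean multivariate normal distribution whose covariance is the LD matrix plus a term contributed by the causal variants, it enumerates every causal configuration up to a small size, and it then builds a set of variants that covers the causal variants with probability at least `rho`. The function names, the prior `prior_causal=0.01`, the effect variance `effect_var=25.0`, and the example data are all assumptions made for illustration.

```python
from itertools import combinations

import numpy as np


def log_mvn_zero_mean(z, cov):
    """Log density of a zero-mean multivariate normal N(0, cov) at z."""
    _, logdet = np.linalg.slogdet(cov)
    quad = z @ np.linalg.solve(cov, z)
    return -0.5 * (len(z) * np.log(2.0 * np.pi) + logdet + quad)


def configuration_posteriors(z, ld, max_causal=2, prior_causal=0.01, effect_var=25.0):
    """Posterior weight of every configuration with at most max_causal causal variants.

    Assumed model: z ~ N(0, LD + effect_var * LD[:, C] @ LD[C, :]), where C
    indexes the causal variants, with an independent prior per variant.
    Exhaustive enumeration is only feasible for small loci.
    """
    z = np.asarray(z, dtype=float)
    ld = np.asarray(ld, dtype=float)
    m = len(z)
    configs, log_post = [], []
    for k in range(max_causal + 1):
        for causal in combinations(range(m), k):
            idx = np.array(causal, dtype=int)
            cov = ld + effect_var * ld[:, idx] @ ld[idx, :]
            log_prior = k * np.log(prior_causal) + (m - k) * np.log1p(-prior_causal)
            configs.append(frozenset(causal))
            log_post.append(log_mvn_zero_mean(z, cov) + log_prior)
    log_post = np.array(log_post)
    weights = np.exp(log_post - log_post.max())  # subtract the max for stability
    return configs, weights / weights.sum()


def inclusion_probabilities(configs, weights, m):
    """Per-variant posterior: total weight of the configurations containing it."""
    pip = np.zeros(m)
    for causal, w in zip(configs, weights):
        if causal:
            pip[list(causal)] += w
    return pip


def causal_set(configs, weights, pip, rho=0.95):
    """Greedily add variants until the set covers all causal variants w.p. >= rho."""
    chosen = set()
    for variant in np.argsort(-pip):
        chosen.add(int(variant))
        covered = sum(w for c, w in zip(configs, weights) if c <= chosen)
        if covered >= rho:
            break
    return chosen


# Hypothetical locus: variants 0 and 1 are in strong LD, variant 3 is independent.
z_scores = [5.2, 4.9, 1.0, 4.5]
ld_matrix = [
    [1.0, 0.9, 0.2, 0.0],
    [0.9, 1.0, 0.2, 0.0],
    [0.2, 0.2, 1.0, 0.0],
    [0.0, 0.0, 0.0, 1.0],
]

configs, weights = configuration_posteriors(z_scores, ld_matrix)
pip = inclusion_probabilities(configs, weights, len(z_scores))
print("posterior inclusion probabilities:", np.round(pip, 3))
print("0.95 causal set:", sorted(causal_set(configs, weights, pip)))
```

In this toy locus, variant 3 has a strong signal that LD with the other variants cannot explain, so a model restricted to a single causal variant would have to leave one of the two signals unaccounted for. Allowing two causal variants lets the posterior place weight on configurations that pair variant 3 with one of the correlated variants 0 and 1.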
More details about our research in fine mapping are available in the following papers:
Hormozdiari, Farhad; van de Bunt, Martijn; Segrè, Ayellet V; Li, Xiao; Joo, Jong Wha J; Bilow, Michael; Sul, Jae Hoon; Sankararaman, Sriram; Pasaniuc, Bogdan; Eskin, Eleazar. Colocalization of GWAS and eQTL Signals Detects Target Genes. Journal Article In: Am J Hum Genet, 2016, ISSN: 1537-6605. http://dx.doi.org/10.1016/j.ajhg.2016.10.003
The vast majority of genome-wide association study (GWAS) risk loci fall in non-coding regions of the genome. One possible hypothesis is that these GWAS risk loci alter the individual's disease risk through their effect on gene expression in different tissues. In order to understand the mechanisms driving a GWAS risk locus, it is helpful to determine which gene is affected in specific tissue types. For example, the relevant gene and tissue could play a role in the disease mechanism if the same variant responsible for a GWAS locus also affects gene expression. Identifying whether or not the same variant is causal in both GWASs and expression quantitative trait locus (eQTL) studies is challenging because of the uncertainty induced by linkage disequilibrium and the fact that some loci harbor multiple causal variants. However, current methods that address this problem assume that each locus contains a single causal variant. In this paper, we present eCAVIAR, a probabilistic method that has several key advantages over existing methods. First, our method can account for more than one causal variant in any given locus. Second, it can leverage summary statistics without accessing the individual genotype data. We use both simulated and real datasets to demonstrate the utility of our method. Using publicly available eQTL data on 45 different tissues, we demonstrate that eCAVIAR can prioritize likely relevant tissues and target genes for a set of glucose- and insulin-related trait loci.
Hormozdiari, Farhad; Kichaev, Gleb; Yang, Wen-Yun Y; Pasaniuc, Bogdan; Eskin, Eleazar. Identification of causal genes for complex traits. Journal Article In: Bioinformatics, 31 (12), pp. i206-i213, 2015, ISSN: 1367-4811. http://dx.doi.org/10.1093/bioinformatics/btv240
MOTIVATION: Although genome-wide association studies (GWAS) have identified thousands of variants associated with common diseases and complex traits, only a handful of these variants are validated to be causal. We consider 'causal variants' as variants which are responsible for the association signal at a locus. As opposed to association studies that benefit from linkage disequilibrium (LD), the main challenge in identifying causal variants at associated loci lies in distinguishing among the many closely correlated variants due to LD. This is particularly important for model organisms such as inbred mice, where LD extends much further than in human populations, resulting in large stretches of the genome with significantly associated variants. Furthermore, these model organisms are highly structured and require correction for population structure to remove potential spurious associations. RESULTS: In this work, we propose CAVIAR-Gene (CAusal Variants Identification in Associated Regions), a novel method that is able to operate across large LD regions of the genome while also correcting for population structure. A key feature of our approach is that it provides as output a minimally sized set of genes that captures the genes which harbor causal variants with probability ρ. Through extensive simulations, we demonstrate that our method not only speeds up computation, but also has an average of 10% higher recall rate compared with the existing approaches. We validate our method using real mouse high-density lipoprotein (HDL) data and show that CAVIAR-Gene is able to identify Apoa2 (a gene known to harbor causal variants for HDL), while reducing the number of genes that need to be tested for functionality by a factor of 2. AVAILABILITY AND IMPLEMENTATION: Software is freely available for download at genetics.cs.ucla.edu/caviar. CONTACT: eeskin@cs.ucla.edu
Hormozdiari, Farhad; Kostem, Emrah; Kang, Eun Yong; Pasaniuc, Bogdan; Eskin, Eleazar. Identifying causal variants at Loci with multiple signals of association. Journal Article In: Genetics, 198 (2), pp. 497-508, 2014, ISSN: 1943-2631. http://dx.doi.org/10.1534/genetics.114.167908
Although genome-wide association studies have successfully identified thousands of risk loci for complex traits, only a handful of the biologically causal variants, responsible for association at these loci, have been successfully identified. Current statistical methods for identifying causal variants at risk loci either use the strength of the association signal in an iterative conditioning framework or estimate probabilities for variants to be causal. A main drawback of existing methods is that they rely on the simplifying assumption of a single causal variant at each risk locus, which is typically invalid at many risk loci. In this work, we propose a new statistical framework that allows for the possibility of an arbitrary number of causal variants when estimating the posterior probability of a variant being causal. A direct benefit of our approach is that we predict a set of variants for each locus that under reasonable assumptions will contain all of the true causal variants with a high confidence level (e.g., 95%) even when the locus contains multiple causal variants. We use simulations to show that our approach provides 20-50% improvement in our ability to identify the causal variants compared to the existing methods at loci harboring multiple causal variants. We validate our approach using empirical data from an expression QTL study of CHI3L2 to identify new causal variants that affect gene expression at this locus. CAVIAR is publicly available online at http://genetics.cs.ucla.edu/caviar/
Hormozdiari F, Zhu A, Kichaev G, Ju CJ, Segrè AV, Joo JW, Won H, Sankararaman S, Pasaniuc B, Shifman S, Eskin E. Widespread allelic heterogeneity in complex traits. The American Journal of Human Genetics. 2017 May 4;100(5):789-802.