
TWAS

Problem

Studies that measure gene expression together with complex traits often have small sample sizes. Some methods work around this, such as overlapping eQTLs with GWAS trait variants, but these may miss expression effects of small size.

TWAS

Concept

First, test whether the cis-heritability of expression is significantly nonzero ($h^2_{cis} \neq 0$). Then we use measured expression from a reference panel to train a model that imputes expression from cis-SNPs. Three imputation models are compared: the single top cis-eQTL, BLUP, and BSLMM. Comparing their prediction accuracy relative to heritability, $\frac{r^2}{h^2}$, BSLMM performs best. Finally, we impute expression-trait association statistics from GWAS summary statistics and the trained expression model.
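A minimal sketch of the two steps above, assuming standardized genotypes and a known $h^2_{cis}$. BLUP is approximated by ridge regression with penalty $\lambda = M(1-h^2)/h^2$ (the usual BLUP/ridge correspondence for $M$ standardized SNPs); BSLMM is not implemented here. The summary-based statistic uses the weights $w$, GWAS Z-scores $z$, and LD matrix $\Sigma$ as $w^T z / \sqrt{w^T \Sigma w}$. The names `blup_weights` and `twas_z` and all simulated values are hypothetical.

```python
import numpy as np

def standardize(X):
    """Center and scale each SNP column to mean 0, variance 1."""
    return (X - X.mean(axis=0)) / X.std(axis=0)

def blup_weights(X, e, h2):
    """Ridge approximation to BLUP SNP-expression weights.

    X: (n, M) standardized genotypes; e: (n,) measured expression;
    h2: cis-heritability of expression (assumed known here).
    """
    _, m = X.shape
    lam = m * (1.0 - h2) / h2              # BLUP shrinkage for M standardized SNPs
    e = e - e.mean()
    return np.linalg.solve(X.T @ X + lam * np.eye(m), X.T @ e)

def twas_z(w, z, ld):
    """Summary-based TWAS statistic: w'z / sqrt(w' LD w)."""
    return float(w @ z) / np.sqrt(float(w @ ld @ w))

# Hypothetical reference panel: 500 samples, 20 unlinked cis-SNPs
rng = np.random.default_rng(0)
n, m, h2 = 500, 20, 0.2
X = standardize(rng.binomial(2, 0.3, size=(n, m)).astype(float))
beta = rng.normal(0.0, np.sqrt(h2 / m), size=m)
e = X @ beta + rng.normal(0.0, np.sqrt(1.0 - h2), size=n)

w = blup_weights(X, e, h2)
ld = np.corrcoef(X, rowvar=False)          # LD (SNP correlation) matrix
z_gwas = rng.normal(size=m)                 # stand-in GWAS Z-scores
print(twas_z(w, z_gwas, ld))
```

Only the weights, GWAS Z-scores, and LD are needed at the association step, which is why individual-level GWAS data are not required.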

Benefit

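TWAS does not require gene expression to be measured in the GWAS cohort; expression is needed only in the reference panel used to train the imputation model.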

Limitation

  1. We assume that SNPs affect traits through gene expression.
  2. TWAS cannot establish the direction of causality. One check is to add the trait as a term in the linear model that includes imputed expression: if the imputed expression is no longer significant, the association is likely phenotype-mediated (SNP → trait → expression).
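The check in limitation 2 can be sketched with ordinary least squares. This assumes the response is measured expression and the predictors are imputed expression with and without the trait; the exact model in the paper may differ. The data are a hypothetical scenario where a SNP acts on the trait and the trait drives expression, and `ols_t` is a hypothetical helper.

```python
import numpy as np

def ols_t(y, covariates):
    """OLS with intercept; returns t-statistics for each covariate (intercept dropped)."""
    n = len(y)
    X = np.column_stack([np.ones(n)] + list(covariates))
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    sigma2 = resid @ resid / (n - X.shape[1])
    se = np.sqrt(sigma2 * np.diag(np.linalg.inv(X.T @ X)))
    return (coef / se)[1:]

# Hypothetical phenotype-mediated scenario: SNP -> trait -> expression
rng = np.random.default_rng(1)
n = 2000
g = rng.binomial(2, 0.3, size=n).astype(float)    # SNP dosage
trait = 0.5 * g + rng.normal(size=n)
expression = 0.8 * trait + rng.normal(size=n)
imputed = 0.5 * g                                 # hypothetical imputed expression

t_marginal = ols_t(expression, [imputed])[0]
t_conditional = ols_t(expression, [imputed, trait])[0]
print(t_marginal, t_conditional)
```

Here imputed expression is strongly significant on its own but loses significance once the trait is in the model, the pattern the text reads as phenotype mediation.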

Common Techniques

Omnibus Test

The omnibus test combines TWAS association statistics from multiple cohorts or methods using only summary data. In this paper, we use it to test for a significant association across the expression predictions from YFS, METSIM, and NTR (reference cohorts that do not all share the same tissue). For gene $i$,

$$ \begin{align*} \text{omnibus}_i = \mathbf{Z_i^T C_i^{-1} Z_i} \overset{approx}{\sim} \chi^2_3 \end{align*} $$

where

  • $\mathbf{Z_i}$ is the $3 \times 1$ vector of TWAS Z-scores from the $3$ cohorts
  • $\mathbf{C_i}$ is the $3 \times 3$ correlation matrix among the $3$ cohorts' Z-scores
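The omnibus statistic can be computed as a quadratic form and compared against a $\chi^2$ with as many degrees of freedom as cohorts. A minimal sketch, assuming the correlation matrix $\mathbf{C_i}$ is already estimated; the p-value uses the closed-form $\chi^2$ survival function for integer degrees of freedom so only NumPy and the standard library are needed. The Z-scores and correlations in the usage line are hypothetical.

```python
import math
import numpy as np

def chi2_sf(x, df):
    """Upper-tail chi-square probability for integer df via the standard recurrence."""
    if df % 2 == 1:
        q, k = math.erfc(math.sqrt(x / 2.0)), 1
    else:
        q, k = math.exp(-x / 2.0), 2
    while k < df:
        # Q(x; k + 2) = Q(x; k) + (x/2)^(k/2) e^(-x/2) / Gamma(k/2 + 1)
        q += (x / 2.0) ** (k / 2.0) * math.exp(-x / 2.0) / math.gamma(k / 2.0 + 1.0)
        k += 2
    return q

def omnibus(z, corr):
    """Omnibus statistic Z^T C^{-1} Z and its chi-square p-value with len(z) df."""
    z = np.asarray(z, dtype=float)
    stat = float(z @ np.linalg.solve(np.asarray(corr, dtype=float), z))
    return stat, chi2_sf(stat, len(z))

# Hypothetical TWAS Z-scores for YFS, METSIM, NTR and their correlation matrix
z_i = [3.1, 2.4, 2.8]
c_i = [[1.0, 0.6, 0.7],
       [0.6, 1.0, 0.5],
       [0.7, 0.5, 1.0]]
print(omnibus(z_i, c_i))
```

Solving $\mathbf{C_i} x = \mathbf{Z_i}$ instead of inverting $\mathbf{C_i}$ is the numerically safer way to evaluate the quadratic form.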

Permutation Test

A permutation test makes no assumption about the distribution of the test statistic: it is a nonparametric method that builds the null distribution empirically by shuffling the data. In this paper, we shuffle the expression-trait association 1,000 times for each TWAS gene, plot the distribution of the shuffled Z-scores $Z_{perm}$ (the empirical null, centered at $0$), and compute the p-value

$$ \begin{align*} \text{p-value} = \frac{\displaystyle \sum_{i=1}^{1000} I\left(|Z_{perm,i}| \geq |Z_{obs}|\right)}{1000} \end{align*} $$

If the p-value is below $0.05$, we reject the null hypothesis (expression $\perp$ trait).
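A minimal sketch of the permutation p-value, assuming the quantity shuffled is the SNP-expression weight vector (so the GWAS signal at the locus is held fixed) and using a two-sided comparison $|Z_{perm,i}| \geq |Z_{obs}|$. The LD matrix, weights, and GWAS Z-scores in the usage lines are hypothetical, as is the helper `permutation_pvalue`.

```python
import numpy as np

def twas_z(w, z, ld):
    """Summary-based TWAS statistic: w'z / sqrt(w' LD w)."""
    return float(w @ z) / np.sqrt(float(w @ ld @ w))

def permutation_pvalue(w, z, ld, n_perm=1000, seed=0):
    """Shuffle weights across SNPs, recompute Z, and count |Z_perm| >= |Z_obs|."""
    rng = np.random.default_rng(seed)
    z_obs = twas_z(w, z, ld)
    z_perm = np.array([twas_z(rng.permutation(w), z, ld) for _ in range(n_perm)])
    p = float(np.mean(np.abs(z_perm) >= abs(z_obs)))
    return p, z_obs, z_perm

# Hypothetical locus: 30 SNPs with AR(1) LD and three nonzero weights
rng = np.random.default_rng(4)
m = 30
idx = np.arange(m)
ld = 0.4 ** np.abs(idx[:, None] - idx[None, :])
w = np.zeros(m)
w[[3, 10, 17]] = [0.8, -0.5, 0.3]
z_gwas = rng.multivariate_normal(np.zeros(m), ld)

p, z_obs, z_perm = permutation_pvalue(w, z_gwas, ld)
print(z_obs, p)
```

Plotting `z_perm` as a histogram gives the shuffled-Z distribution described above.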

Performance

Real Data

TWAS identified 25 novel expression-trait associations using summary association statistics from a 2010 lipid GWAS.

Simulation

Under the null

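We simulate data under two null models. In the first, expression is independent of the SNPs (expression $\perp$ SNP) while the trait is cis-heritable; expression is drawn as pure noise and the TWAS Z-score is computed from the GWAS Z-scores $\mathbf{Z}$ as

$$ \text{expression} \sim N(0,1),\qquad Z_{TWAS} = \frac{\mathbf{W}\mathbf{Z}}{(\mathbf{W}\Sigma_{s,s}\mathbf{W}')^{1/2}} $$

In the second, the trait is independent of the SNPs (trait $\perp$ SNP) while expression is cis-heritable:

$$ Z \sim N(0,1),\qquad \text{expression}=\sum_i X_i \beta_i +\varepsilon $$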

where

  • $\mathbf{W=\Sigma_{e,s}\Sigma^{-1}_{s,s}}$: expression weights on the SNPs
  • $\mathbf{\Sigma_{e,s}}$: covariance between expression and the SNPs
  • $\mathbf{\Sigma_{s,s}}$: covariance (LD) among all SNPs
  • $\mathbf{Z}$: vector of GWAS Z-scores for the SNPs
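Under the null-trait model, the GWAS Z-scores carry no association signal, so jointly $\mathbf{Z} \sim N(\mathbf{0}, \Sigma_{s,s})$, and for any fixed weights the TWAS statistic $\mathbf{W}\mathbf{Z}/(\mathbf{W}\Sigma_{s,s}\mathbf{W}')^{1/2}$ should be $N(0,1)$. A quick Monte Carlo check of that calibration, with a hypothetical AR(1) LD matrix and hypothetical random weights standing in for trained ones:

```python
import numpy as np

rng = np.random.default_rng(2)
m, n_sim = 25, 20000
idx = np.arange(m)
ld = 0.5 ** np.abs(idx[:, None] - idx[None, :])   # hypothetical AR(1) LD matrix
w = rng.normal(size=m)                              # hypothetical fixed weights W

# Null trait: GWAS Z-scores ~ N(0, Sigma_ss), each marginally N(0, 1)
z = rng.multivariate_normal(np.zeros(m), ld, size=n_sim)
z_twas = (z @ w) / np.sqrt(w @ ld @ w)

# Mean should be near 0 and variance near 1 if the statistic is calibrated
print(z_twas.mean(), z_twas.var())
```

The denominator uses $\Sigma_{s,s}$ precisely so that LD among SNPs does not inflate the variance of the statistic.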

Under the alternative

We use $6000$ unrelated METSIM samples with genotype data, $100$ genes, and the SNPs in the surrounding 1 Mb of each gene. For each of the $100$ genes, expression is simulated as

$$ \begin{align*} \mathbf{E}=\mathbf{X {\beta} + \varepsilon},\ \text{where } \varepsilon,\ \beta \text{ from Normal} \quad (1) \end{align*} $$

with variances chosen to achieve $h^2_{cis-g}=0.17$. Genotypes and simulated expression for $1000$ of the samples were then withheld to train the expression prediction model for $(1)$, and $(1)$ is used to simulate expression for the remaining $5000$ samples. For these $5000$ samples, the phenotype $Y$ is simulated as

$$ \begin{align*} Y=E \alpha'+\varepsilon \quad (2) \end{align*} $$

with $\alpha$ chosen so that $h^2_E=\frac{0.1}{180}$ or $\frac{0.2}{180}$. Expression simulation $(1)$ and phenotype simulation $(2)$ for the $5000$ samples are repeated $60$ times, each time with a different $\varepsilon$. Computing the Z-scores between each SNP and the phenotype then yields a simulated GWAS of size $5000 \times 60 = 300{,}000$.
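A scaled-down sketch of the alternative simulation, for intuition only. Assumptions not in the source: 50 unlinked SNPs with random genotypes stand in for the METSIM 1 Mb window, $\beta$ and $\alpha$ are fixed across replicates, and the 60 replicates are combined into one GWAS by summing their Z-scores and dividing by $\sqrt{60}$ (a Stouffer-style combination). The function name `simulate_alternative` is hypothetical.

```python
import numpy as np

def standardize(A):
    """Center and scale each column to mean 0, variance 1."""
    return (A - A.mean(axis=0)) / A.std(axis=0)

def simulate_alternative(n_train=1000, n_gwas=5000, m=50, h2_cis=0.17,
                         h2_e=0.1 / 180, n_rep=60, seed=3):
    """Hypothetical small-scale version of the alternative-hypothesis simulation."""
    rng = np.random.default_rng(seed)
    n = n_train + n_gwas
    X = standardize(rng.binomial(2, 0.3, size=(n, m)).astype(float))
    beta = rng.normal(size=m)
    g = X @ beta
    g *= np.sqrt(h2_cis / g.var())          # scale so Var(X beta) = h2_cis

    def expression():
        """Equation (1): E = X beta + eps, with a fresh eps each call."""
        return g + rng.normal(0.0, np.sqrt(1.0 - h2_cis), size=n)

    E = expression()
    train = (X[:n_train], E[:n_train])      # withheld panel for training weights
    X_gwas = standardize(X[n_train:])
    alpha = np.sqrt(h2_e)                    # Var(E) = 1, so Var(alpha * E) = h2_e

    z_sum = np.zeros(m)
    for _ in range(n_rep):
        E_gwas = expression()[n_train:]
        # Equation (2): Y = E alpha + eps
        Y = alpha * E_gwas + rng.normal(0.0, np.sqrt(1.0 - h2_e), size=n_gwas)
        Y = (Y - Y.mean()) / Y.std()
        z_sum += X_gwas.T @ Y / np.sqrt(n_gwas)   # per-replicate SNP-phenotype Z
    return train, z_sum / np.sqrt(n_rep)

train, z = simulate_alternative()
print(z[:5])
```

The returned training panel would feed the weight-fitting step, and `z` plays the role of the GWAS summary statistics.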


Licensed under CC BY-NC-SA 4.0