Problem
Studies of complex traits often have small sample sizes. Some methods address this, such as overlap analysis of eQTLs and GWAS trait-associated variants, but these may miss expression effects of small size.
TWAS
Concept
First, test whether the cis-heritability of expression is significantly non-zero ($h^2_{cis} \neq 0$). Then use measured expression data to train an expression imputation model. Three candidate models are considered, based on cis-eQTL, BLUP, and BSLMM, respectively. Comparing their $\frac{r^2}{h^2}$ shows that BSLMM performs best. Finally, expression-trait association statistics are imputed from GWAS summary statistics and the trained expression model.
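The last step can be sketched in a few lines. This is a minimal illustration, assuming the imputed statistic is the ratio $\frac{WZ}{(W\Sigma_{s,s}W')^{1/2}}$ that appears later in the simulation section of these notes; the function name `twas_z` and the toy numbers are hypothetical and not taken from the paper.

```python
import numpy as np

def twas_z(weights, gwas_z, ld):
    """Imputed expression-trait Z-score: W Z / sqrt(W Sigma_ss W').

    weights : expression weights W for the SNPs at the locus (length m)
    gwas_z  : GWAS summary Z-scores for the same SNPs (length m)
    ld      : m x m SNP correlation (LD) matrix Sigma_ss
    """
    w = np.asarray(weights, dtype=float)
    z = np.asarray(gwas_z, dtype=float)
    ld = np.asarray(ld, dtype=float)
    denom = np.sqrt(w @ ld @ w)  # standard deviation of W Z under the null
    return float(w @ z / denom)

# Toy locus with two SNPs in moderate LD (hypothetical numbers).
ld = np.array([[1.0, 0.5],
               [0.5, 1.0]])
z_twas = twas_z([1.0, 1.0], [1.0, 1.0], ld)
```

Only summary-level inputs (weights, GWAS Z-scores, and an LD matrix) enter the computation, which is why individual-level expression is not needed at this step.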
Benefit
Gene expression does not need to be measured in the GWAS samples; measured expression is only needed to train the imputation model.
Limitation
- We assume that SNPs affect traits through gene expression.
- TWAS cannot distinguish the direction of causality. One way to address this is to add a trait term to the linear model: if the imputed expression is no longer significant, this indicates a phenotype-mediated effect (SNP → trait → expression).
Common Technology
Omnibus Test
The omnibus test uses summary data to combine results across multiple cohorts or methods. In this paper, it is used to test for significant associations across predictions from YFS, METSIM, and NTR (different tissues). For gene $i$,
$$ \begin{align*} \text{omnibus}_i = \mathbf{Z}_i^T \mathbf{C}_i^{-1} \mathbf{Z}_i \overset{\text{approx}}{\sim} \chi^2_3 \end{align*} $$
where
- $\mathbf{Z_i}$ is a $3 \times 1$ vector of the TWAS Z-scores from the $3$ cohorts
- $\mathbf{C_i}$ is the $3 \times 3$ correlation matrix among the $3$ cohorts
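The statistic above can be computed directly from the three Z-scores and their correlation matrix. The sketch below is illustrative only: the function names and example values are hypothetical, and the p-value uses the closed-form survival function of a $\chi^2$ distribution with exactly 3 degrees of freedom, so it does not generalize to other numbers of cohorts.

```python
import math
import numpy as np

def chi2_sf_3df(x):
    """Upper-tail probability P(X > x) for a chi-square with 3 degrees of freedom."""
    return (math.erfc(math.sqrt(x / 2.0))
            + math.sqrt(2.0 * x / math.pi) * math.exp(-x / 2.0))

def omnibus_test(z, c):
    """Return (Z' C^{-1} Z, approximate p-value) for 3 cohort Z-scores."""
    z = np.asarray(z, dtype=float)
    c = np.asarray(c, dtype=float)
    # Solve C x = Z instead of forming the explicit inverse of C.
    stat = float(z @ np.linalg.solve(c, z))
    return stat, chi2_sf_3df(stat)

# Hypothetical Z-scores for one gene, with uncorrelated cohorts.
z_scores = [1.0, 2.0, 2.0]
corr = np.eye(3)
stat, p_value = omnibus_test(z_scores, corr)
```

With an identity correlation matrix the statistic reduces to the sum of squared Z-scores; correlated predictions shrink or inflate it through $\mathbf{C}_i^{-1}$.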
Permutation Test
The permutation test makes no distributional assumption. It is a nonparametric method for testing whether groups of data differ significantly. In this paper, the expression-trait association is shuffled 1,000 times for each TWAS gene, and the distribution of the shuffled Z-scores $Z_{perm}$ (which follows $N(0, \Sigma_{s,s})$) is plotted. The p-value is computed as
$$ \begin{align*} \text{p-value} = \frac{\displaystyle \sum_{i=1}^{1000}I(Z_{obs} < Z_{perm,i})}{1000} \end{align*} $$
If the p-value is $<0.05$, we reject the null hypothesis (expression $\perp$ trait).
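The p-value formula translates into a short function. This is a sketch only: the permuted Z-scores here are drawn from a standard normal purely as stand-in data, whereas in the paper they come from the shuffling procedure; the function name and the observed value are hypothetical.

```python
import random

def permutation_pvalue(z_obs, z_perm):
    """Fraction of permuted Z-scores strictly greater than the observed one."""
    exceed = sum(1 for z in z_perm if z_obs < z)  # indicator I(Z_obs < Z_perm,i)
    return exceed / len(z_perm)

# Stand-in data: 1,000 hypothetical permuted Z-scores (not from the paper).
rng = random.Random(0)
z_perm = [rng.gauss(0.0, 1.0) for _ in range(1000)]

p = permutation_pvalue(2.5, z_perm)
reject_null = p < 0.05  # reject "expression independent of trait" at the 5% level
```

As written, the comparison is one-sided, matching the indicator in the formula; a two-sided variant would compare absolute values instead.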
Performance
True Data
TWAS identifies 25 novel expression-trait associations using summary association statistics from a 2010 lipid GWAS.
Simulation
Under null
We simulate expression from two null expression models. For the model in which expression $\perp$ SNP and the trait is cis-heritable:
$$ \text{Z-score} \sim N\left(0,\mathbf{\frac{WZ}{(W\Sigma_{s,s} W')^{1/2}}}\right) ,\ \text{expression} \sim N(0,1) $$
For the model in which trait $\perp$ SNP and expression is cis-heritable:
$$ \text{Z-score} \sim N(0,1) ,\ \text{expression}=\sum_i X_i +\varepsilon $$
where
- $\mathbf{W=\Sigma_{e,s}\Sigma^{-1}_{s,s}}$
- $\mathbf{\Sigma_{e,s}}:$ covariance between SNPs and expression
- $\mathbf{\Sigma_{s,s}}:$ covariance among all SNPs
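The weight definition $\mathbf{W=\Sigma_{e,s}\Sigma^{-1}_{s,s}}$ can be evaluated without forming an explicit matrix inverse. The sketch below assumes a single gene, so $\mathbf{\Sigma_{e,s}}$ is a vector, and relies on $\mathbf{\Sigma_{s,s}}$ being symmetric; the function name and numbers are hypothetical.

```python
import numpy as np

def imputation_weights(sigma_es, sigma_ss):
    """W = Sigma_es Sigma_ss^{-1} for one gene.

    sigma_es : covariance between each SNP and expression (length m)
    sigma_ss : m x m covariance among the SNPs (symmetric)
    """
    sigma_es = np.asarray(sigma_es, dtype=float)
    sigma_ss = np.asarray(sigma_ss, dtype=float)
    # Because Sigma_ss is symmetric, solving Sigma_ss w = Sigma_es gives W'.
    return np.linalg.solve(sigma_ss, sigma_es)

# Two SNPs in LD; the second is correlated with expression only through the first.
sigma_ss = np.array([[1.0, 0.5],
                     [0.5, 1.0]])
w = imputation_weights([0.5, 0.25], sigma_ss)
```

In this toy case the second SNP receives a weight of zero, because its covariance with expression is fully explained by LD with the first SNP.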
Under alternative
We use $6000$ unrelated METSIM GWAS samples, $100$ genes, and the SNPs in the surrounding 1 Mb. For each of the $100$ genes, expression is simulated as
$$ \begin{align*} \mathbf{E}=\mathbf{X {\beta} + \varepsilon},\ \text{where } \varepsilon,\ \beta \text{ from Normal} \quad (1) \end{align*} $$
to achieve $h^2_{cis-g}=0.17$. $1000$ samples with SNPs and simulated expression are then withheld for training $(1)$, and $(1)$ is used to simulate expression for the remaining $5000$ samples. For these $5000$ samples, the phenotype $Y$ is simulated as
$$ \begin{align*} Y=E \alpha'+\varepsilon \quad (2) \end{align*} $$
so that $h^2_E=\frac{0.1}{180}$ or $\frac{0.2}{180}$. The expression simulation $(1)$ and phenotype simulation $(2)$ for the $5000$ samples are repeated $60$ times with different $\varepsilon$. After computing Z-scores between each SNP and the phenotype, this yields a simulated GWAS of size $5000 \times 60$.
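Equations $(1)$ and $(2)$ can be mimicked on toy data as follows. This is a rough sketch under several assumptions that are not from the paper: genotypes are drawn as independent binomial counts instead of real METSIM genotypes, sample and SNP counts are reduced, a single gene is simulated, and the target variance fraction is reached by standardizing the signal and adding normal noise with variance $1-h^2$. All names are hypothetical.

```python
import numpy as np

def add_noise_to_target_h2(signal, h2, rng):
    """Return sqrt(h2) * standardized signal + N(0, 1 - h2) noise."""
    signal = (signal - signal.mean()) / signal.std()
    noise = rng.normal(scale=np.sqrt(1.0 - h2), size=signal.shape[0])
    return np.sqrt(h2) * signal + noise

rng = np.random.default_rng(0)

# Toy genotype matrix (assumption: independent SNPs, allele frequency 0.3).
n_samples, n_snps = 600, 50
X = rng.binomial(2, 0.3, size=(n_samples, n_snps)).astype(float)

# Equation (1): expression with cis-heritability 0.17.
beta = rng.normal(size=n_snps)
E = add_noise_to_target_h2(X @ beta, 0.17, rng)

# Equation (2): phenotype with a small share of variance explained by expression.
alpha = 1.0
Y = add_noise_to_target_h2(E * alpha, 0.1 / 180, rng)
```

Repeating the last two steps with fresh noise draws corresponds to the $60$ replicates described above.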
