Meta Analysis

Introduction

Material and Method

flowchart

Data

We used data from the 1000 Genomes Project to perform a GWAS for height. The analysis covered chromosomes 1 through 22, with a total of 36,820,992 variants across 1,092 individuals.

Genotype QC

Exclude SNPs or individuals with missing rate $> 0.1$: 36,820,992 variants and 1,092 individuals pass the filter.
Exclude SNPs with MAF $\leq 0.05$: 6,797,981 variants and 1,092 individuals pass the filter.
Exclude SNPs with a Hardy-Weinberg equilibrium (HWE) test p-value $< 0.0001$: 4,941,621 variants and 1,092 individuals pass the filter.
Prune SNPs in LD ($r^2 > 0.2$ within a 500 bp window) before PCA: 299,901 variants and 1,092 individuals pass the filter.
flip beta to -beta
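The missingness and MAF filters listed above can be sketched in a few lines of Python. This is only an illustration under assumptions that are not in the original analysis: genotypes are coded as 0/1/2 alternate-allele counts with `None` for a missing call, and the SNP names and helper functions are hypothetical. The HWE and LD-pruning steps are not covered here.

```python
def missing_rate(genotypes):
    """Fraction of missing calls (None) for one SNP."""
    return sum(g is None for g in genotypes) / len(genotypes)


def minor_allele_frequency(genotypes):
    """MAF from 0/1/2 allele counts, ignoring missing calls."""
    called = [g for g in genotypes if g is not None]
    if not called:
        return 0.0
    alt_freq = sum(called) / (2 * len(called))
    return min(alt_freq, 1 - alt_freq)


def passes_qc(genotypes, max_missing=0.1, min_maf=0.05):
    """Keep a SNP if missing rate <= max_missing and MAF > min_maf."""
    return (missing_rate(genotypes) <= max_missing
            and minor_allele_frequency(genotypes) > min_maf)


# Hypothetical toy data: three SNPs, one list of genotypes per SNP.
snps = {
    "rs_common": [0, 1, 2, 1, 0, 1, 1, 0, 2, 1],
    "rs_rare": [0] * 19 + [1],
    "rs_missing": [0, None, None, 1, 2, 1, 0, 1, 1, 2],
}
kept = [name for name, g in snps.items() if passes_qc(g)]
print(kept)
```

In this toy data only `rs_common` survives: `rs_rare` fails the MAF threshold and `rs_missing` has a missing rate of 0.2.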

Fixed Effect

Random Effect

Result

SNP Finding

LD

Manhattan Plot

Code

Introduction

$$(\hat{\beta}_i, \sigma_i), \ i=1,\ldots,N$$

where

$\sigma_i$ is standard error of $\hat{\beta}_i$

$\widetilde{\beta} = \frac{\sum_{i=1}^N \hat{\beta}_i \sigma_i^{-2}}{\sum_{i=1}^N \sigma_i^{-2}}$ is the inverse-variance-weighted estimator, a common way to combine the study estimates: each $\hat{\beta}_i$ is weighted by $\sigma_i^{-2}$. If the $\hat{\beta}_i \sim {N}(\beta, \sigma_i^2)$ are independent, then

$$\begin{align*} Var(\widetilde{\beta}) &= {Var}\left( \frac{\sum_{i=1}^N \hat{\beta}_i \sigma_i^{-2}}{\sum_{i=1}^N \sigma_i^{-2}} \right) \\ &= \frac{\sum_{i=1}^N {Var}\left( \hat{\beta}_i \sigma_i^{-2} \right)}{\left( \sum_{i=1}^N \sigma_i^{-2} \right)^2} \\ &= \frac{\sum_{i=1}^N \sigma_i^{-4} \sigma_i^2}{\left( \sum_{i=1}^N \sigma_i^{-2} \right)^2} \\ &= \frac{\sum_{i=1}^N \sigma_i^{-2}}{\left( \sum_{i=1}^N \sigma_i^{-2} \right)^2} \\ &= \frac{1}{\sum_{i=1}^N \sigma_i^{-2}} \end{align*}$$
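As a minimal sketch of the estimator and its variance derived above, the following Python function combines per-study estimates. The function name and the example numbers are made up for illustration.

```python
import math


def inverse_variance_meta(betas, ses):
    """Return the inverse-variance-weighted estimate and its standard error."""
    weights = [1.0 / se ** 2 for se in ses]          # w_i = sigma_i^{-2}
    total_weight = sum(weights)
    beta_tilde = sum(w * b for w, b in zip(weights, betas)) / total_weight
    se_tilde = math.sqrt(1.0 / total_weight)         # Var = 1 / sum(w_i)
    return beta_tilde, se_tilde


# Hypothetical effect sizes and standard errors from three studies.
betas = [0.10, 0.20, 0.15]
ses = [0.05, 0.10, 0.05]
beta_tilde, se_tilde = inverse_variance_meta(betas, ses)
print(beta_tilde, se_tilde)
```

Here the weights are 400, 100 and 400, so the pooled estimate is $120/900 \approx 0.133$ with standard error $1/30 \approx 0.033$. The most precise studies dominate the result.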

GWAS Model

$$y = \beta_0 + \beta x + \gamma_1 z_1 + \gamma_2 z_2 + \cdots + \gamma_k z_k + \epsilon$$

Under logistic regression,

$$y \sim \text{Ber}(p), \ y \in \{0, 1\}$$

$$\begin{align*} \text{logit}(p) &= \log\left(\frac{p}{1-p}\right) \\ &= \beta_0 + \beta x + \gamma_1 z_1 + \gamma_2 z_2 + \cdots + \gamma_k z_k \end{align*}$$

With the covariates $z_1, \ldots, z_k$ held fixed, $\beta$ is the log odds ratio for a one-unit change in $x$:

$$\begin{align*} \beta &= \log\left( \frac{p(x=1)}{1-p(x=1)} \right) - \log\left( \frac{p(x=0)}{1-p(x=0)} \right) \\ &= \log\left( \frac{p(x=1)}{1-p(x=1)} \middle/ \frac{p(x=0)}{1-p(x=0)} \right) \end{align*}$$
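To make the log odds ratio concrete, here is a small numeric sketch. The probabilities 0.6 and 0.5 are invented for illustration and do not come from the study.

```python
import math


def logit(p):
    """Log odds of a probability p in (0, 1)."""
    return math.log(p / (1 - p))


def log_odds_ratio(p_x1, p_x0):
    """beta = logit(p(x=1)) - logit(p(x=0))."""
    return logit(p_x1) - logit(p_x0)


# Hypothetical case probabilities for x = 1 and x = 0.
beta = log_odds_ratio(0.6, 0.5)
odds_ratio = math.exp(beta)
print(beta, odds_ratio)
```

With these numbers the odds are 1.5 and 1.0, so the odds ratio is 1.5 and $\beta = \log 1.5 \approx 0.405$.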

Chi-square Test for Heterogeneity in Effect

$$\begin{align*} Q = \sum_{i=1}^N \left( \frac{\hat{\beta}_i - \widetilde{\beta}}{\sigma_i} \right)^2 \sim \chi^2_{N-1} \end{align*}$$

This tests whether the $\beta_i$ differ significantly across studies, with $H_0: \beta_1 = \beta_2 = \cdots = \beta_N$.

$$\begin{align*} I^2 &= 100\% \cdot \frac{Q - \text{df}}{Q}\\ \end{align*}$$

$I^2 = 0-25\%$: Low heterogeneity; the effects are consistent with $\beta_1 = \beta_2 = \cdots = \beta_N$, so $H_0$ is not rejected.
$I^2 = 25-50\%$: Moderate heterogeneity.
$I^2 = 50-75\%$: Substantial heterogeneity.
$I^2 > 75\%$: Considerable heterogeneity; $H_0$ is rejected.

where

$Q = \sum_{i=1}^N \left( \frac{\hat{\beta}_i - \widetilde{\beta}}{\sigma_i} \right)^2 \sim \chi^2_{N-1}$
df = $N-1$
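The two statistics above can be computed as in the sketch below. One assumption goes beyond the formula in the text: negative values of $I^2$ are truncated to 0, which is a common convention. The function names and inputs are illustrative.

```python
def cochran_q(betas, ses):
    """Cochran's Q: squared standardized deviations from the pooled estimate."""
    weights = [1.0 / se ** 2 for se in ses]
    beta_tilde = sum(w * b for w, b in zip(weights, betas)) / sum(weights)
    return sum(((b - beta_tilde) / se) ** 2 for b, se in zip(betas, ses))


def i_squared(q, n_studies):
    """I^2 in percent, 100 * (Q - df) / Q, truncated at 0 (assumed convention)."""
    df = n_studies - 1
    if q <= 0:
        return 0.0
    return max(0.0, 100.0 * (q - df) / q)


# Hypothetical homogeneous studies.
q_hom = cochran_q([0.10, 0.20, 0.15], [0.05, 0.10, 0.05])
i2_hom = i_squared(q_hom, 3)

# Hypothetical heterogeneous studies.
q_het = cochran_q([0.0, 0.3], [0.1, 0.1])
i2_het = i_squared(q_het, 2)
print(q_hom, i2_hom, q_het, i2_het)
```

The first set gives $Q \approx 1$ on 2 degrees of freedom, so $I^2$ is truncated to 0. The second gives $Q \approx 4.5$ on 1 degree of freedom and $I^2 \approx 78\%$.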

Cochran’s Q test

Cochran’s Q test may be underpowered when few studies are included or when event rates are low. It is therefore often recommended to use a significance threshold higher than 0.05 (for example, 0.10) when using Cochran’s Q test to assess statistical heterogeneity.

$$Q = \sum_{i=1}^N \left( \frac{\hat{\beta}_i - \widetilde{\beta}}{\sigma_i} \right)^2 \sim \chi^2_{N-1}$$

With a large sample, reject $H_0$ if the p-value $P(\chi^2_{N-1}>Q)<0.05$.
With a small sample, reject $H_0$ if the p-value $P(\chi^2_{N-1}>Q)<0.1$.
Reference: https://www.ncbi.nlm.nih.gov/books/NBK53317/table/ch3.t2/#:~:text=Cochran's%20Q%20test%20is%20the,within%20subjects%20within%20a%20study.
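The decision rule for Cochran's Q can be sketched with the standard library alone. The chi-square upper tail is computed from a series for the regularized incomplete gamma function. This is an illustrative implementation and not the method used in the course, and it loses accuracy for extremely small p-values. A statistics library would normally be used instead.

```python
import math


def chi2_sf(x, df, max_terms=10000):
    """Upper tail P(chi2_df > x) via the series for the lower incomplete gamma."""
    if x <= 0:
        return 1.0
    a, z = df / 2.0, x / 2.0
    term = 1.0 / a
    total = term
    for n in range(1, max_terms):
        term *= z / (a + n)
        total += term
        if term < 1e-16 * total:
            break
    lower = total * math.exp(a * math.log(z) - z - math.lgamma(a))
    return min(1.0, max(0.0, 1.0 - lower))


def reject_homogeneity(q, n_studies, alpha=0.1):
    """Return (p_value, reject) for H0: all study effects are equal."""
    p_value = chi2_sf(q, n_studies - 1)
    return p_value, p_value < alpha


# Hypothetical Q statistics from two and three studies.
p_het, reject_het = reject_homogeneity(4.5, 2)
p_hom, reject_hom = reject_homogeneity(1.0, 3)
print(p_het, reject_het, p_hom, reject_hom)
```

With $Q = 4.5$ on 1 degree of freedom the p-value is about 0.034, so $H_0$ is rejected at the 0.1 level. With $Q = 1$ on 2 degrees of freedom the p-value is about 0.61 and $H_0$ is not rejected.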

Heterogeneity in Effect

The genetic influence on a trait can vary across individuals or populations, even when the trait looks the same. This heterogeneity may arise from:

Differences in LD structure
Interactions with environmental or other genetic exposures at different frequencies

Fixed-Effect Meta-Analysis

$$\begin{align*} (\hat{\beta}_i, \sigma^2_{i}),\quad i = 1, \ldots, N,\quad N \text{ studies} \end{align*}$$

where

$\hat{\beta}_i$ is the estimated effect size in study $i$
$\sigma_i^2$ is its variance

$$\begin{align*} \hat{\beta}_i \sim N(\beta, \sigma^2_{i}) \\ \tilde{\beta} = \frac{ \sum_{i=1}^N \hat{\beta}_i \sigma^{-2}_{i} }{ \sum_{i=1}^N \sigma^{-2}_{i} } \end{align*}$$

Random-Effects Meta-Analysis

$$\begin{align*} &\hat{\beta}_i \sim {N}(\beta_i, \sigma_i^2) , \quad \beta_i \sim {N}(\mu, \tau^2) \end{align*}$$

where

$\sigma^2_i$ is the sampling variance within study $i$ (within-study sampling error)
$\tau^2$ is the variance between studies (between-study differences)

$$\begin{align*} \text{Var}(\hat{\beta}_i) &= E(\text{Var}(\hat{\beta}_i | \beta_i)) + \text{Var}(E(\hat{\beta}_i | \beta_i)) \\ &= E(\sigma_i^2) + \text{Var}(\beta_i) \\ &= \sigma_i^2 + \tau^2 \end{align*}$$

$$\begin{align*} \Rightarrow \hat{\beta}_i \sim {N}(\mu, \sigma_i^2 + {\tau}^2) \end{align*}$$
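The text does not say how $\tau^2$ is estimated. One common choice is the DerSimonian-Laird moment estimator, which is assumed here purely for illustration. Under that assumption, each study gets weight $1/(\sigma_i^2 + \hat{\tau}^2)$, matching the marginal variance above. The function name and inputs are hypothetical.

```python
import math


def random_effects_meta(betas, ses):
    """Random-effects pooling with a DerSimonian-Laird tau^2 (assumed estimator)."""
    if len(betas) < 2:
        raise ValueError("need at least two studies to estimate tau^2")
    w = [1.0 / se ** 2 for se in ses]
    sum_w = sum(w)
    beta_fixed = sum(wi * b for wi, b in zip(w, betas)) / sum_w
    q = sum(wi * (b - beta_fixed) ** 2 for wi, b in zip(w, betas))
    df = len(betas) - 1
    c = sum_w - sum(wi ** 2 for wi in w) / sum_w
    tau2 = max(0.0, (q - df) / c)                     # truncated at zero
    w_star = [1.0 / (se ** 2 + tau2) for se in ses]   # 1 / (sigma_i^2 + tau^2)
    mu = sum(wi * b for wi, b in zip(w_star, betas)) / sum(w_star)
    se_mu = math.sqrt(1.0 / sum(w_star))
    return mu, se_mu, tau2


# Hypothetical estimates from two studies that disagree.
mu, se_mu, tau2 = random_effects_meta([0.0, 0.3], [0.1, 0.1])
print(mu, se_mu, tau2)
```

For these inputs $\hat{\tau}^2 \approx 0.035$, and the pooled standard error grows from about 0.071 under the fixed-effect model to about 0.15, reflecting the extra between-study variance.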

Probability Distributions

$$\begin{align*} P(\hat{\beta}_i | \mu, \tau) &= \int P(\hat{\beta}_i, \beta_i | \mu, \tau) \, d\beta_i \\ &= \int P(\hat{\beta}_i | \beta_i) P(\beta_i | \mu, \tau) \, d\beta_i \\ &\propto \int \exp\left\{ -\frac{1}{2\sigma_i^2} (\hat{\beta}_i - \beta_i)^2 - \frac{1}{2\tau^2} (\beta_i - \mu)^2 \right\} \, d\beta_i \end{align*}$$

The joint distribution $P(\hat{\beta}_i, \beta_i)$ is bivariate normal, so the marginal distribution of $\hat{\beta}_i$ is also normal.
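As a sketch of the missing step, the integral can be evaluated by completing the square in $\beta_i$. The shorthand $m_i$ is introduced here only for this derivation.

$$\begin{align*} -\frac{(\hat{\beta}_i - \beta_i)^2}{2\sigma_i^2} - \frac{(\beta_i - \mu)^2}{2\tau^2} &= -\frac{1}{2}\left( \frac{1}{\sigma_i^2} + \frac{1}{\tau^2} \right) (\beta_i - m_i)^2 - \frac{(\hat{\beta}_i - \mu)^2}{2(\sigma_i^2 + \tau^2)}, \quad m_i = \frac{\hat{\beta}_i \sigma_i^{-2} + \mu \tau^{-2}}{\sigma_i^{-2} + \tau^{-2}} \\ \Rightarrow P(\hat{\beta}_i | \mu, \tau) &\propto \exp\left\{ -\frac{(\hat{\beta}_i - \mu)^2}{2(\sigma_i^2 + \tau^2)} \right\} \end{align*}$$

The first term integrates to a constant that does not depend on $\hat{\beta}_i$, so the marginal is ${N}(\mu, \sigma_i^2 + \tau^2)$, in agreement with the variance decomposition.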

Genomic Control

$$\begin{align*} \lambda_{\text{GC}} &= \frac{\text{median}(\chi^2_{\text{observed}})}{\text{median}(\chi^2_{\text{expected}})} = \frac{\text{median}(\chi^2_{\text{observed}})}{0.455} \\ \lambda_{\text{GC}} &\begin{cases} \approx 1: & \text{well-calibrated} \\ > 1: & \text{inflated test statistics} \\ < 1: & \text{conservative test} \end{cases} \end{align*}$$

where $\text{median}(\chi^2_{\text{expected}}) = 0.455$ is the median of a $\chi^2$ distribution with 1 degree of freedom under the null, and the adjusted statistic is $\chi^2_{\text{adjusted}} = \frac{\chi^2_{\text{observed}}}{\lambda_{\text{GC}}}$

Allelic Chi-square Test

Suppose a SNP has two alleles, A and a, with the following counts in cases and controls:

$$ \begin{array}{c|cc} & A & a \\ \hline \text{Case} & O_{1A} & O_{1a} \\ \text{Control} & O_{0A} & O_{0a} \\ \end{array} $$

where $N = O_{1A}+O_{1a}+O_{0A}+O_{0a}$

$$ E_{1A} = \frac{(O_{1A}+O_{0A})(O_{1A}+O_{1a})}{N} $$

The remaining expected counts are computed in the same way:

$$ E_{1a}, E_{0A}, E_{0a} $$

Next, compute the chi-square statistic for each SNP, summing over the four cells of the table:

$$ \chi^2 = \sum_{i} \frac{(O_i - E_i)^2}{E_i} $$
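The allelic test above can be sketched as follows. The allele counts in the example are invented, and the function assumes every row and column total is positive, since otherwise an expected count would be zero.

```python
def allelic_chi_square(case_A, case_a, control_A, control_a):
    """Chi-square statistic for a 2x2 table of allele counts."""
    observed = [[case_A, case_a], [control_A, control_a]]
    n = case_A + case_a + control_A + control_a
    row_totals = [sum(row) for row in observed]
    col_totals = [case_A + control_A, case_a + control_a]
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / n   # E_ij
            chi2 += (observed[i][j] - expected) ** 2 / expected
    return chi2


# Hypothetical counts: cases carry allele A more often than controls.
chi2 = allelic_chi_square(case_A=60, case_a=40, control_A=40, control_a=60)
print(chi2)
```

Every expected count is 50 in this table, so each cell contributes $10^2/50 = 2$ and the statistic is 8.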

Genomic control does not recompute $\chi^2$. It treats the observed $\chi^2$ values as inflated and corrects them by dividing by $\lambda_{\text{GC}}$:

$$ \chi^2_{\text{GC}} = \frac{\chi^2}{\lambda_{\text{GC}}} $$

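How is $\lambda_{\text{GC}}$ computed?

Take the median of the $\chi^2$ statistics over all SNPs and divide it by the theoretical median (for df = 1).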

$$ \text{median}(\chi^2_{df=1}) = 0.455 $$

Correct the statistic computed for each SNP:

$$ \chi^2_{\text{GC}, i} = \frac{\chi^2_i}{\lambda_{\text{GC}}}, \qquad \lambda_{\text{GC}} = \frac{\text{median}(\chi^2_{\text{all SNPs}})}{0.455} $$

Then convert to a p-value:

$$ p_i^{\text{GC}} = 1 - F_{\chi^2_{df=1}}(\chi^2_{\text{GC}, i}) $$
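The genomic control steps above fit in a short sketch. It uses the median value 0.455 quoted in the text and the identity $1 - F_{\chi^2_{df=1}}(x) = \text{erfc}(\sqrt{x/2})$ for the p-value. The input statistics are made up for illustration.

```python
import math
import statistics

CHI2_DF1_MEDIAN = 0.455  # theoretical median for df = 1, as quoted in the text


def genomic_control(chi2_stats):
    """Return lambda_GC, the corrected statistics, and their p-values."""
    lambda_gc = statistics.median(chi2_stats) / CHI2_DF1_MEDIAN
    corrected = [x / lambda_gc for x in chi2_stats]
    # Upper tail of a chi-square with 1 df: erfc(sqrt(x / 2)).
    p_values = [math.erfc(math.sqrt(x / 2.0)) for x in corrected]
    return lambda_gc, corrected, p_values


# Hypothetical per-SNP chi-square statistics with an inflated median.
chi2_stats = [0.1, 0.5, 0.91, 2.0, 12.0]
lambda_gc, corrected, p_values = genomic_control(chi2_stats)
print(lambda_gc, corrected, p_values)
```

The median here is 0.91, so $\lambda_{\text{GC}} = 2$ and every statistic is halved. After correction the median statistic is 0.455 and its p-value is close to 0.5, as expected for a calibrated test.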

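Genomic control is not well suited to polygenic traits, because inflation of the median statistic can then reflect true polygenic signal rather than confounding.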

Next Steps

Following the GWAS, several post-GWAS analyses can be conducted, including fine-mapping, functional annotation, and the calculation of polygenic risk scores (PRS). Furthermore, the GWAS Catalog provides a large repository of existing GWAS summary statistics, which we can use to validate the significant SNPs identified in our study.

Reference

1000G
Prof. lhchien Course