Introduction
Heritability is the proportion of variation in a trait within a population that can be attributed to genetic differences. There are two type of definition and Nanow-sense Heritability is the common one.
Broad-sense Heritability
$$ \begin{align*} H^2&=\frac{V_G}{V_G+V_E} \\ &=\frac{V_G}{V_P} \end{align*} $$- $V_P$ is phenotype variation
- $V_G$ is genetic variation
- $V_E$ is environment
Nanow-sense Heritability
$$ \begin{align*} h^2=\frac{V_A}{V_P}, \quad \text{where} V_G=V_A+V_{NA} \end{align*} $$- $V_A$ is additive genetic variation
- $V_{NA}$ is non-addrive genetic variation
Example-Dominant Coding
In dominant coding, genotypes $CC$, $C T$ and $T T$ are coded as $1$, $1$ and $0$, respectively, if $C$ is the minor allele.
Example-Epistasis
Labrador coat color is determined by two genes with four genotypes: $BE$, $bE$, $Be$, $be$
- Color is black when genotype is $B-E-$
- Color is chocolate when genotype is $bbE-$
- Color is yellow when genotype is $--ee$

Note
Heritability refers to a specific population, not to individuals.
Heritability $\neq$ inheritance. For example, your brown hair may be inherited from your father, but the heritability of brown hair in the population may be low.
Heritability $\neq$ total genetic contribution. A low $h^2 = \frac{V_A}{V_P}$ does not necessarily mean that genetics plays a small role.
If $h^2$ is low, identifying associated genes might be less fruitful.
There are $3$ common types of heritability.
Family-Based Heritability $(h^2_{\text{family}})$
Family-based studies, often twin studies, estimate heritability by comparing monozygotic (MZ) twins and dizygotic (DZ) twins. Let $r_{MZ}$ be the phenotypic correlation for MZ twins and $r_{DZ}$ for DZ twins.
$$ \begin{align*} \left\lbrace \begin{array}{lll} r_{MZ} = A+C \\ r_{DZ} = \frac{A}{2}+C \end{array} \right. \end{align*} $$where
- $A$ is additive genetic effect
- $C$ is shared (common) environmental effect
We esitmate
$$ \begin{align*} h^2_{\text{family}} & =A=2(r_{MZ}-r_{DZ}) \\ C & = A-r_{MZ} \end{align*} $$error $E = 1-C$ and $A+C+E=1$
SNP-Based Heritability $(h^2_{\text{SNP}})$
Estimated using tools such as GCTA under the mixed linear model
$$ \begin{align*} \mathbf{Y=X \beta+W u+\varepsilon} \end{align*} $$where
- $\mathbf{u} \sim N\left(0, \mathbf{I \sigma_u^2}\right)$
- $\varepsilon \sim N\left(0, \mathbf{I \sigma_2^2}\right)$
- $\beta$ is a fixed effect (no variation)
So the variance of $\mathbf{Y}$ is
$$ \begin{align*} \operatorname{Var}(\mathbf{Y})= & V \\ = & \operatorname{Var}(\mathbf{W u})+\operatorname{Var}(\varepsilon) \\ = & \mathbf{W W^T \sigma_u^2+I \sigma_{\varepsilon}^2} \end{align*} $$The standardized genotype matrix
$$ \begin{align*} \mathbf{W}=\left\{w_{i j}\right\}, \quad w_{i j}=\frac{X_{i j}-2 p_j}{\sqrt{2 p_j\left(1-p_j\right)}} \end{align*} $$where
- $X_{ij}$ is $j-th$ SNP for $i-th$ individual
- $p_j$ is $j-th$ SNP MAF
Define Genetic Relationship Matrix (GRM)
$$ \begin{align*} A=\frac{\mathbf{WW^T}}{N} \end{align*} $$where $\mathbf{\sigma_g^2}=N \mathbf{\sigma_u^2}$
Rewiting the model
$$ \begin{align*} \Rightarrow \quad \mathbf{Y=X \beta+g+\varepsilon}, \quad \mathbf{g} \sim N\left(\mathbf{0, A \sigma_g^2}\right) \end{align*} $$We estimate $\sigma_g^2$ using REML (Restricted Maximum Likelihood). The proportion of phenotypic variance explained by the SNPs used to construct the GRM is given by
$$ \begin{align*} h^2_{\text{SNP}} = \frac{\sigma_g^2}{\text{Var}(Y)} \end{align*} $$More detail read the paragraph GCTA
GWAS-based Heritability $(h_{\text {GWAS }}^2)$
This estimates heritability using only the significant SNPs identified in GWAS. Assuming $m$ significant SNPs are linearly associated with the trait
$$ \begin{align*} {Y}=\beta_0+\sum_{i=1}^m \beta_i X_i+\varepsilon \end{align*} $$The heritability is
$$ \begin{align*} h_{\text {GWAS }}^2 = \frac{Var(\hat{Y})}{Var(Y)} \end{align*} $$Relationship Between Heritability Types
$$ \begin{align*} h_{\text {family }}^2 > h_{\text {SNP }}^2 >> h_{\text {GWAS }}^2 \end{align*} $$This gap is known as missing heritability.
Code
We used the gcta64 command to estimate the Genetic Relationship Matrix (GRM) for each of the 22 chromosomes separately. Compared to estimating the GRM for all autosomes together, we found that the results are identical. However, the former method took approximately two and a half hours, significantly longer than the latter, which required only ten minutes.
Computing Separately
| |
Computing Together
| |
Result
| |
