Bayesian Interpretation for Positive False Discovery Rate

Problem

In multiple testing, we are often concerned with the proportion of false positives among all rejected hypotheses rather than with the probability of wrongly rejecting at least one hypothesis. That is, we allow some true null hypotheses to be rejected, provided that their proportion among the rejections is controlled.

The pFDR can be written as a Bayesian posterior probability.

pFDR

Concept

$$
\begin{array}{lccc}
\hline
 & \text{Not rejected} & \text{Rejected} & \text{Total} \\
\hline
\text{Null true} & U & V & m_0 \\
\text{Alternative true} & T & S & m_1 \\
\hline
\text{Total} & W & R & m \\
\hline
\end{array}
$$

Table: Possible outcomes from $m$ hypothesis tests.
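In terms of the counts $V$ (false rejections) and $R$ (total rejections) from the outcome table, the two error rates discussed in this note can be written as follows. This is the standard formulation, stated here for reference; the convention that $V/R$ is taken to be $0$ when $R = 0$ is assumed for the FDR.

$$ \begin{align*} \mathrm{FDR} = E\left[\frac{V}{R \vee 1}\right] = E\left[\frac{V}{R} \,\middle|\, R > 0\right] P(R > 0), \qquad \mathrm{pFDR} = E\left[\frac{V}{R} \,\middle|\, R > 0\right] \end{align*} $$

The pFDR conditions on at least one rejection having occurred, which is what makes the posterior-probability interpretation possible.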

Theorem

Theorem 1 (Posterior Probability)

Suppose $m$ identical hypothesis tests are performed with statistics $T_1, \ldots, T_m$ and significance region $\Gamma$. Assume that the pairs $(T_i, H_i)$ are i.i.d., that $T_i \mid H_i \sim (1-H_i) F_0 + H_i F_1$ for a null distribution $F_0$ and an alternative distribution $F_1$, and that $H_i \sim \mathrm{Bernoulli}(\pi_1)$. Then

$$ \begin{align*} \mathrm{pFDR}(\Gamma)= P(H=0 \mid T \in \Gamma) \end{align*} $$

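Here $\pi_0 = 1 - \pi_1 = P(H = 0)$ is the prior probability that a null hypothesis is true, so by Bayes' rule $P(H = 0 \mid T \in \Gamma) = \pi_0 \, P(T \in \Gamma \mid H = 0) / P(T \in \Gamma)$.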
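The posterior probability in the theorem can be evaluated directly with Bayes' rule once the prior and the two rejection probabilities are known. The following is a minimal sketch, not part of the original paper: the function name `pfdr_posterior` and its arguments are hypothetical, with `alpha` standing for $P(T \in \Gamma \mid H = 0)$ and `power` for $P(T \in \Gamma \mid H = 1)$.

```python
def pfdr_posterior(pi0, alpha, power):
    """Return P(H = 0 | T in Gamma) under the two-group mixture model.

    pi0   : prior probability that a null hypothesis is true (assumed known)
    alpha : P(T in Gamma | H = 0), the null rejection probability
    power : P(T in Gamma | H = 1), the rejection probability under the alternative
    """
    for name, value in (("pi0", pi0), ("alpha", alpha), ("power", power)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must lie in [0, 1]")

    # Marginal probability of rejection: P(T in Gamma)
    reject_prob = pi0 * alpha + (1.0 - pi0) * power
    if reject_prob == 0.0:
        raise ValueError("P(T in Gamma) is zero, so the posterior is undefined")

    # Bayes' rule: P(H = 0 | T in Gamma)
    return pi0 * alpha / reject_prob
```

For instance, with `pi0 = 0.9`, `alpha = 0.05` and `power = 0.8`, the rejection probability is $0.045 + 0.08 = 0.125$, so the pFDR is $0.045 / 0.125 = 0.36$: more than a third of rejections are expected to be false even though each test is run at the 5% level.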

For a nested family of significance regions $\{\Gamma_\alpha\}$, the p-value of an observed statistic $T = t$ is defined to be

$$ \begin{align*} \mathrm{p\text{-}value}(t)= \inf_{\{\Gamma_{\alpha} : t \in \Gamma_{\alpha}\}} P(T\in\Gamma_\alpha \mid H=0) \end{align*} $$
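As a concrete instance of this definition, consider the nested regions $\Gamma_\alpha = \{T \geq c_\alpha\}$ with a standard normal null distribution. Both choices are assumptions made for illustration only and are not specified in the text. The smallest region containing $t$ is then $\{T \geq t\}$, so the infimum is the upper-tail null probability at $t$. The function name below is hypothetical.

```python
import math


def one_sided_p_value(t):
    """p-value of t for regions {T >= c} when the null F_0 is N(0, 1).

    The infimum over regions containing t is attained by {T >= t},
    whose null probability is 1 - Phi(t).
    """
    # 1 - Phi(t) written with the complementary error function
    return 0.5 * math.erfc(t / math.sqrt(2.0))
```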

For an observed statistic $T = t$, define the q-value of $t$ to be

$$ \begin{align*} \text{q-value}(t) = \inf_{\{\Gamma_{\alpha} : t \in \Gamma_{\alpha}\}} \text{pFDR}(\Gamma_{\alpha}) \end{align*} $$

Corollary 2

Under the assumptions of Theorem 1,

$$ \begin{align*} \text{q-value}(t) = \inf_{\{\Gamma_{\alpha} : t \in \Gamma_{\alpha}\}} P(H=0 \mid T \in \Gamma_{\alpha} ) \end{align*} $$
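In practice the posterior probability in the corollary has to be estimated. A common plug-in approach works with p-values and regions $\{p \leq \alpha\}$: since null p-values are uniform, $P(H = 0 \mid p \leq \alpha) = \pi_0 \alpha / P(p \leq \alpha)$, and $P(p \leq \alpha)$ is replaced by the observed fraction of p-values at or below $\alpha$. The sketch below follows that idea under stated assumptions: `pi0` is treated as known (the default `1.0` is a conservative choice), and the function name is hypothetical. It is an illustration, not the exact algorithm of the referenced paper.

```python
def q_values(p_values, pi0=1.0):
    """Plug-in q-value estimates for a list of p-values.

    For the i-th smallest p-value p_(i), the estimate is
        min over j >= i of  pi0 * m * p_(j) / j,
    i.e. the smallest estimated pFDR among regions {p <= alpha} containing p_(i).
    """
    m = len(p_values)
    # Indices of the p-values in increasing order
    order = sorted(range(m), key=lambda i: p_values[i])

    q = [0.0] * m
    running_min = 1.0  # a q-value is a probability, so cap it at 1
    # Walk from the largest p-value down so that the infimum over all
    # larger regions is available as a running minimum.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        estimate = pi0 * m * p_values[i] / rank
        running_min = min(running_min, estimate)
        q[i] = running_min
    return q
```

The running minimum enforces the infimum in the definition, which also guarantees that the estimated q-values are monotone in the p-values.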

Theorem for Dependent Statistics

Suppose that, as $m \to \infty$, for each $\alpha > 0$ and for some continuous functions $G_0$ and $G_1$,

$$ \begin{align*} \sum_{i=1}^{m} \frac{(1 - H_i)}{m} \to \pi_0, \quad\frac{V_m(\Gamma_\alpha)}{\sum_{i=1}^{m} (1 - H_i)} \to G_0(\alpha), \quad \frac{S_m(\Gamma_\alpha)}{\sum_{i=1}^{m} H_i} \to G_1(\alpha) \end{align*} $$

with probability 1.

Then, for any $\delta > 0$,

$$ \begin{align*} \text{(i)} & \quad \lim_{m \to \infty} \sup_{\alpha \geq \delta} \left| \frac{V_m(\Gamma_\alpha)}{R_m(\Gamma_\alpha) \vee 1} - P_{\infty}(H = 0 \mid X \in \Gamma_\alpha) \right| \stackrel{a.s.}{=} 0 \\ \text{(ii)} & \quad \lim_{m \to \infty} \sup_{\alpha \geq \delta} \left| \text{FDR}_m(\Gamma_\alpha) - P_{\infty}(H = 0 \mid X \in \Gamma_\alpha) \right| = 0 \\ \text{(iii)} & \quad \lim_{m \to \infty} \sup_{\alpha \geq \delta} \left| \text{pFDR}_m(\Gamma_\alpha) - P_{\infty}(H = 0 \mid X \in \Gamma_\alpha) \right| = 0 \end{align*} $$

where $P_{\infty}(H = 0 \mid X \in \Gamma_\alpha) = \frac{\pi_0 \cdot G_0(\alpha)}{\pi_0 \cdot G_0(\alpha) + (1 - \pi_0) \cdot G_1(\alpha)}$.
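The form of this limit can be seen by dividing the numerator and denominator of $V_m / R_m$ by $m$. The following is a pointwise sketch only: it uses $R_m = V_m + S_m$ from the outcome table, writes $m_0 = \sum_{i} (1 - H_i)$ and $m_1 = \sum_{i} H_i = m - m_0$, assumes the limiting denominator is positive, and does not reproduce the uniformity argument over $\alpha \geq \delta$.

$$ \begin{align*} \frac{V_m(\Gamma_\alpha)}{R_m(\Gamma_\alpha)} = \frac{\dfrac{m_0}{m} \cdot \dfrac{V_m(\Gamma_\alpha)}{m_0}}{\dfrac{m_0}{m} \cdot \dfrac{V_m(\Gamma_\alpha)}{m_0} + \dfrac{m_1}{m} \cdot \dfrac{S_m(\Gamma_\alpha)}{m_1}} \;\xrightarrow{\;a.s.\;}\; \frac{\pi_0 \, G_0(\alpha)}{\pi_0 \, G_0(\alpha) + (1 - \pi_0) \, G_1(\alpha)} \end{align*} $$

Each factor converges by one of the three assumed limits, with $m_1 / m \to 1 - \pi_0$ following from $m_0 / m \to \pi_0$.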

Benefit

Limitation

Common Technology

Omnibus Test

The omnibus test uses summary data to combine evidence across multiple cohorts or methods. In this paper, we use the omnibus test to check for significant associations across predictions from YFS, METSIM, and NTR (different tissues). For gene $i$,

$$ \begin{align*} \text{omnibus}_i = \mathbf{Z}_i^{\top} \mathbf{C}_i^{-1} \mathbf{Z}_i \overset{\text{approx}}{\sim} \chi^2_3 \end{align*} $$

where

$\mathbf{Z}_i$ is the $3 \times 1$ vector of TWAS Z-scores from the three cohorts, and
$\mathbf{C}_i$ is the $3 \times 3$ correlation matrix for the three cohorts.
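The statistic above is a quadratic form compared against a chi-square distribution with 3 degrees of freedom. A minimal sketch of the computation follows, assuming NumPy is available; the function name `omnibus_test` is hypothetical, and the p-value uses the closed-form survival function of $\chi^2_3$ rather than a statistics library.

```python
import math

import numpy as np


def omnibus_test(z, corr):
    """Omnibus statistic Z^T C^{-1} Z and its approximate chi-square(3) p-value.

    z    : length-3 sequence of Z-scores, one per cohort
    corr : 3 x 3 correlation matrix for the three cohorts
    """
    z = np.asarray(z, dtype=float)
    corr = np.asarray(corr, dtype=float)
    if z.shape != (3,) or corr.shape != (3, 3):
        raise ValueError("expected a length-3 vector and a 3 x 3 matrix")

    # Solve C x = z instead of forming the explicit inverse of C
    stat = float(z @ np.linalg.solve(corr, z))

    # Survival function of chi-square with 3 degrees of freedom:
    # P(X > s) = erfc(sqrt(s / 2)) + sqrt(2 s / pi) * exp(-s / 2)
    p_value = math.erfc(math.sqrt(stat / 2.0)) + math.sqrt(
        2.0 * stat / math.pi
    ) * math.exp(-stat / 2.0)
    return stat, p_value
```

With uncorrelated cohorts (`corr` equal to the identity) the statistic reduces to the sum of squared Z-scores; positive correlation between cohorts shrinks the evidence contributed by concordant Z-scores.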

Performance

True Data

Simulation

Reference

Storey, J. D. (2003). The positive false discovery rate: a Bayesian interpretation and the q-value. The Annals of Statistics, 31(6), 2013-2035.