$$Y_i = \beta_0 + \beta_1 x_{i,1} + \cdots + \beta_{p-1} x_{i,p-1} + \varepsilon_i, \quad \varepsilon_i \sim N(0, \sigma_i^2)$$

$$\frac{Y_i}{\sigma_i} = \frac{1}{\sigma_i} (\beta_0 + \beta_1 x_{i,1} + \cdots + \beta_{p-1} x_{i,p-1} + \varepsilon_i), \quad \frac{\varepsilon_i}{\sigma_i} \sim N(0, 1)$$

$$\begin{align*}
& Y = \begin{pmatrix} Y_1 \\ Y_2 \\ \vdots \\ Y_n \end{pmatrix}, \
X = \begin{pmatrix} 1 & x_{1,1} & \cdots & x_{1,p-1} \\ 1 & x_{2,1} & \cdots & x_{2,p-1} \\ \vdots & \vdots & \ddots & \vdots \\ 1 & x_{n,1} & \cdots & x_{n,p-1} \end{pmatrix}, \
\varepsilon = \begin{pmatrix} \varepsilon_1 \\ \varepsilon_2 \\ \vdots \\ \varepsilon_n \end{pmatrix}, \
\beta = \begin{pmatrix} \beta_0 \\ \beta_1 \\ \vdots \\ \beta_{p-1} \end{pmatrix} \\
& W = \begin{pmatrix} 1 / \sigma_1^2 & 0 & \cdots & 0 \\ 0 & 1 / \sigma_2^2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & 1 / \sigma_n^2 \end{pmatrix}, \
Y_W = W^{\frac{1}{2}} Y, \ X_W = W^{\frac{1}{2}} X
\end{align*}$$

$$\begin{align*}
\tilde{\beta} & = \arg\min_{\beta} \sum_{i = 1}^{n} \frac{(Y_i - \beta_0 - \beta_1 x_{i,1} - \cdots - \beta_{p-1} x_{i,p-1})^2}{\sigma_i^2} \\
& = \arg\min_{\beta} \, (Y - X \beta)' \begin{pmatrix} 1 / \sigma_1^2 & 0 & \cdots & 0 \\ 0 & 1 / \sigma_2^2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & 1 / \sigma_n^2 \end{pmatrix} (Y - X \beta) \\
& = \arg\min_{\beta} \, (Y - X \beta)' W (Y - X \beta) \\
& = \arg\min_{\beta} \, (W^{\frac{1}{2}} Y - W^{\frac{1}{2}} X \beta)' (W^{\frac{1}{2}} Y - W^{\frac{1}{2}} X \beta) \\
& = \arg\min_{\beta} \, (Y_W - X_W \beta)' (Y_W - X_W \beta) \\
& = \arg\min_{\beta} \left[ Y_W' Y_W - 2 \beta' X_W' Y_W + \beta' X_W' X_W \beta \right] \\
& = \arg\min_{\beta} \, Q(\beta)
\end{align*}$$

$$\left[ -2 X_W' Y_W + 2 X_W' X_W \tilde{\beta} \right] = 0$$

$$\tilde{\beta} = (X_W' X_W)^{-1} X_W' Y_W = (X' W X)^{-1} X' W Y$$
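
The closed form above can be checked numerically. The following is a minimal sketch in Python with NumPy, assuming the $\sigma_i$ are known; the function names and the simulated data are illustrative assumptions, not part of the original analysis. It computes $\tilde{\beta}$ both from $(X'WX)^{-1}X'WY$ and from ordinary least squares on the transformed data $(X_W, Y_W)$, which should agree.

```python
import numpy as np

def wls_closed_form(X, y, sigma):
    """WLS estimate (X'WX)^{-1} X'WY with W = diag(1 / sigma_i^2)."""
    w = 1.0 / sigma ** 2              # diagonal entries of W
    XtW = X.T * w                     # X'W without building the n x n matrix
    return np.linalg.solve(XtW @ X, XtW @ y)

def wls_transformed(X, y, sigma):
    """OLS on the rescaled data X_W = W^{1/2} X, Y_W = W^{1/2} Y."""
    sqrt_w = 1.0 / sigma              # diagonal entries of W^{1/2}
    beta, *_ = np.linalg.lstsq(X * sqrt_w[:, None], y * sqrt_w, rcond=None)
    return beta

# Simulated data (hypothetical): error standard deviation grows with x.
rng = np.random.default_rng(0)
n = 200
x = rng.uniform(1, 10, n)
sigma = 0.5 * x
X = np.column_stack([np.ones(n), x])  # intercept column plus one predictor
y = 2.0 + 3.0 * x + rng.normal(0, sigma)

beta_closed = wls_closed_form(X, y, sigma)
beta_transformed = wls_transformed(X, y, sigma)
print(beta_closed, beta_transformed)
```

Solving the normal equations with `np.linalg.solve` avoids forming an explicit inverse; for ill-conditioned designs the transformed least squares route is usually the more stable of the two.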

If a plot of $e_i$ against $X_i$, or of $e_i$ against $\hat{Y}_i$, exhibits a megaphone shape, regress $|e_i|$ on $X_i$ or on $\hat{Y}_i$. The fitted values $\widehat{|e_i|}$ from that regression are then used to estimate $\sigma_i$.

If a plot of $e_i^2$ against $X_i$, or of $e_i^2$ against $\hat{Y}_i$, exhibits an upward trend, regress $e_i^2$ on $X_i$ or on $\hat{Y}_i$. The fitted values $\widehat{e_i^2}$ from that regression are then used to estimate $\sigma_i^2$.
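
The two recipes above can be sketched as follows. This is a minimal illustration in Python with NumPy on simulated data. The helper names, the simulated variance structure, and the small positive `floor` that guards against non-positive fitted values are all assumptions made for the example, not part of the original analysis.

```python
import numpy as np

def ols(X, y):
    """Ordinary least squares coefficients for design matrix X."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta

def estimate_sigma(X, y, use_squared=False, floor=1e-8):
    """Estimate sigma_i by regressing |e_i| (or e_i^2) on the predictors."""
    e = y - X @ ols(X, y)                        # OLS residuals
    target = e ** 2 if use_squared else np.abs(e)
    fitted = X @ ols(X, target)                  # fitted |e_i| or fitted e_i^2
    fitted = np.clip(fitted, floor, None)        # assumption: keep values positive
    return np.sqrt(fitted) if use_squared else fitted

# Simulated data (hypothetical): the error standard deviation is 1 + 0.5 x.
rng = np.random.default_rng(1)
n = 300
x = rng.uniform(1, 10, n)
X = np.column_stack([np.ones(n), x])
y = 2.0 + 3.0 * x + rng.normal(0, 1.0 + 0.5 * x)

sigma_hat = estimate_sigma(X, y)                 # absolute-residual version
w = 1.0 / sigma_hat ** 2                         # estimated weights w_i
sqrt_w = np.sqrt(w)
beta_ols = ols(X, y)
beta_wls = ols(X * sqrt_w[:, None], y * sqrt_w)  # WLS via rescaled OLS
print(beta_ols, beta_wls)
```

Passing `use_squared=True` switches to the squared-residual version, where the fitted values estimate $\sigma_i^2$ directly.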

Example

The response $Y$ is the cost of the computer time and the predictor $X$ is the total number of responses in completing a lesson. The data were downloaded from (https://online.stat.psu.edu/stat501/lesson/13/13.1/13.1.1).

A plot of $e$ against $X$ shows a megaphone shape.

The error variance under the ordinary least squares model is not constant. We regress $|e|$ on $X$ and use the fitted values $\widehat{|e_i|}$ to estimate $\sigma_i$, so the weights are $w_i = 1 / \widehat{|e_i|}^2$. Comparing the OLS and WLS estimates, the two differ only slightly.

$X_7$ and $X_8$ are highly correlated with $Y$, so they appear to be important predictors.

The Q-Q plot shows an S-shaped pattern, which suggests that the normality assumption of the linear regression model may be violated.

We use residual plots for diagnosis and find that the plot of the residuals against $X_3$ shows a curved pattern, which suggests adding a quadratic term in $X_3$:

$$Y = X_1 + X_2 + X_3 + X_4+ X_5 + X_6 + X_7 + X_8 + X_9 + X_{10} + X_3^2$$

There are 11 candidate terms and therefore $2^{11}$ possible models. Rather than fit all of them, we use backward stepwise selection, and the AIC decreases at every step.
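
Backward selection by AIC can be sketched as below. This is a simplified illustration in Python with NumPy, not the procedure actually run for this example: it assumes the Gaussian AIC up to an additive constant, $n \log(\mathrm{RSS}/n) + 2k$ with $k$ the number of coefficients, it only ever drops terms, and the function names and simulated data are hypothetical.

```python
import numpy as np

def aic_gaussian(X, y):
    """Gaussian AIC up to a constant: n * log(RSS / n) + 2 * (number of coefficients)."""
    n = len(y)
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    rss = float(np.sum((y - X @ beta) ** 2))
    return n * np.log(rss / n) + 2 * X.shape[1]

def backward_aic(X, y, names):
    """Drop one predictor at a time while doing so lowers the AIC.

    X holds the predictors without the intercept; the intercept is always kept.
    """
    n = len(y)
    active = list(range(X.shape[1]))

    def design(cols):
        return np.column_stack([np.ones(n)] + [X[:, j] for j in cols])

    current = aic_gaussian(design(active), y)
    while active:
        # AIC of every model obtained by removing exactly one active predictor
        trials = [(aic_gaussian(design([j for j in active if j != d]), y), d)
                  for d in active]
        best_aic, best_drop = min(trials)
        if best_aic >= current:       # no single removal improves the AIC
            break
        active.remove(best_drop)
        current = best_aic
    return [names[j] for j in active], current

# Simulated data (hypothetical): only x1 and x3 affect the response.
rng = np.random.default_rng(2)
n = 200
X = rng.normal(size=(n, 5))
y = 1.0 + 3.0 * X[:, 0] - 2.0 * X[:, 2] + rng.normal(size=n)
names = ["x1", "x2", "x3", "x4", "x5"]

selected, final_aic = backward_aic(X, y, names)
full_aic = aic_gaussian(np.column_stack([np.ones(n), X]), y)
print(selected, final_aic, full_aic)
```

Each pass fits at most as many models as there are remaining terms, so the search costs on the order of $11^2$ fits instead of $2^{11}$, at the price of possibly missing the best subset.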

The final model is $Y = X_1 + X_7 + X_8 + X_9 + X_{10} + X_3^2$, with an R-squared of $0.9986$. It fits the data very well, so we take it as the appropriate model.

The values of $X_8$, $X_9$, and $X_{10}$ are unknown for the new observation, so we assume they are zero. We then predict $Y$ for the new data: the $90 \%$ prediction interval is $Y \in [5338.02,\ 5373.58]$.
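
For reference, an interval of this kind has the following standard form, assuming the final model was fit by ordinary least squares with $p$ coefficients on $n$ observations (the text does not state this explicitly). Here $x_0$ denotes the new predictor vector, including the leading 1 and the zeros substituted for the unknown values, and $s^2$ is the residual mean square:

$$\hat{Y}_0 \pm t_{0.95,\, n-p} \; s \sqrt{1 + x_0' (X' X)^{-1} x_0}, \quad \hat{Y}_0 = x_0' \hat{\beta}, \quad s^2 = \frac{(Y - X \hat{\beta})' (Y - X \hat{\beta})}{n - p}$$

The quantile is $t_{0.95}$ because a two-sided $90 \%$ interval leaves $5 \%$ in each tail. The extra $1$ under the square root accounts for the error of the new observation itself and is what distinguishes a prediction interval from a confidence interval for the mean response.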