Statistics - PS3
This post collects what I learned and studied in the course "Probability and Statistics (MATH230)". You can find the full list of posts at Probability and Statistics 🎲
This post works through the propositions stated in the "Test on Regression" post.
Theorem.
The variance of $B_0$, $\text{Var}(B_0)$, is
\[\text{Var}(B_0) = \frac{\sum_{i=1}^n x_i^2}{n S_{xx}} \cdot \sigma^2\]

proof.

Since $B_0 = \bar{y} - B_1 \bar{x}$, expanding $\text{Var}(\bar{y} - \bar{x} B_1)$ requires $\text{Var}(\bar{y})$, $\text{Var}(B_1)$, and $\text{Cov}(\bar{y}, B_1)$, so we compute each in turn.
(1) $\text{Var}(\bar{y})$
\[\begin{aligned} \text{Var}(\bar{y}) &= \text{Var} \left( \frac{1}{n} \sum_{i=1}^n y_i\right) \\ &= \frac{1}{n^2} \sum_{i=1}^n \text{Var}(y_i) \\ &= \frac{1}{n^2} \cdot n \sigma^2 = \frac{\sigma^2}{n} \end{aligned}\]

(2) $\text{Var}(B_1)$
We already know that the variance of $B_1$ is
\[\text{Var}(B_1) = \frac{\sigma^2}{S_{xx}}\]

(3) $\text{Cov}(\bar{y}, B_1)$
Using $S_{xy} = \sum_i (x_i - \bar{x})(y_i - \bar{y}) = \sum_i (x_i - \bar{x}) y_i$,

\[\begin{aligned} \text{Cov}(\bar{y}, B_1) &= \text{Cov} \left( \frac{1}{n} \sum_j y_j, \frac{S_{xy}}{S_{xx}}\right) \\ &= \frac{1}{n S_{xx}} \cdot \text{Cov} \left( \sum_j y_j, \sum_i (x_i - \bar{x}) y_i \right) \\ &= \frac{1}{n S_{xx}} \sum_i (x_i - \bar{x}) \cdot \text{Cov} \left( \sum_j y_j, y_i \right) \\ &= \frac{1}{n S_{xx}} \sum_i (x_i - \bar{x}) \cdot \sigma^2 \\ &= \frac{\sigma^2}{n S_{xx}} \cancelto{0}{\sum_i (x_i - \bar{x})} = 0 \end{aligned}\]

where the second-to-last equality holds because the $y_i$ are independent, so $\text{Cov}(\sum_j y_j, y_i) = \text{Var}(y_i) = \sigma^2$. Now, combining the results above,
\[\begin{aligned} \text{Var}(B_0) &= \text{Var}(\bar{y} - \bar{x} B_1) \\ &= \text{Var}(\bar{y}) + (\bar{x})^2 \cdot \text{Var}(B_1) - 2 \bar{x} \cdot \text{Cov}(\bar{y}, B_1) \\ &= \frac{\sigma^2}{n} + (\bar{x})^2 \cdot \frac{\sigma^2}{S_{xx}} - 2 \bar{x} \cdot 0 \\ &= \sigma^2 \left( \frac{1}{n} + \frac{(\bar{x})^2}{S_{xx}}\right) \\ &= \sigma^2 \left( \frac{S_{xx} + n (\bar{x})^2}{n S_{xx}} \right) \\ &= \sigma^2 \left( \frac{\sum(x_i - \bar{x})x_i + n (\bar{x})^2}{n S_{xx}} \right) \\ &= \sigma^2 \left( \frac{\sum x_i^2 - \bar{x} \sum x_i + n (\bar{x})^2}{n S_{xx}} \right) \\ &= \sigma^2 \left( \frac{\sum x_i^2 - \cancel{n (\bar{x})^2} + \cancel{n (\bar{x})^2}}{n S_{xx}} \right) \\ &= \sigma^2 \cdot \frac{\sum x_i^2}{n S_{xx}} \end{aligned}\]

$\blacksquare$
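As a quick sanity check, here is a minimal Monte Carlo sketch in Python (the parameter values and variable names are illustrative assumptions, not from the course) that simulates $y_i = \beta_0 + \beta_1 x_i + \epsilon_i$ with a fixed design, compares the empirical variance of $B_0$ against $\sigma^2 \sum x_i^2 / (n S_{xx})$, and also checks that $\text{Cov}(\bar{y}, B_1) \approx 0$:

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative parameters (assumptions for this sketch, not from the post)
beta0, beta1, sigma, n, reps = 2.0, 0.5, 1.5, 20, 50_000

x = rng.uniform(0.0, 10.0, size=n)        # fixed design points
Sxx = np.sum((x - x.mean()) ** 2)

b0_hat = np.empty(reps)
b1_hat = np.empty(reps)
ybar = np.empty(reps)
for r in range(reps):
    y = beta0 + beta1 * x + rng.normal(0.0, sigma, size=n)
    b1 = np.sum((x - x.mean()) * (y - y.mean())) / Sxx   # slope estimate B_1
    b0 = y.mean() - b1 * x.mean()                        # intercept estimate B_0
    b0_hat[r], b1_hat[r], ybar[r] = b0, b1, y.mean()

print("empirical   Var(B_0):", b0_hat.var())
print("theoretical Var(B_0):", sigma**2 * np.sum(x**2) / (n * Sxx))
print("empirical   Cov(ybar, B_1):", np.cov(ybar, b1_hat)[0, 1])   # should be ~0
```

The two variance numbers should agree to a couple of decimal places, and the covariance estimate should hover near zero, as the proof predicts.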
Theorem.
An unbiased estimator of $\sigma^2$ is
\[s^2 := \frac{\sum_{i=1}^n (y_i - \hat{y}_i)^2}{n-2} = \frac{\text{SSE}}{n-2}\]

proof.
First, note that the regression equation is
\[\hat{y}_i = b_0 + b_1 x_i\]

Then the residual sum of squares is
\[\begin{aligned} \sum_{i=1}^n (y_i - \hat{y}_i)^2 &= \sum_{i=1}^n (y_i - b_0 - b_1 x_i)^2 \\ &= \sum_{i=1}^n (y_i - (\bar{y} - b_1 \bar{x}) - b_1 x_i)^2 \\ &= \sum_{i=1}^n ((y_i - \bar{y}) - b_1(x_i - \bar{x}))^2 \\ &= \sum_{i=1}^n (y_i - \bar{y})^2 - 2b_1 \sum_{i=1}^n (y_i - \bar{y})(x_i - \bar{x}) + b_1^2 \sum_{i=1}^n (x_i - \bar{x})^2 \\ \end{aligned}\]

Since $b_1 = S_{xy} / S_{xx}$, simplifying the expression gives
\[\begin{aligned} &= \sum_{i=1}^n (y_i - \bar{y})^2 - 2b_1 \sum_{i=1}^n (y_i - \bar{y})(x_i - \bar{x}) + b_1^2 \sum_{i=1}^n (x_i - \bar{x})^2 \\ &= \sum_{i=1}^n (y_i - \bar{y})^2 - 2b_1 S_{xy} + b_1^2 S_{xx} \\ &= \sum_{i=1}^n (y_i - \bar{y})^2 - 2b_1 (b_1 S_{xx}) + b_1^2 S_{xx} \\ &= \sum_{i=1}^n (y_i - \bar{y})^2 - b_1^2 S_{xx} \\ &= \sum_{i=1}^n (y_i^2 - 2 y_i \bar{y} + \bar{y}^2) - b_1^2 S_{xx} \\ &= \left(\sum_{i=1}^n y_i^2\right) - 2 n \bar{y}^2 + n \bar{y}^2 - b_1^2 S_{xx} \\ &= \left(\sum_{i=1}^n y_i^2\right) - n \bar{y}^2 - b_1^2 S_{xx} \end{aligned}\]
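The algebra above is a pure identity that holds for any data set, which makes it easy to sanity-check numerically. A minimal sketch (the data here are arbitrary made-up numbers, used only to exercise the identity):

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(size=12)
y = rng.normal(size=12)      # arbitrary data: the identity is pure algebra

Sxx = np.sum((x - x.mean()) ** 2)
b1 = np.sum((x - x.mean()) * (y - y.mean())) / Sxx
b0 = y.mean() - b1 * x.mean()

sse = np.sum((y - (b0 + b1 * x)) ** 2)                    # direct SSE
rhs = np.sum(y**2) - len(y) * y.mean()**2 - b1**2 * Sxx   # simplified form
print(np.isclose(sse, rhs))                               # True
```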
Now, take the expectation of the expression above.

\[E \left[ \left(\sum_{i=1}^n y_i^2\right) - n \bar{y}^2 - b_1^2 S_{xx} \right] = \left(\sum_{i=1}^n E[y_i^2] \right) - n E [\bar{y}^2] - S_{xx} E[b_1^2]\]

Each expectation on the right-hand side follows from $E[X^2] = \text{Var}(X) + (E[X])^2$:
\[\begin{aligned} E[y_i^2] &= \text{Var}(y_i) + (E[y_i])^2 \\ &= \sigma^2 + (\beta_0 + \beta_1 x_i)^2 \end{aligned}\] \[\begin{aligned} E[\bar{y}^2] &= \text{Var}(\bar{y}) + (E[\bar{y}])^2 \\ &= \frac{\sigma^2}{n} + (\beta_0 + \beta_1 \bar{x})^2 \end{aligned}\] \[\begin{aligned} E[b_1^2] &= \text{Var}(b_1) + (E[b_1])^2 \\ &= \frac{\sigma^2}{S_{xx}} + \beta_1^2 \end{aligned}\]

Substituting these back in,
\[\begin{aligned} &= \left(\sum_{i=1}^n E[y_i^2] \right) - n E [\bar{y}^2] - S_{xx} E[b_1^2] \\ &= \left(\sum_{i=1}^n (\sigma^2 + (\beta_0 + \beta_1 x_i)^2) \right) - n \left( \frac{\sigma^2}{n} + (\beta_0 + \beta_1 \bar{x})^2 \right) - S_{xx} \left( \frac{\sigma^2}{S_{xx}} + \beta_1^2 \right) \\ &= n \sigma^2 + \left(\sum_{i=1}^n (\beta_0 + \beta_1 x_i)^2 \right) - (\sigma^2 + n (\beta_0 + \beta_1 \bar{x})^2) - (\sigma^2 + S_{xx} \beta_1^2) \\ &= (n-2) \sigma^2 - n (\beta_0 + \beta_1 \bar{x})^2 - S_{xx} \beta_1^2 + \left(\sum_{i=1}^n (\beta_0 + \beta_1 x_i)^2 \right) \\ &= (n-2) \sigma^2 - n (\beta_0^2 + 2 \beta_0 \beta_1 \bar{x} + \beta_1^2 \bar{x}^2) - S_{xx} \beta_1^2 + \left(\sum_{i=1}^n (\beta_0^2 + 2 \beta_0 \beta_1 x_i + \beta_1^2 x_i^2) \right) \\ &= (n-2) \sigma^2 - n (\cancel{\beta_0^2} + 2 \beta_0 \beta_1 \bar{x} + \beta_1^2 \bar{x}^2) - S_{xx} \beta_1^2 + \left( \cancel{n \beta_0^2} + 2 \beta_0 \beta_1 \sum_{i=1}^n x_i + \beta_1^2 \sum_{i=1}^n x_i^2 \right) \\ &= (n-2) \sigma^2 - n (\cancel{2 \beta_0 \beta_1 \bar{x}} + \beta_1^2 \bar{x}^2) - S_{xx} \beta_1^2 + \left(\cancel{2 \beta_0 \beta_1 \sum_{i=1}^n x_i} + \beta_1^2 \sum_{i=1}^n x_i^2 \right) \\ &= (n-2) \sigma^2 - n \beta_1^2 \bar{x}^2 - S_{xx} \beta_1^2 + \beta_1^2 \sum_{i=1}^n x_i^2 \\ &= (n-2) \sigma^2 - S_{xx} \beta_1^2 + \left( \beta_1^2 \sum_{i=1}^n x_i^2 - n \beta_1^2 \bar{x}^2 \right)\\ &= (n-2) \sigma^2 - \cancel{S_{xx} \beta_1^2} + \cancel{\beta_1^2 S_{xx}} \\ &= (n-2) \sigma^2 \end{aligned}\]

That is, $E[\text{SSE}] = (n-2)\sigma^2$. Now let's take the expectation of the estimator $s^2$!
\[\begin{aligned} E[s^2] &= E\left[\frac{\sum_{i=1}^n (y_i - \hat{y}_i)^2}{n-2}\right] \\ &= \frac{1}{n-2} E \left[ \sum_{i=1}^n (y_i - \hat{y}_i)^2 \right] \\ &= \frac{1}{n-2} \cdot (n-2) \sigma^2 \\ &= \sigma^2 \end{aligned}\]

Therefore, $s^2$ is an unbiased estimator of $\sigma^2$! $\blacksquare$
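To see the unbiasedness empirically, here is a minimal Monte Carlo sketch (parameter values are again illustrative assumptions): averaging $s^2 = \text{SSE}/(n-2)$ over many simulated samples should land near $\sigma^2$.

```python
import numpy as np

rng = np.random.default_rng(2)

# Illustrative parameters (assumptions for this sketch)
beta0, beta1, sigma, n, reps = 1.0, 2.0, 0.8, 15, 100_000

x = rng.uniform(0.0, 5.0, size=n)
Sxx = np.sum((x - x.mean()) ** 2)

s2 = np.empty(reps)
for r in range(reps):
    y = beta0 + beta1 * x + rng.normal(0.0, sigma, size=n)
    b1 = np.sum((x - x.mean()) * (y - y.mean())) / Sxx
    b0 = y.mean() - b1 * x.mean()
    s2[r] = np.sum((y - (b0 + b1 * x)) ** 2) / (n - 2)   # SSE / (n - 2)

print("mean of s^2:", s2.mean())     # should be close to sigma^2
print("sigma^2    :", sigma**2)
```

Dividing by $n$ or $n-1$ instead of $n-2$ makes the average visibly undershoot $\sigma^2$, which is a nice way to feel why two degrees of freedom are spent estimating $\beta_0$ and $\beta_1$.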
Theorem.
$s^2 \perp B_1$ and $s^2 \perp B_0$.
proof.
(To be completed after further study.)
Theorem.
proof.
(To be completed after further study.)