Statistics - PS3
βνλ₯ κ³Ό ν΅κ³(MATH230)β μμ μμ λ°°μ΄ κ²κ³Ό 곡λΆν κ²μ μ 리ν ν¬μ€νΈμ λλ€. μ 체 ν¬μ€νΈλ Probability and Statisticsμμ νμΈνμ€ μ μμ΅λλ€ π²
μ΄ κΈμ βTest on Regressionβ ν¬μ€νΈμμ μ μν μμ λ€μ νμ΄ν ν¬μ€νΈμ λλ€.
Theorem.
The variance of $B_0$, $\text{Var}(B_0)$ is
\[\text{Var}(B_0) = \frac{\sum_{i=1}^n x_i^2}{n S_{xx}} \cdot \sigma^2\]proof.
(1) $\text{Var}(\bar{y})$
\[\begin{aligned} \text{Var}(\bar{y}) &= \text{Var} \left( \frac{1}{n} \sum_{i=1}^n y_i\right) \\ &= \frac{1}{n^2} \sum_{i=1}^n \text{Var}(y_i) \\ &= \frac{1}{n^2} \cdot n \sigma^2 = \frac{\sigma^2}{n} \end{aligned}\](2) $\text{Var}(B_1)$
We already know that the variance of $B_1$ is
\[\text{Var}(B_1) = \frac{\sigma^2}{S_{xx}}\](3) $\text{Cov}(\bar{y}, B_1)$
\[\begin{aligned} \text{Cov}(\bar{y}, B_1) &= \text{Cov} \left( \frac{1}{n} \sum y_i, \frac{S_{xy}}{S_{xx}}\right) \\ &= \frac{1}{n} \frac{1}{S_{xx}} \cdot \text{Cov}(\sum y_i, S_{xy}) \\ &= \frac{1}{n S_{xx}} \cancelto{0}{\sum (x_i - \bar{x})} \cdot \text{Cov}(\sum y_i, \sum y_i) \\ &= 0 \end{aligned}\]μ΄μ μμ κ²°κ³Όλ₯Ό μ’ ν©νλ©΄,
\[\begin{aligned} &= \text{Var}(\bar{y}) + (\bar{x})^2 \cdot \text{Var}(B_1) - 2 \bar{x} \cdot \text{Cov}(\bar{y}, B_1) \\ &= \frac{\sigma^2}{n} + (\bar{x})^2 \cdot \frac{\sigma^2}{S_{xx}} - 2 \bar{x} \cdot 0 \\ &= \frac{\sigma^2}{n} + (\bar{x})^2 \cdot \frac{\sigma^2}{S_{xx}} \\ &= \sigma^2 \left( \frac{1}{n} + \frac{(\bar{x})^2}{S_{xx}}\right) \\ &= \sigma^2 \left( \frac{S_{xx} + n (\bar{x})^2}{n S_{xx}} \right) \\ &= \sigma^2 \left( \frac{\sum(x_i - \bar{x})x_i + n (\bar{x})^2}{n S_{xx}} \right) \\ &= \sigma^2 \left( \frac{\sum x_i^2 - \bar{x} \sum x_i + n (\bar{x})^2}{n S_{xx}} \right) \\ &= \sigma^2 \left( \frac{\sum x_i^2 - \cancel{\bar{x} n (\bar{x})} + \cancel{n (\bar{x})^2}}{n S_{xx}} \right) \\ &= \sigma^2 \cdot \frac{\sum x_i^2}{n S_{xx}} \end{aligned}\]$\blacksquare$
Theorem.
The unbiased estimator of $\sigma^2$ is
\[s^2 := \frac{\sum_{i=1}^n (y_i - \hat{y}_i)^2}{n-2} = \frac{\text{SSE}}{n-2}\]proof.
First, note that the regression equation is
\[\hat{y}_i = b_0 + b_1 x_i\]Then, the residual squared sum is
\[\begin{aligned} \sum_{i=1}^n (y_i - \hat{y}_i)^2 &= \sum_{i=1}^n (y_i - b_0 - b_1 x_i)^2 \\ &= \sum_{i=1}^n (y_i - (\bar{y} - b_1 \bar{x}) - b_1 x_i)^2 \\ &= \sum_{i=1}^n ((y_i - \bar{y}) - b_1(x_i - \bar{x}))^2 \\ &= \sum_{i=1}^n (y_i - \bar{y})^2 - 2b_1 \sum_{i=1}^n (y_i - \bar{y})(x_i - \bar{x}) + b_1^2 \sum_{i=1}^n (x_i - \bar{x})^2 \\ \end{aligned}\]μ΄λ, $b_1 = S_{xy} / S_{xx}$μ΄λ―λ‘, μμ μ 리νλ©΄,
\[\begin{aligned} &= \sum_{i=1}^n (y_i - \bar{y})^2 - 2b_1 \sum_{i=1}^n (y_i - \bar{y})(x_i - \bar{x}) + b_1^2 \sum_{i=1}^n (x_i - \bar{x})^2 \\ &= \sum_{i=1}^n (y_i - \bar{y})^2 - 2b_1 S_{xy} + b_1^2 S_{xx} \\ &= \sum_{i=1}^n (y_i - \bar{y})^2 - 2b_1 (b_1 S_{xx}) + b_1^2 S_{xx} \\ &= \sum_{i=1}^n (y_i - \bar{y})^2 - b_1^2 S_{xx} \\ &= \sum_{i=1}^n (y_i^2 - 2 y_i \bar{y} + \bar{y}^2) - b_1^2 S_{xx} \\ &= \left(\sum_{i=1}^n y_i^2\right) - 2 n \bar{y}^2 + n \bar{y}^2 - b_1^2 S_{xx} \\ &= \left(\sum_{i=1}^n y_i^2\right) - n \bar{y}^2 - b_1^2 S_{xx} \\ \end{aligned}\]μ΄μ μμ μμ νκ· μ μ·¨ν΄λ³΄μ.
\[E \left[ \left(\sum_{i=1}^n y_i^2\right) - n \bar{y}^2 - b_1^2 S_{xx} \right] = \left(\sum_{i=1}^n E[y_i^2] \right) - n E [\bar{y}^2] - S_{xx} E[b_1^2]\]μ΄λ, κ° λ³μμ νκ· κ°μ μλμ κ°μ΄ ꡬν μ μλ€.
\[\begin{aligned} E[y_i^2] &= \text{Var}(y_i) + (E[y_i])^2 \\ &= \sigma^2 + (\beta_0 + \beta_1 x_i)^2 \end{aligned}\] \[\begin{aligned} E[\bar{y}^2] &= \text{Var}(\bar{y}) + (E[\bar{y}])^2 \\ &= \frac{\sigma^2}{n} + (\beta_0 + \beta_1 \bar{x})^2 \end{aligned}\] \[\begin{aligned} E[b_1^2] &= \text{Var}(b_1) + (E[b_1])^2 \\ &= \frac{\sigma^2}{S_{xx}} + \beta_1^2 \end{aligned}\]μμ λμ νλ©΄,
\[\begin{aligned} &= \left(\sum_{i=1}^n E[y_i^2] \right) - n E [\bar{y}^2] - S_{xx} E[b_1^2] \\ &= \left(\sum_{i=1}^n (\sigma^2 + (\beta_0 + \beta_1 x_i)^2) \right) - n \left( \frac{\sigma^2}{n} + (\beta_0 + \beta_1 \bar{x})^2 \right) - S_{xx} \left( \frac{\sigma^2}{S_{xx}} + \beta_1^2 \right) \\ &= n \sigma^2 + \left(\sum_{i=1}^n (\beta_0 + \beta_1 x_i)^2 \right) - (\sigma^2 + n (\beta_0 + \beta_1 \bar{x})^2) - (\sigma^2 + S_{xx} \beta_1^2) \\ &= (n-2) \sigma^2 - n (\beta_0 + \beta_1 \bar{x})^2 - S_{xx} \beta_1^2 + \left(\sum_{i=1}^n (\beta_0 + \beta_1 x_i)^2 \right) \\ &= (n-2) \sigma^2 - n (\beta_0^2 + 2 \beta_0 \beta_1 \bar{x} + \beta_1^2 \bar{x}^2) - S_{xx} \beta_1^2 + \left(\sum_{i=1}^n (\beta_0^2 + 2 \beta_0 \beta_1 x_i + \beta_1^2 x_i^2) \right) \\ &= (n-2) \sigma^2 - n (\cancel{\beta_0} + 2 \beta_0 \beta_1 \bar{x} + \beta_1^2 \bar{x}^2) - S_{xx} \beta_1^2 + \left( \cancel{n \beta_0^2} + 2 \beta_0 \beta_1 \sum_{i=1}^n x_i + \beta_1^2 \sum_{i=1}^n x_i^2 \right) \\ &= (n-2) \sigma^2 - n (\cancel{2 \beta_0 \beta_1 \bar{x}} + \beta_1^2 \bar{x}^2) - S_{xx} \beta_1^2 + \left(\cancel{2 \beta_0 \beta_1 \sum_{i=1}^n x_i} + \beta_1^2 \sum_{i=1}^n x_i^2 \right) \\ &= (n-2) \sigma^2 - n \beta_1^2 \bar{x}^2 - S_{xx} \beta_1^2 + \beta_1^2 \sum_{i=1}^n x_i^2 \\ &= (n-2) \sigma^2 - S_{xx} \beta_1^2 + \left( \beta_1^2 \sum_{i=1}^n x_i^2 - n \beta_1^2 \bar{x}^2 \right)\\ &= (n-2) \sigma^2 - \cancel{S_{xx} \beta_1^2} + \left( \cancel{\beta_1^2 S_{xx}} \right)\\ &= (n-2) \sigma^2 \end{aligned}\]μ! μ΄μ estimator $s^2$μ νκ· μ μ·¨ν΄λ³΄μ!
\[\begin{aligned} E[s^2] &= E\left[\frac{\sum_{i=1}^n (y_i - \bar{y})^2}{n-2}\right] \\ &= \frac{1}{n-2} E \left[ \sum_{i=1}^n (y_i - \bar{y})^2 \right] \\ &= \frac{1}{n-2} \cdot (n-2) \sigma^2 \\ &= \sigma^2 \end{aligned}\]λ°λΌμ, $s^2$λ unbiased estimatorλ€! $\blacksquare$
Theorem.
$s^2 \perp B_1$, and $s^2 \perp B_0$
proof.
(μ’λ κ³΅λΆ ν, μμ± μμ )
Theorem.
proof.
(μ’λ κ³΅λΆ ν, μμ± μμ )