Test on Regression
This post summarizes what I learned and studied in the course "Probability and Statistics (MATH230)". The full series of posts can be found at Probability and Statistics 🎲
This post continues from the Introduction to Linear Regression post.
In this post, we will focus on the two questions below.
Q1. What are the distributions of $B_1$ and $B_0$?
Q2. What can be an estimator for $\sigma^2$?
Distribution of Regression Coefficients
Theorem.
Assume $\epsilon_i$s are iid normal random variables; $\epsilon_i \sim N(0, \sigma^2)$.
Then,
\[B_1 \sim N\left(\beta_1, \frac{\sigma^2}{S_{xx}}\right)\] \[B_0 \sim N\left(\beta_0, \frac{\sum x_i^2}{n \; S_{xx}} \cdot \sigma^2\right)\]
proof.
\[B_1 = \frac{S_{xy}}{S_{xx}} = \frac{\sum_{i=1}^n (x_i - \bar{x}) y_i}{S_{xx}}\] is a linear combination of the normal random variables $y_i$, thus $B_1$ is also a normal RV.
Hence, we only need to find the mean and the variance of $B_1$.
1. Mean
$B_1$ is an unbiased estimator, so
\[E[B_1] = \beta_1\]
2. Variance
\[\begin{aligned} \text{Var}(B_1) &= \text{Var}\left(\frac{\sum_{i=1}^n (x_i - \bar{x})y_i}{S_{xx}}\right) \\ &= \frac{1}{S_{xx}^2} \cdot \cancelto{S_{xx}}{\sum_{i=1}^n (x_i - \bar{x})^2} \cdot \cancelto{\sigma^2}{\text{Var}(y_i)} \\ &= \frac{\sigma^2}{S_{xx}} \end{aligned}\]
proof.
\[B_0 = \bar{y} - B_1 \bar{x}\] is also a linear combination of the normal random variables $y_i$.
1. Mean
$B_0$ is also an unbiased estimator, so
\[E[B_0] = \beta_0\]
2. Variance
(Homework 📝)
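As a quick sanity check of the theorem above (this simulation is my own illustration, not part of the course notes), we can simulate the model $y_i = \beta_0 + \beta_1 x_i + \epsilon_i$ many times and compare the empirical mean and variance of $B_1$ and $B_0$ against the claimed distributions:

```python
import numpy as np

# Monte Carlo sketch: repeatedly draw y_i = beta0 + beta1*x_i + eps_i with a
# fixed design x, recompute B1 and B0 each time, and compare the empirical
# moments with the theorem. All parameter values here are arbitrary choices.
rng = np.random.default_rng(0)
beta0, beta1, sigma = 1.0, 2.0, 0.5
x = np.linspace(0.0, 1.0, 20)                 # fixed design points
n = x.size
Sxx = np.sum((x - x.mean()) ** 2)

n_rep = 20000
b1 = np.empty(n_rep)
b0 = np.empty(n_rep)
for r in range(n_rep):
    y = beta0 + beta1 * x + rng.normal(0.0, sigma, size=n)
    b1[r] = np.sum((x - x.mean()) * y) / Sxx  # B1 = S_xy / S_xx
    b0[r] = y.mean() - b1[r] * x.mean()       # B0 = y_bar - B1 * x_bar

print(b1.mean(), b1.var())   # ~ beta1,  sigma^2 / Sxx
print(b0.mean(), b0.var())   # ~ beta0,  sigma^2 * sum(x_i^2) / (n * Sxx)
```

Both empirical variances should land close to the theorem's formulas, which is a useful check before trusting them in the inference section below.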
Estimator of Error Variance
Recall that $\sigma^2 = \text{Var}(\epsilon_i)$, and that $\epsilon_i$ is the difference between the response $y_i$ and the true regression line $\beta_0 + \beta_1 x_i$; that is, $\epsilon_i = y_i - (\beta_0 + \beta_1 x_i)$.
Theorem.
An unbiased estimator of $\sigma^2$ is
\[s^2 := \frac{\sum_{i=1}^n (y_i - \hat{y}_i)^2}{n-2} = \frac{\text{SSE}}{n-2}\]
Theorem.
$s^2$ is independent of $B_1$ and $B_0$, and
\[\frac{(n-2)s^2}{\sigma^2} \sim \chi^2(n-2)\]
proof.
The proofs of the two theorems above are left as homework.
(Homework 📝)
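The unbiasedness of $s^2 = \text{SSE}/(n-2)$ can also be seen numerically. The sketch below is my own illustration (arbitrary parameter values): averaging $s^2$ over many repetitions should recover $\sigma^2$, and $(n-2)s^2/\sigma^2$ should average to $n-2$, the mean of a $\chi^2(n-2)$ random variable.

```python
import numpy as np

# Simulation check: s^2 = SSE / (n - 2) should be unbiased for sigma^2.
rng = np.random.default_rng(1)
beta0, beta1, sigma = 1.0, 2.0, 0.5
x = np.linspace(0.0, 1.0, 15)
n = x.size
Sxx = np.sum((x - x.mean()) ** 2)

n_rep = 20000
s2 = np.empty(n_rep)
for r in range(n_rep):
    y = beta0 + beta1 * x + rng.normal(0.0, sigma, size=n)
    b1 = np.sum((x - x.mean()) * y) / Sxx
    b0 = y.mean() - b1 * x.mean()
    sse = np.sum((y - (b0 + b1 * x)) ** 2)    # SSE = sum of squared residuals
    s2[r] = sse / (n - 2)

print(s2.mean())                              # ~ sigma^2 = 0.25
print(((n - 2) * s2 / sigma ** 2).mean())     # ~ n - 2 (chi-square mean)
```

Note the divisor $n-2$: two degrees of freedom are spent estimating $\beta_0$ and $\beta_1$, which is exactly why dividing by $n$ would bias the estimator downward.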
Inferences for Regression Coefficients
Suppose we have sample points $(x_1, y_1), \dots, (x_n, y_n)$ from $Y_i = \beta_0 + \beta_1 x_i + \epsilon_i$, where the $\epsilon_i$ are iid $N(0, \sigma^2)$. Here, $\beta_0$ and $\beta_1$ are unknown parameters.
In this setting, we will find a <confidence interval> for $\beta_0$ and $\beta_1$, and then use it to perform hypothesis tests!
We use $B_1 = S_{xy} / S_{xx}$ as the point estimator for $\beta_1$, and the distribution of $B_1$ is as follows.
\[B_1 \sim N \left( \beta_1, \; \sigma^2/S_{xx} \right)\]
Standardizing $B_1$ gives the following.
\[\frac{B_1 - \beta_1}{\sigma / \sqrt{S_{xx}}} \sim N(0, 1)\]
However, we do not know the error variance $\sigma^2$. We therefore replace it with the sample error variance $s^2 = \text{SSE}/(n-2)$! The resulting quantity then follows a <t-distribution>.
\[\frac{B_1 - \beta_1}{s / \sqrt{S_{xx}}} \sim t(n-2)\]
From the distribution above, the $100(1-\alpha)\%$ confidence interval for $\beta_1$ is as follows.
\[\left( b_1 - t_{\alpha/2} (n-2) \cdot \frac{s}{\sqrt{S_{xx}}}, \; b_1 + t_{\alpha/2} (n-2) \cdot \frac{s}{\sqrt{S_{xx}}} \right)\]
Hypothesis tests can now be carried out using the distribution of $B_1$ above!
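Here is a hedged worked example of the interval above on synthetic data (the data, seed, and variable names are my own; the critical value $t_{0.025}(28) \approx 2.048$ is taken from a standard t-table):

```python
import numpy as np

# 95% confidence interval for beta1 on simulated data with true beta1 = 2.
rng = np.random.default_rng(2)
x = np.linspace(0.0, 10.0, 30)
y = 1.0 + 2.0 * x + rng.normal(0.0, 1.0, size=x.size)

n = x.size
Sxx = np.sum((x - x.mean()) ** 2)
b1 = np.sum((x - x.mean()) * y) / Sxx       # point estimate of beta1
b0 = y.mean() - b1 * x.mean()
s = np.sqrt(np.sum((y - (b0 + b1 * x)) ** 2) / (n - 2))  # s = sqrt(SSE/(n-2))

t_crit = 2.048                              # t_{0.025}(n - 2) = t_{0.025}(28)
half = t_crit * s / np.sqrt(Sxx)            # half-width of the interval
print((b1 - half, b1 + half))               # 95% CI for beta1
```

To test $H_0 : \beta_1 = c$, one would simply check whether $c$ falls outside this interval (equivalently, whether $|b_1 - c| / (s/\sqrt{S_{xx}})$ exceeds $t_{\alpha/2}(n-2)$).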
Likewise, let us perform the test for $B_0$. The distribution of $B_0$ is as follows.
\[B_0 \sim N\left( \beta_0, \; \frac{\sigma^2 \cdot \sum_{i=1}^n x_i^2}{n S_{xx}}\right)\]
Standardizing $B_0$ and replacing $\sigma^2$ with $s^2$ gives the following distribution.
\[\frac{B_0 - \beta_0}{s \sqrt{\frac{\sum_{i=1}^n x_i^2}{n S_{xx}}}} \sim t(n-2)\]
Likewise, obtain the $100(1-\alpha)\%$ confidence interval for $\beta_0$ and perform the appropriate tests!
Closing Remarks
In the next post, we will look at the estimation involved in Prediction with the linear regression model. There, the distributions of $B_1$ and $B_0$ examined in this post will be used together, and through this process we can further examine the reliability and error of the response obtained from the regression.
The solutions to the HW problems presented in this post are compiled in the post below.
👉 Statistics - PS3