This post collects what I learned and studied in the course "Probability and Statistics (MATH230)". You can find the full series at Probability and Statistics 🎲

This post continues from the Introduction to Linear Regression post.

이번 ν¬μŠ€νŠΈμ—μ„œλŠ” μ•„λž˜μ˜ 두 μ§ˆλ¬Έμ— λŒ€ν•΄ μ£Όμš”ν•˜κ²Œ μ‚΄νŽ΄λ³Ό μ˜ˆμ •μ΄λ‹€.

Q1. What are the distributions of $B_1$ and $B_0$?

Q2. What can be an estimator for $\sigma^2$?

Distribution of Regression Coefficients

Theorem.

Assume $\epsilon_i$s are iid normal random variables; $\epsilon_i \sim N(0, \sigma^2)$.

Then,

\[B_1 \sim N(\beta_1, \frac{\sigma^2}{S_{xx}})\] \[B_0 \sim N(\beta_0, \frac{\sum x_i^2}{n \; S_{xx}} \cdot \sigma^2)\]

proof.

\[\begin{aligned} B_1 &:= \frac{\sum_{i=1}^n (x_i - \bar{x})(y_i - \bar{y})}{S_{xx}} \\ &= \frac{\sum_{i=1}^n (x_i - \bar{x})y_i}{S_{xx}} \end{aligned}\]

where the second equality uses $\sum_{i=1}^n (x_i - \bar{x})\,\bar{y} = \bar{y} \sum_{i=1}^n (x_i - \bar{x}) = 0$. So $B_1$ is a linear combination of the normal random variables $y_i$ with constant weights $(x_i - \bar{x})/S_{xx}$, and is thus also a normal RV.

Hence, we only need to find the mean and the variance of $B_1$.


1. Mean

$B_1$ is an unbiased estimator, so

\[E[B_1] = \beta_1\]

2. Variance

Since the $y_i$s are independent, the variance passes through the sum:

\[\begin{aligned} \text{Var}(B_1) &= \text{Var}\left(\frac{\sum_{i=1}^n (x_i - \bar{x})y_i}{S_{xx}}\right) \\ &= \frac{1}{S_{xx}^2} \cdot \cancelto{S_{xx}}{\sum_{i=1}^n (x_i - \bar{x})^2} \cdot \cancelto{\sigma^2}{\text{Var}(y_i)} \\ &= \frac{\sigma^2}{S_{xx}} \end{aligned}\]

proof.

\[B_0 = \bar{y} - B_1 \bar{x}\]

is also a linear combination of normal random variables $y_i$s.

1. Mean

$B_0$ is also an unbiased estimator, so

\[E[B_0] = \beta_0\]

2. Variance

(Homework 🎈)
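Before moving on, these sampling distributions are easy to sanity-check numerically. Below is a minimal Monte Carlo sketch in Python (assuming numpy; the design points $x_i$ and the true parameters are made-up values for illustration) that refits the coefficients many times and compares their empirical means and variances with the theorem.

```python
import numpy as np

rng = np.random.default_rng(230)

# Hypothetical setup: fixed design points and true parameters (made up for illustration)
x = np.linspace(0.0, 10.0, 20)
beta0, beta1, sigma = 1.0, 2.0, 1.5
n, Sxx = x.size, np.sum((x - x.mean()) ** 2)

b0s, b1s = [], []
for _ in range(100_000):
    y = beta0 + beta1 * x + rng.normal(0.0, sigma, size=n)  # y_i = beta0 + beta1 x_i + eps_i
    b1 = np.sum((x - x.mean()) * (y - y.mean())) / Sxx      # B1 = Sxy / Sxx
    b0 = y.mean() - b1 * x.mean()                           # B0 = ybar - B1 * xbar
    b1s.append(b1)
    b0s.append(b0)

print(np.mean(b1s), np.var(b1s), sigma**2 / Sxx)                       # ~ beta1, sigma^2/Sxx
print(np.mean(b0s), np.var(b0s), sigma**2 * np.sum(x**2) / (n * Sxx))  # ~ beta0, theory
```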


Estimator of Error Variance

Recall that $\sigma^2 = \text{Var}(\epsilon_i)$, and that $\epsilon_i$ is the difference between the response $y_i$ and the true regression line $\beta_0 + \beta_1 x_i$; that is, $\epsilon_i = y_i - (\beta_0 + \beta_1 x_i)$.

Theorem.

An unbiased estimator of $\sigma^2$ is

\[s^2 := \frac{\sum_{i=1}^n (y_i - \hat{y}_i)^2}{n-2} = \frac{\text{SSE}}{n-2}\]

Theorem.

$s^2$ is independent of $B_1$ and $B_0$, and

\[\frac{(n-2)s^2}{\sigma^2} \sim \chi^2(n-2)\]

proof.

μœ„μ˜ 두 정리에 λŒ€ν•œ 증λͺ…은 HW둜 남겨둔닀.

(Homework 🎈)
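While the proof is homework, the chi-square claim can still be checked by simulation, just like the regression coefficients. The sketch below is hedged: the helper `error_variance_estimate` is a name I made up, and the setup reuses the made-up values from the earlier snippet. It computes $s^2$ from fitted residuals and compares the empirical distribution of $(n-2)s^2/\sigma^2$ with $\chi^2(n-2)$, whose mean is $n-2$ and variance is $2(n-2)$.

```python
import numpy as np

rng = np.random.default_rng(230)

def error_variance_estimate(x, y):
    """Unbiased estimate s^2 = SSE / (n - 2) from a fitted simple linear regression."""
    Sxx = np.sum((x - x.mean()) ** 2)
    b1 = np.sum((x - x.mean()) * (y - y.mean())) / Sxx
    b0 = y.mean() - b1 * x.mean()
    sse = np.sum((y - (b0 + b1 * x)) ** 2)   # SSE = sum of squared residuals
    return sse / (len(x) - 2)                # divide by n - 2, not n

x = np.linspace(0.0, 10.0, 20)               # made-up design points, as before
beta0, beta1, sigma = 1.0, 2.0, 1.5

stats_ = []
for _ in range(100_000):
    y = beta0 + beta1 * x + rng.normal(0.0, sigma, size=x.size)
    stats_.append((x.size - 2) * error_variance_estimate(x, y) / sigma**2)

# chi^2(n-2) has mean n-2 = 18 and variance 2(n-2) = 36
print(np.mean(stats_), np.var(stats_))
```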


Inferences for Regression Coefficients

Suppose we have sample points $(x_1, y_1), \dots, (x_n, y_n)$ from $Y_i = \beta_0 + \beta_1 x_i + \epsilon_i$, where the $\epsilon_i$s are iid $N(0, \sigma^2)$. Here, $\beta_0$ and $\beta_1$ are unknown parameters.

In this setting, we will find a <confidence interval> for each of $\beta_0$ and $\beta_1$, and then use it to perform hypothesis tests!

We used $B_1 = S_{xy} / S_{xx}$ as the point estimator for $\beta_1$, and the distribution of $B_1$ was as follows.

\[B_1 \sim N \left( \beta_1, \; \sigma^2/S_{xx} \right)\]

μ΄λ•Œ, $B_1$을 μ λ‹Ήνžˆ μ •κ·œν™”μ‹œν‚€λ©΄ μ•„λž˜μ™€ κ°™λ‹€.

\[\frac{B_1 - \beta_1}{\sigma / \sqrt{S_{xx}}} \sim N(0, 1)\]

μ΄λ•Œ, μš°λ¦¬λŠ” error variance $\sigma^2$에 λŒ€ν•œ 값을 λͺ¨λ₯Έλ‹€. λ”°λΌμ„œ 이λ₯Ό sample error variance인 $s^2 = \text{SSE}/(n-2)$둜 λŒ€μ²΄ν•΄μ€€λ‹€! κ·Έ κ²°κ³Ό λΆ„ν¬λŠ” <t-distribution>을 λ”°λ₯Έλ‹€.

\[\frac{B_1 - \beta_1}{s / \sqrt{S_{xx}}} \sim t(n-2)\]

이제 μœ„μ˜ λΆ„ν¬μ—μ„œ $\beta_1$에 λŒ€ν•œ $100(1-\alpha)\%$ confidence interval을 κ΅¬ν•˜λ©΄ μ•„λž˜μ™€ κ°™λ‹€.

\[\left( b_1 - t_{\alpha/2} (n-2) \cdot \frac{s}{\sqrt{S_{xx}}}, \; b_1 + t_{\alpha/2} (n-2) \cdot \frac{s}{\sqrt{S_{xx}}} \right)\]
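Here is a minimal sketch of this interval in Python (assuming numpy and scipy; the function name `beta1_confidence_interval` is mine, not a library API):

```python
import numpy as np
from scipy import stats

def beta1_confidence_interval(x, y, alpha=0.05):
    """100(1 - alpha)% confidence interval for the slope beta_1."""
    n = len(x)
    Sxx = np.sum((x - x.mean()) ** 2)
    b1 = np.sum((x - x.mean()) * (y - y.mean())) / Sxx        # point estimate b1
    b0 = y.mean() - b1 * x.mean()
    s = np.sqrt(np.sum((y - (b0 + b1 * x)) ** 2) / (n - 2))   # s = sqrt(SSE/(n-2))
    t_crit = stats.t.ppf(1 - alpha / 2, df=n - 2)             # t_{alpha/2}(n-2)
    half = t_crit * s / np.sqrt(Sxx)
    return b1 - half, b1 + half
```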

λ‹€μŒμ€ $B_1$에 λŒ€ν•œ 뢄포식을 ν™œμš©ν•΄ 검정을 μ§„ν–‰ν•˜λ©΄ λœλ‹€!! πŸ˜†

Let us perform a test for $B_0$ in the same way. The distribution of $B_0$ was as follows.

\[B_0 \sim N\left( \beta_0, \; \frac{\sigma^2 \cdot \sum_{i=1}^n x_i^2}{n S_{xx}}\right)\]

Standardizing $B_0$ and again substituting $s^2$ for $\sigma^2$, the distribution is as follows.

\[\frac{B_0 - \beta_0}{s \sqrt{\frac{\sum_{i=1}^n x_i^2}{n S_{xx}}}} \sim t(n-2)\]

In the same way, we can obtain the $100(1-\alpha)\%$ confidence interval for $\beta_0$ and carry out the corresponding test! πŸ˜†
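For completeness, here is a hedged sketch of the two-sided test for $\beta_0$ (the helper `beta0_t_test` is a made-up name; it assumes numpy and scipy as above):

```python
import numpy as np
from scipy import stats

def beta0_t_test(x, y, beta0_null=0.0):
    """Two-sided t-test of H0: beta_0 = beta0_null; returns (t statistic, p-value)."""
    n = len(x)
    Sxx = np.sum((x - x.mean()) ** 2)
    b1 = np.sum((x - x.mean()) * (y - y.mean())) / Sxx
    b0 = y.mean() - b1 * x.mean()
    s = np.sqrt(np.sum((y - (b0 + b1 * x)) ** 2) / (n - 2))
    se_b0 = s * np.sqrt(np.sum(x ** 2) / (n * Sxx))      # SE of B0 from its distribution
    t_stat = (b0 - beta0_null) / se_b0
    p_value = 2 * stats.t.sf(abs(t_stat), df=n - 2)      # two-sided p-value
    return t_stat, p_value
```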


Conclusion

μ΄μ–΄μ§€λŠ” ν¬μŠ€νŠΈμ—μ„  Linear Regression λͺ¨λΈμ—μ„œ μˆ˜ν–‰ν•˜λŠ” Predictionμ—μ„œ μˆ˜ν–‰ν•˜λŠ” 좔정에 λŒ€ν•΄ μ‚΄νŽ΄λ³Ό μ˜ˆμ •μ΄λ‹€. 이번 ν¬μŠ€νŠΈμ—μ„œ μ‚΄νŽ΄λ΄€λ˜ $B_1$, $B_0$의 뢄포λ₯Ό μ’…ν•©μ μœΌλ‘œ μ‚¬μš©ν•  μ˜ˆμ •μ΄λ©°, 이 과정을 톡해 Regression으둜 얻은 κ²°κ³Ό(response)의 신뒰도와 κ·Έ μ˜€μ°¨μ— λŒ€ν•΄ 더 μ‚΄νŽ΄λ³Ό 수 μžˆλ‹€.

πŸ‘‰ Prediction on Regression


이번 ν¬μŠ€νŠΈμ— μ œμ‹œ ν–ˆλ˜ HW 문제의 ν’€μ΄λŠ” μ•„λž˜μ˜ ν¬μŠ€νŠΈμ— μ •λ¦¬ν•΄λ‘μ—ˆλ‹€.

πŸ‘‰ Statistics - PS3