This post summarizes what I learned and studied in the "Probability and Statistics (MATH230)" course. The full list of posts is available at Probability and Statistics 🎲


Sampling Distribution of $S^2$

Let $X_1, \dots, X_n$ be a random sample with $\text{Var}(X_i) = \sigma^2$. We already know that $E[S^2] = \sigma^2$. How about the distribution of $\displaystyle S^2 = \dfrac{1}{n-1} \sum^n_{i=1} (X_i - \bar{X})^2$?

Let's look at the simplest case, $n=2$. Here $\bar{X} = \dfrac{X_1 + X_2}{2}$. If we define $Y_i := X_i - \bar{X}$, then

\[\begin{aligned} Y_1 = X_1 - \bar{X} = \frac{X_1 - X_2}{2} \\ Y_2 = X_2 - \bar{X} = \frac{X_2 - X_1}{2} \\ \end{aligned}\]

That is, $Y_1 = - Y_2$, so the two are dependent! Since $S^2$ is built from these dependent deviations, the CLT for sums of independent terms cannot be applied to it directly 😥 However, using the theorem below, we can derive the distribution of $S^2$!!
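The $n=2$ case can be checked numerically. The following is a minimal sketch using only the Python standard library; the seed, the parameter values, and the helper name `deviations_n2` are illustrative assumptions, not part of the original notes. It shows that the two deviations cancel and that $S^2$ collapses to $(X_1 - X_2)^2 / 2$.

```python
import random

rng = random.Random(0)  # fixed seed, chosen arbitrarily for reproducibility


def deviations_n2(mu=0.0, sigma=1.0):
    """Draw X_1, X_2 from N(mu, sigma^2) and return them with Y_1, Y_2 and S^2."""
    x1, x2 = rng.gauss(mu, sigma), rng.gauss(mu, sigma)
    xbar = (x1 + x2) / 2
    y1, y2 = x1 - xbar, x2 - xbar
    s2 = (y1 ** 2 + y2 ** 2) / (2 - 1)  # S^2 with the 1/(n-1) factor, n = 2
    return x1, x2, y1, y2, s2


x1, x2, y1, y2, s2 = deviations_n2()
print("Y_1 + Y_2 =", y1 + y2)                      # zero up to rounding
print("S^2 =", s2, "vs (X_1 - X_2)^2 / 2 =", (x1 - x2) ** 2 / 2)
```

Because $Y_2$ is fully determined by $Y_1$, the sum of squares really has only one free component, which is the intuition behind losing one degree of freedom.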


Note.

Let $X_1, \dots, X_n$ be a random sample from $N(\mu, \sigma^2)$.

1. $\bar{X} \sim N\left( \mu, \sigma^2/n\right)$

2. $\displaystyle\sum^n_{i=1} \left( \frac{X_i - \mu}{\sigma} \right)^2 \sim \chi^2(n)$, because standardizing each $X_i$ gives an $N(0, 1)$ variable and the $X_i$ are independent!

Theorem. Sampling Distribution of $S^2$

Let $X_1, \dots, X_n$ be a random sample from $N(\mu, \sigma^2)$, then

\[\frac{(n-1)S^2}{\sigma^2} = \sum^n_{i=1} \left( \frac{X_i - \bar{X}}{\sigma}\right)^2 \sim \chi^2 (n-1)\]

"We lose one degree of freedom, because we estimate a parameter $\mu$ by $\bar{X}$."

Wow! The ratio of the sample variance $S^2$ to the population variance $\sigma^2$, scaled by $n-1$, follows a chi-square distribution!
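The theorem can be sanity-checked by simulation. The sketch below is only an illustration under assumed values ($n = 5$, $\mu = 10$, $\sigma = 2$, 20000 repetitions, an arbitrary seed); the function names are hypothetical. It compares the empirical mean and variance of $(n-1)S^2/\sigma^2$ with those of $\chi^2(n-1)$, which are $n-1$ and $2(n-1)$.

```python
import random
import statistics


def sample_variance(xs):
    """Unbiased sample variance S^2 with the 1/(n-1) factor."""
    n = len(xs)
    xbar = sum(xs) / n
    return sum((x - xbar) ** 2 for x in xs) / (n - 1)


def scaled_variance_draws(rng, n, mu, sigma, reps):
    """Simulate (n-1) S^2 / sigma^2 for `reps` independent normal samples."""
    draws = []
    for _ in range(reps):
        xs = [rng.gauss(mu, sigma) for _ in range(n)]
        draws.append((n - 1) * sample_variance(xs) / sigma ** 2)
    return draws


rng = random.Random(1)  # arbitrary seed
n = 5
draws = scaled_variance_draws(rng, n, mu=10.0, sigma=2.0, reps=20000)
mean_q = statistics.fmean(draws)
var_q = statistics.variance(draws)
print(f"empirical mean     {mean_q:.3f}  vs  n-1    = {n - 1}")
print(f"empirical variance {var_q:.3f}  vs  2(n-1) = {2 * (n - 1)}")
```

Matching the first two moments does not prove the distributional claim, but a mismatch here would signal an error in the statement.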

Proof.

[Step 1]

\[\frac{1}{\sigma^2} \sum^n_i \left( X_i - \mu \right)^2 \sim \chi^2(n)\]

This one is simple. It is a sum of $n$ squares of the independent variables $(X_i - \mu) / \sigma \sim N(0, 1)$, so it follows $\chi^2(n)$.

[Step 2]

\[\begin{aligned} \frac{1}{\sigma^2} \sum^n_i \left( X_i - \mu \right)^2 &= \frac{1}{\sigma^2} \sum^n_{i=1} (X_i - \bar{X} + \bar{X} - \mu)^2 \\ &= \frac{1}{\sigma^2} \sum^n_i (X_i - \bar{X})^2 + \frac{1}{\sigma^2} \sum^n_i (\bar{X} - \mu)^2 + \frac{1}{\sigma^2} \sum^n_i 2 (X_i - \bar{X})(\bar{X} - \mu) \end{aligned}\]

[Step 3]

Let's look at the last term, $\displaystyle \frac{1}{\sigma^2} \sum^n_i 2 (X_i - \bar{X})(\bar{X} - \mu)$. The constant factor $2$ does not matter, so we drop it.

\[\begin{aligned} \frac{1}{\sigma^2} \sum^n_i (X_i - \bar{X})(\bar{X} - \mu) &= \frac{1}{\sigma^2} \cdot (\bar{X} - \mu) \cdot \sum^n_i (X_i - \bar{X}) \\ &= \frac{1}{\sigma^2} \cdot (\bar{X} - \mu) \cdot \cancelto{0}{(X_1 + \cdots + X_n - n\bar{X})} \\ &= 0 \end{aligned}\]
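The algebra in Steps 2 and 3 holds for any data, not only on average, so it can be verified on a single sample. This is a small sketch with assumed values ($n = 6$, $\mu = 3$, $\sigma = 1.5$, arbitrary seed) and illustrative variable names.

```python
import random

rng = random.Random(2)  # arbitrary seed
n, mu, sigma = 6, 3.0, 1.5
xs = [rng.gauss(mu, sigma) for _ in range(n)]
xbar = sum(xs) / n

# Left-hand side: (1/sigma^2) * sum (X_i - mu)^2
lhs = sum((x - mu) ** 2 for x in xs) / sigma ** 2
# First term: (1/sigma^2) * sum (X_i - Xbar)^2
within = sum((x - xbar) ** 2 for x in xs) / sigma ** 2
# Second term: (1/sigma^2) * sum (Xbar - mu)^2 = n (Xbar - mu)^2 / sigma^2
mean_part = n * (xbar - mu) ** 2 / sigma ** 2
# Cross term: (1/sigma^2) * sum 2 (X_i - Xbar)(Xbar - mu)
cross = sum(2 * (x - xbar) * (xbar - mu) for x in xs) / sigma ** 2

print("cross term:", cross)  # zero up to floating-point rounding
print("lhs:", lhs, " within + mean_part:", within + mean_part)
```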

[Step 4]

Going back to the original expression,

\[\begin{aligned} \frac{1}{\sigma^2} \sum^n_i \left( X_i - \mu \right)^2 &= \frac{1}{\sigma^2} \sum^n_i (X_i - \bar{X})^2 + \frac{1}{\sigma^2} \sum^n_i (\bar{X} - \mu)^2 + \cancelto{0}{\frac{1}{\sigma^2} \sum^n_i 2 (X_i - \bar{X})(\bar{X} - \mu)} \\ &= \frac{1}{\sigma^2} \sum^n_i (X_i - \bar{X})^2 + \frac{1}{\sigma^2} \sum^n_i (\bar{X} - \mu)^2 \\ &= \frac{1}{\sigma^2} \sum^n_i (X_i - \bar{X})^2 + \frac{n(\bar{X} - \mu)^2}{\sigma^2} \\ &= \frac{1}{\sigma^2} \sum^n_i (X_i - \bar{X})^2 + \left( \frac{\bar{X} - \mu}{\sigma/\sqrt{n}}\right)^2 \\ \end{aligned}\]

Here, the left-hand side $\displaystyle \frac{1}{\sigma^2} \sum^n_i \left( X_i - \mu \right)^2$ follows $\chi^2(n)$, and the term $\displaystyle \left( \frac{\bar{X} - \mu}{\sigma/\sqrt{n}}\right)^2$ on the right-hand side follows $\chi^2(1)$, since $\dfrac{\bar{X} - \mu}{\sigma/\sqrt{n}} \sim N(0, 1)$.

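Suppose $Z = X + Y$ with $Z \sim \chi^2(n)$, $Y \sim \chi^2(1)$, and $X \perp Y$. Then $X \sim \chi^2(n-1)$: by independence the MGFs factor as $M_Z(t) = M_X(t) M_Y(t)$, so $M_X(t) = (1-2t)^{-n/2} / (1-2t)^{-1/2} = (1-2t)^{-(n-1)/2}$ for $t < 1/2$, which is the MGF of $\chi^2(n-1)$. However, we have not yet verified that $X \perp Y$. Let's check it with the Lemma below.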

Lemma.

Let $X_1, \dots, X_n$ be a random sample from $N(\mu, \sigma^2)$, then $S^2$ and $\bar{X}$ are independent.

In fact, $\bar{X}$ and $(X_1 - \bar{X}, \; \dots, \; X_n - \bar{X})$ are independent.
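The Lemma is specific to normal samples, and a simulation can make that visible. The sketch below (assumed values: $n = 5$, 20000 repetitions, arbitrary seed; the helper names are hypothetical) estimates the correlation between $\bar{X}$ and $S^2$ for normal samples and, as a contrast, for exponential samples. Zero correlation is only a necessary condition for independence, so this is a plausibility check rather than a proof.

```python
import random
import statistics


def pearson(us, vs):
    """Plain Pearson correlation coefficient of two equal-length lists."""
    mu_u, mu_v = statistics.fmean(us), statistics.fmean(vs)
    cov = sum((u - mu_u) * (v - mu_v) for u, v in zip(us, vs))
    su = sum((u - mu_u) ** 2 for u in us) ** 0.5
    sv = sum((v - mu_v) ** 2 for v in vs) ** 0.5
    return cov / (su * sv)


def mean_and_variance_pairs(draw, n, reps):
    """Return lists of sample means and sample variances over `reps` samples."""
    means, variances = [], []
    for _ in range(reps):
        xs = [draw() for _ in range(n)]
        xbar = sum(xs) / n
        means.append(xbar)
        variances.append(sum((x - xbar) ** 2 for x in xs) / (n - 1))
    return means, variances


rng = random.Random(3)  # arbitrary seed
corr_normal = pearson(*mean_and_variance_pairs(lambda: rng.gauss(0.0, 1.0), 5, 20000))
corr_expo = pearson(*mean_and_variance_pairs(lambda: rng.expovariate(1.0), 5, 20000))
print("normal samples:      corr(Xbar, S^2) =", round(corr_normal, 3))
print("exponential samples: corr(Xbar, S^2) =", round(corr_expo, 3))
```

For the normal case the estimate should sit near zero, while the skewed exponential case shows a clearly positive correlation, so $\bar{X}$ and $S^2$ are not independent there.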

Therefore, by the Lemma above, the two terms on the right-hand side are independent, and

\[\frac{1}{\sigma^2} \sum^n_i (X_i - \bar{X})^2 = \frac{(n-1) S^2}{\sigma^2} \sim \chi^2(n-1)\]

$\blacksquare$


In this post, we derived the distribution of the ratio of the sample variance $S^2$ to the population variance $\sigma^2$.

\[\frac{(n-1) S^2}{\sigma^2} \sim \chi^2(n-1)\]


In the next post, we look at how to model the distribution of $\bar{X}$ when the population variance $\sigma^2$ is unknown. In that case, we use <Student's t-distribution>.

\[T := \dfrac{\overline{X} - \mu}{S / \sqrt{n}} \sim t(n-1)\]

👉 Student's t-distribution
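As a preview, here is a rough simulation of the statistic $T$, assuming normal samples with $n = 5$ (so $t(4)$), 20000 repetitions, and an arbitrary seed; `t_statistic` is an illustrative helper name. A standard normal puts about 5% of its mass outside $\pm 1.96$, and replacing $\sigma$ with $S$ should make the tails noticeably heavier.

```python
import math
import random


def t_statistic(rng, n, mu, sigma):
    """Compute T = (Xbar - mu) / (S / sqrt(n)) for one normal sample."""
    xs = [rng.gauss(mu, sigma) for _ in range(n)]
    xbar = sum(xs) / n
    s = math.sqrt(sum((x - xbar) ** 2 for x in xs) / (n - 1))
    return (xbar - mu) / (s / math.sqrt(n))


rng = random.Random(4)  # arbitrary seed
n, reps = 5, 20000
ts = [t_statistic(rng, n, mu=0.0, sigma=1.0) for _ in range(reps)]
tail_fraction = sum(abs(t) > 1.96 for t in ts) / reps
print("fraction of |T| > 1.96:", tail_fraction)  # compare with 0.05 for N(0, 1)
```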


If we instead model the distribution of the ratio of sample variances from two independent samples, we get the <F-distribution>!

\[F := \frac{S_1^2 / \sigma_1^2}{S_2^2 / \sigma_2^2} \sim F(n_1 - 1, n_2 -1)\]

👉 F-distribution
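The variance ratio can be simulated in the same spirit. This sketch assumes two independent normal samples with $n_1 = 6$, $n_2 = 11$, $\sigma_1 = 1$, $\sigma_2 = 3$ (all illustrative choices, as are the seed and function names) and compares the empirical mean of $F$ with the known mean of $F(d_1, d_2)$, which is $d_2 / (d_2 - 2)$ for $d_2 > 2$.

```python
import random
import statistics


def sample_variance(xs):
    """Unbiased sample variance S^2 with the 1/(n-1) factor."""
    n = len(xs)
    xbar = sum(xs) / n
    return sum((x - xbar) ** 2 for x in xs) / (n - 1)


def f_ratio(rng, n1, sigma1, n2, sigma2):
    """Compute (S_1^2 / sigma_1^2) / (S_2^2 / sigma_2^2) for two normal samples."""
    xs = [rng.gauss(0.0, sigma1) for _ in range(n1)]
    ys = [rng.gauss(0.0, sigma2) for _ in range(n2)]
    return (sample_variance(xs) / sigma1 ** 2) / (sample_variance(ys) / sigma2 ** 2)


rng = random.Random(5)  # arbitrary seed
n1, n2 = 6, 11
fs = [f_ratio(rng, n1, 1.0, n2, 3.0) for _ in range(20000)]
mean_f = statistics.fmean(fs)
d2 = n2 - 1
print("empirical mean of F:", round(mean_f, 3), " vs d2 / (d2 - 2) =", d2 / (d2 - 2))
```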