Sampling Distribution of Variance
This post summarizes what I learned and studied in the "Probability and Statistics (MATH230)" course. The full list of posts can be found at Probability and Statistics 🎲
Series: Sampling Distributions
Sampling Distribution of $S^2$
Let $X_1, \dots, X_n$ be a random sample with $\text{Var}(X_i) = \sigma^2$. We already know that $E[S^2] = \sigma^2$. How about the distribution of $\displaystyle S^2 = \dfrac{1}{n-1} \sum^n_{i=1} (X_i - \bar{X})^2$?
Consider the simplest case, $n=2$. Then $\bar{X} = \dfrac{X_1 + X_2}{2}$. If we let $Y_i := X_i - \bar{X}$, then
\[\begin{aligned} Y_1 &= X_1 - \bar{X} = \frac{X_1 - X_2}{2} \\ Y_2 &= X_2 - \bar{X} = \frac{X_2 - X_1}{2} \end{aligned}\]
That is, $Y_1 = -Y_2$, so the two are dependent! Because the summands $(X_i - \bar{X})^2$ are not independent, we cannot apply the CLT directly to $S^2$ 💥 However, with the theorem below we can derive the distribution of $S^2$!!
Note.
Let $X_1, \dots, X_n$ be a random sample from $N(\mu, \sigma^2)$.
1. $\bar{X} \sim N\left( \mu, \sigma^2/n\right)$
2. $\displaystyle\sum^n_{i=1} \left( \frac{X_i - \mu}{\sigma} \right)^2 \sim \chi^2(n)$, because standardizing each $X_i$ gives a $N(0, 1)$ variable, and the $X_i$ are independent!
Theorem. Sampling Distribution of $S^2$
Let $X_1, \dots, X_n$ be a random sample from $N(\mu, \sigma^2)$, then
\[\frac{(n-1)S^2}{\sigma^2} = \sum^n_{i=1} \left( \frac{X_i - \bar{X}}{\sigma}\right)^2 \sim \chi^2 (n-1)\]
"We lose one degree of freedom, because we estimate a parameter $\mu$ by $\bar{X}$."
Wow! The ratio of the Sample Variance $S^2$ to the Population Variance $\sigma^2$, scaled by $n-1$, follows a Chi-square Distribution!
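The theorem can be sanity-checked numerically. The sketch below is only an illustration, not part of the proof: the helper name `scaled_sample_variances` and the values of `n`, `mu`, `sigma`, and `trials` are arbitrary choices made for this example. It uses the fact that a $\chi^2(k)$ variable has mean $k$ and variance $2k$.

```python
import random
import statistics


def scaled_sample_variances(n, mu, sigma, trials, seed=0):
    """Simulate (n-1) * S^2 / sigma^2 for `trials` normal samples of size n."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    out = []
    for _ in range(trials):
        xs = [rng.gauss(mu, sigma) for _ in range(n)]
        s2 = statistics.variance(xs)  # sample variance, divides by n - 1
        out.append((n - 1) * s2 / sigma**2)
    return out


# Hypothetical parameters chosen only for illustration.
n = 10
values = scaled_sample_variances(n=n, mu=5.0, sigma=2.0, trials=20_000)

# For chi-square(n - 1) we expect mean n - 1 and variance 2(n - 1),
# so the two printed numbers should land near 9 and 18 (up to noise).
print(statistics.fmean(values))
print(statistics.variance(values))
```

A matching mean and variance does not prove the distribution is $\chi^2(n-1)$, but a clear mismatch would indicate a mistake.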
Proof.
[Step 1]
\[\frac{1}{\sigma^2} \sum^n_{i=1} \left( X_i - \mu \right)^2 \sim \chi^2(n)\]
This one is simple. Each $(X_i - \mu) / \sigma \sim N(0, 1)$, and the sum of the squares of $n$ independent standard normal variables follows $\chi^2(n)$.
[Step 2]
\[\begin{aligned} \frac{1}{\sigma^2} \sum^n_{i=1} \left( X_i - \mu \right)^2 &= \frac{1}{\sigma^2} \sum^n_{i=1} (X_i - \bar{X} + \bar{X} - \mu)^2 \\ &= \frac{1}{\sigma^2} \sum^n_{i=1} (X_i - \bar{X})^2 + \frac{1}{\sigma^2} \sum^n_{i=1} (\bar{X} - \mu)^2 + \frac{1}{\sigma^2} \sum^n_{i=1} 2 (X_i - \bar{X})(\bar{X} - \mu) \end{aligned}\]
[Step 3]
Consider the last term, $\displaystyle \frac{1}{\sigma^2} \sum^n_{i=1} 2 (X_i - \bar{X})(\bar{X} - \mu)$. Since $\bar{X} - \mu$ does not depend on $i$, it can be pulled out of the sum.
\[\begin{aligned} \frac{1}{\sigma^2} \sum^n_{i=1} 2 (X_i - \bar{X})(\bar{X} - \mu) &= \frac{2}{\sigma^2} \cdot (\bar{X} - \mu) \cdot \sum^n_{i=1} (X_i - \bar{X}) \\ &= \frac{2}{\sigma^2} \cdot (\bar{X} - \mu) \cdot \cancelto{0}{(X_1 + \cdots + X_n - n\bar{X})} \\ &= 0 \end{aligned}\]
[Step 4]
Let us return to the original expression.
\[\begin{aligned} \frac{1}{\sigma^2} \sum^n_{i=1} \left( X_i - \mu \right)^2 &= \frac{1}{\sigma^2} \sum^n_{i=1} (X_i - \bar{X})^2 + \frac{1}{\sigma^2} \sum^n_{i=1} (\bar{X} - \mu)^2 + \cancelto{0}{\frac{1}{\sigma^2} \sum^n_{i=1} 2 (X_i - \bar{X})(\bar{X} - \mu)} \\ &= \frac{1}{\sigma^2} \sum^n_{i=1} (X_i - \bar{X})^2 + \frac{1}{\sigma^2} \sum^n_{i=1} (\bar{X} - \mu)^2 \\ &= \frac{1}{\sigma^2} \sum^n_{i=1} (X_i - \bar{X})^2 + \frac{n(\bar{X} - \mu)^2}{\sigma^2} \\ &= \frac{1}{\sigma^2} \sum^n_{i=1} (X_i - \bar{X})^2 + \left( \frac{\bar{X} - \mu}{\sigma/\sqrt{n}}\right)^2 \end{aligned}\]
Here, the left-hand side $\displaystyle \frac{1}{\sigma^2} \sum^n_{i=1} \left( X_i - \mu \right)^2$ follows $\chi^2(n)$, and the term $\displaystyle \left( \frac{\bar{X} - \mu}{\sigma/\sqrt{n}}\right)^2$ on the right-hand side follows $\chi^2(1)$, because $\bar{X} \sim N(\mu, \sigma^2/n)$.
Write the decomposition as $Z = X + Y$, where $Z$ is the left-hand side, $X$ is the first term on the right, and $Y$ is the second term. If $Z \sim \chi^2(n)$, $Y \sim \chi^2(1)$, and $X \perp Y$, then $X \sim \chi^2(n-1)$. However, we have not yet verified that $X \perp Y$. Let us check it using the Lemma below.
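The claim that $X \sim \chi^2(n-1)$ follows from $Z = X + Y$ with $X \perp Y$ can be justified with moment generating functions. The following is a brief sketch, assuming the standard fact that the MGF of $\chi^2(k)$ is $(1 - 2t)^{-k/2}$ for $t < 1/2$, and that an MGF which exists near $0$ determines the distribution.
\[\begin{aligned} M_Z(t) &= M_X(t) \cdot M_Y(t) \quad (\text{since } X \perp Y) \\ M_X(t) &= \frac{M_Z(t)}{M_Y(t)} = \frac{(1 - 2t)^{-n/2}}{(1 - 2t)^{-1/2}} = (1 - 2t)^{-(n-1)/2}, \quad t < \tfrac{1}{2} \end{aligned}\]
The last expression is the MGF of $\chi^2(n-1)$, so $X \sim \chi^2(n-1)$.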
Lemma.
Let $X_1, \dots, X_n$ be a random sample from $N(\mu, \sigma^2)$, then $S^2$ and $\bar{X}$ are independent.
In fact, $\bar{X}$ and $(X_1 - \bar{X}, \; \dots, \; X_n - \bar{X})$ are independent.
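The Lemma is stated here without proof. One common way to see it, sketched below under the assumption that we may use the fact that jointly normal variables with zero covariance are independent, is to compute the covariance between $\bar{X}$ and each deviation $X_i - \bar{X}$.
\[\begin{aligned} \text{Cov}(\bar{X}, \; X_i - \bar{X}) &= \text{Cov}(\bar{X}, X_i) - \text{Var}(\bar{X}) \\ &= \frac{1}{n} \text{Var}(X_i) - \frac{\sigma^2}{n} \\ &= \frac{\sigma^2}{n} - \frac{\sigma^2}{n} = 0 \end{aligned}\]
Since $(\bar{X}, \; X_1 - \bar{X}, \; \dots, \; X_n - \bar{X})$ is a linear transformation of independent normal variables, it is jointly normal, so zero covariance gives independence. $S^2$ is a function of the deviations only, hence $S^2 \perp \bar{X}$.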
Therefore, by the Lemma above, the two terms on the right-hand side are independent, and we conclude
\[\frac{1}{\sigma^2} \sum^n_{i=1} (X_i - \bar{X})^2 = \frac{(n-1) S^2}{\sigma^2} \sim \chi^2(n-1)\]
$\blacksquare$
In this post, we derived the distribution of the ratio of the Sample Variance $S^2$ to the Population Variance $\sigma^2$.
\[\frac{(n-1) S^2}{\sigma^2} \sim \chi^2(n-1)\]
In the next post, we look at how to model the distribution of $\bar{X}$ when the Population Variance $\sigma^2$ is unknown. In that case, we use <Student's t-distribution>.
\[T := \dfrac{\overline{X} - \mu}{S / \sqrt{n}} \sim t(n-1)\]
👉 Student's t-distribution
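As a quick illustration of the statistic $T$ above, the sketch below simulates it for normal samples and compares its empirical variance with $\nu / (\nu - 2)$, the variance of a $t(\nu)$ distribution with $\nu = n - 1 > 2$. The helper name `t_statistics` and all parameter values are hypothetical choices for this example only.

```python
import math
import random
import statistics


def t_statistics(n, mu, sigma, trials, seed=0):
    """Simulate T = (Xbar - mu) / (S / sqrt(n)) for normal samples of size n."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    out = []
    for _ in range(trials):
        xs = [rng.gauss(mu, sigma) for _ in range(n)]
        xbar = statistics.fmean(xs)
        s = statistics.stdev(xs)  # sample standard deviation (n - 1 divisor)
        out.append((xbar - mu) / (s / math.sqrt(n)))
    return out


# Hypothetical parameters chosen only for illustration.
n = 10
t_values = t_statistics(n=n, mu=5.0, sigma=2.0, trials=20_000)

nu = n - 1
# The empirical mean should be near 0 and the empirical variance near
# nu / (nu - 2), which is larger than 1 (heavier tails than N(0, 1)).
print(statistics.fmean(t_values))
print(statistics.variance(t_values), nu / (nu - 2))
```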
If we instead model the distribution of the ratio of Sample Variances from two independent samples, we get the <F-distribution>!
\[F := \frac{S_1^2 / \sigma_1^2}{S_2^2 / \sigma_2^2} \sim F(n_1 - 1, n_2 - 1)\]
👉 F-distribution
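The variance ratio above can be checked in the same spirit. This is a rough sketch, assuming two independent normal samples; the helper name `f_ratios` and the sample sizes and standard deviations are made up for the example. It uses the fact that $F(d_1, d_2)$ has mean $d_2 / (d_2 - 2)$ when $d_2 > 2$.

```python
import random
import statistics


def f_ratios(n1, n2, sigma1, sigma2, trials, seed=0):
    """Simulate (S1^2 / sigma1^2) / (S2^2 / sigma2^2) for two normal samples."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    out = []
    for _ in range(trials):
        xs = [rng.gauss(0.0, sigma1) for _ in range(n1)]
        ys = [rng.gauss(0.0, sigma2) for _ in range(n2)]
        s1_sq = statistics.variance(xs)  # sample variance of first sample
        s2_sq = statistics.variance(ys)  # sample variance of second sample
        out.append((s1_sq / sigma1**2) / (s2_sq / sigma2**2))
    return out


# Hypothetical parameters chosen only for illustration.
n1, n2 = 8, 12
f_values = f_ratios(n1, n2, sigma1=1.5, sigma2=3.0, trials=20_000)

d2 = n2 - 1
# The empirical mean should be close to d2 / (d2 - 2).
print(statistics.fmean(f_values), d2 / (d2 - 2))
```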