Sampling Distribution
This post summarizes what I learned and studied in the "Probability and Statistics (MATH230)" course. The full list of posts can be found under Probability and Statistics.
Series: Sampling Distributions
Introduction
Suppose we want to find the proportion of students who like the Probability and Statistics class, among all students taking it. There are too many students to survey everyone, so instead we survey $n$ students chosen from the whole class.
If the RV $X$ is "the number of students, among the $n$ surveyed, who answer that they like the class", then $X$ follows a hypergeometric (HyperGeo) distribution.
Moreover, if the total number of students is large enough, HyperGeo can be approximated by the binomial distribution (BIN).
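To get a feel for how close the two distributions are, here is a minimal sketch that compares the HyperGeo and BIN pmfs. The class size `N`, the number of students who like the class `K`, and the survey size `n` are made-up numbers chosen only for illustration.

```python
from math import comb

def hypergeom_pmf(k, N, K, n):
    """P(X = k): n students drawn without replacement from N, of whom K like the class."""
    return comb(K, k) * comb(N - K, n - k) / comb(N, n)

def binom_pmf(k, n, p):
    """P(X = k): n independent draws, each a success with probability p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

# Hypothetical numbers (not from the course): 10,000 students, 6,000 like the class.
N, K, n = 10_000, 6_000, 100
p = K / N

# Largest pointwise difference between the two pmfs over k = 0, ..., n.
max_gap = max(abs(hypergeom_pmf(k, N, K, n) - binom_pmf(k, n, p)) for k in range(n + 1))
print(f"largest pmf gap: {max_gap:.2e}")
```

With $n$ small relative to $N$, sampling without replacement barely changes the success probability from draw to draw, which is why the gap should come out small.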
Now let the RV $X_i$ represent the preference of student $i$:
\[X_i = \begin{cases} 1 & i\text{-th student likes it!} \\ 0 & \text{else} \end{cases}\]
Combining the RVs $X_1, \dots, X_n$, we can derive a new RV $\overline{X}$.
\[\overline{X} := \frac{X_1 + \cdots + X_n}{n}\]
We call this $\overline{X}$ the <sample mean>!
Let's make the example above more concrete.
Let $n=100$, and suppose 60 students said they like the lecture. Then the observed value is $\overline{x} = \frac{60}{100} = 0.6$.
The question we want to discuss about the <sample mean> is how to compute a probability such as
\[P(\left| \overline{X} - 0.6 \right| < \epsilon)\]
The reason is that by computing
\[P(\left| \overline{X} - \mu_0 \right| < \epsilon)\]
we can check how far the sample mean we obtained is from a proposed value $\mu_0$, and use this to test the hypothesis $\mu = \mu_0$. This is covered in more detail later, in the <Hypothesis Test> part.
To compute $P(\left| \overline{X} - \mu_0 \right| < \epsilon)$ we need to know the distribution of $\overline{X}$, and we call this the <sampling distribution>! The definition of a sampling distribution is given at the very end of this article.
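As a rough illustration of what "knowing the distribution of the sample mean" buys us, the sketch below computes a probability of this form in two ways: exactly from the binomial pmf of the count, and by simulation. It assumes the $X_i$ are independent Bernoulli RVs with true proportion `p = 0.6`, and picks `eps = 0.05`; both values are assumptions for the example, not something given in the course notes.

```python
import random
from fractions import Fraction
from math import comb

n = 100
p = 0.6                  # assumed true proportion (hypothetical)
mu0 = Fraction(6, 10)    # proposed value
eps = Fraction(5, 100)   # assumed tolerance

def close_enough(k):
    """True when the sample mean k/n is strictly within eps of mu0 (exact arithmetic)."""
    return abs(Fraction(k, n) - mu0) < eps

# Exact: the count n * Xbar follows Binomial(n, p), so sum its pmf over the qualifying counts.
exact = sum(
    comb(n, k) * p**k * (1 - p) ** (n - k)
    for k in range(n + 1)
    if close_enough(k)
)

# Monte Carlo: repeat the survey many times and count how often the sample mean lands close.
random.seed(0)
trials = 20_000
hits = 0
for _ in range(trials):
    k = sum(random.random() < p for _ in range(n))
    hits += close_enough(k)
estimate = hits / trials

print(f"exact = {exact:.4f}, simulated = {estimate:.4f}")
```

The comparison uses `Fraction` so that boundary cases such as $k = 55$ are not misclassified by floating-point rounding.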
Definition. population
A <population> is the totality of observations.
Definition. sample
A <sample> is a subset of a population.
Definition. random sample
RVs $X_1, \dots, X_n$ are said to be a <random sample> of size $n$ if they are independent and identically distributed with pmf or pdf $f(x)$.
That is,
\[f_{(X_1, \dots, X_n)} (x_1, \dots, x_n) = f_{X_1} (x_1) \cdots f_{X_n} (x_n) = f(x_1) \cdots f(x_n)\]
The observed values $x_1, \dots, x_n$ of $X_1, \dots, X_n$ are called <sample points> or <observations>.
Definition. Statistic
A <statistic> is a function of a random sample $X_1, \dots, X_n$ that does not depend on unknown parameters.
That is, a function of the form $f(X_1, \dots, X_n)$ is called a <statistic>. A <statistic> serves as a representative value for that collection of RVs.
Example.
Suppose $X_1, \dots, X_n$ is a random sample from $N(\mu, 1)$, where $\mu$ is unknown.
Then,
1. $\dfrac{X_1 + \cdots + X_n}{n}$ is a statistic!
2. $\max \{ X_1, \dots, X_n \}$ is a statistic!
3. $\dfrac{X_1 + \cdots + X_n + \mu}{n}$ is not a statistic, because it depends on the unknown parameter $\mu$!
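One way to read the three items above: a statistic is something you can compute from the data alone. The sketch below mirrors the example; the value of `mu` and the sample size are made up, and `mu` is used only to generate fake data.

```python
import random

random.seed(1)
mu = 2.0  # hypothetical true mean; in practice the analyst does not know this
sample = [random.gauss(mu, 1.0) for _ in range(10)]

def sample_mean(xs):
    """Item 1: needs only the observations, so it is a statistic."""
    return sum(xs) / len(xs)

def sample_max(xs):
    """Item 2: needs only the observations, so it is a statistic."""
    return max(xs)

def not_a_statistic(xs, mu):
    """Item 3: cannot be evaluated without the unknown parameter mu."""
    return (sum(xs) + mu) / len(xs)

print(sample_mean(sample), sample_max(sample))
```

The extra `mu` argument in the third function is exactly what disqualifies it: with real data there would be no value to pass in.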
We can carry out inference about the population only through <statistics>.
Location Measures of a Sample
Let $X_1, \dots, X_n$ be a random sample.
Definition. sample mean
$\overline{X} = \dfrac{X_1 + \cdots + X_n}{n}$ is called a <sample mean>.
(1) $\overline{X}$ is also a random variable!
(2) If $E(X_1) = \mu$ and $\text{Var}(X_1) = \sigma^2$, then $E(\overline{X}) = \dfrac{n\mu}{n} = \mu$ and $\text{Var}(\overline{X}) = \dfrac{\sigma^2}{n}$
(3) $\overline{X}$ can be sensitive to outliers.
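The variance formula in (2) relies on the independence of the $X_i$. A short derivation, under the same assumptions as above:

```latex
\begin{aligned}
\text{Var}(\overline{X})
  &= \text{Var}\left( \frac{1}{n} \sum^n_{i=1} X_i \right) \\
  &= \frac{1}{n^2} \sum^n_{i=1} \text{Var}(X_i) \quad (\text{independence}) \\
  &= \frac{1}{n^2} \cdot n \sigma^2 = \frac{\sigma^2}{n}
\end{aligned}
```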
Definition. sample median
Simply the middle value of the sample (after sorting).
Definition. sample mode
The most frequent value in the sample.
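A small sketch of the three location measures using Python's standard `statistics` module. The data values are made up; the last part adds one extreme observation to show how differently the mean and the median react to it.

```python
import statistics

sample = [3, 1, 4, 1, 5, 9, 2, 6, 5, 1]  # made-up observations

mean = statistics.mean(sample)      # arithmetic average
median = statistics.median(sample)  # middle value of the sorted sample
mode = statistics.mode(sample)      # most frequent value

# Add a single outlier and recompute.
with_outlier = sample + [1000]
mean_out = statistics.mean(with_outlier)
median_out = statistics.median(with_outlier)

print(mean, median, mode)
print(mean_out, median_out)
```

The median moves only from 3.5 to 4, while the mean jumps from 3.7 to above 90, which is the sense in which $\overline{X}$ is sensitive to outliers.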
Variability Measures of a Sample
Definition. sample variance
Let $X_1, \dots, X_n$ be a random sample with $E[X_i] = \mu$ and $\text{Var}(X_i) = \sigma^2$.
\[S^2 := \frac{1}{n-1} \sum^n_{i=1} \left( X_i - \overline{X}\right)^2\]
Q. Why $(n-1)$ in the denominator?
A. Because dividing by $(n-1)$ is what makes the expected value of the sample variance, $E[S^2]$, equal to $\sigma^2$!
Proof.
w.l.o.g. we can assume that $E[X_i] = 0$. (For convenience we simply center the $X_i$: replacing $X_i$ by $X_i - \mu$ changes neither $S^2$ nor $\sigma^2$, and with mean zero we have $E[X_i^2] = \sigma^2$.)
\[\begin{aligned} S^2 &= \frac{1}{n-1} \sum^n_{i=1} \left( X_i^2 - 2 X_i \overline{X} + (\overline{X})^2 \right) \\ &= \frac{1}{n-1} \left\{ \sum^n_{i=1} X_i^2 - 2 \overline{X} \sum^n_{i=1} X_i + n (\overline{X})^2 \right\} \\ \end{aligned}\]
Here, $\displaystyle\sum^n_{i=1} X_i$ equals $n\overline{X}$ by the definition of $\overline{X}$.
\[\begin{aligned} S^2 &= \frac{1}{n-1} \left\{ \sum^n_{i=1} X_i^2 - 2 \overline{X} \cdot n\overline{X} + n (\overline{X})^2 \right\} \\ &= \frac{1}{n-1} \left\{ \sum^n_{i=1} X_i^2 - n (\overline{X})^2 \right\} \\ \end{aligned}\]
Now take the expectation of both sides of the equation above.
\[\begin{aligned} E[S^2] &= \frac{1}{n-1} \left\{ \sum^n_{i=1} E\left[X_i^2\right] - n E\left[(\overline{X})^2\right] \right\} \\ &= \frac{1}{n-1} \left\{ n \cdot \sigma^2 - n \cdot \frac{1}{n^2} \cdot E \left[(X_1 + \cdots + X_n)^2 \right] \right\} \\ &= \frac{1}{n-1} \left\{ n \cdot \sigma^2 - \frac{1}{n} \cdot \left( n \cdot E[X_1^2] + \sum_{i \ne j} \cancelto{0}{E[X_i X_j]} \right) \right\} \\ &= \frac{1}{n-1} \left\{ n \cdot \sigma^2 - \frac{1}{\cancel{n}} \cdot \left( \cancel{n} \cancelto{\sigma^2}{E[X_1^2]} \right) \right\} \quad (\text{independence: } E[X_i X_j] = E[X_i] E[X_j] = 0) \\ &= \frac{1}{n-1} \left\{ n \cdot \sigma^2 - \sigma^2 \right\} \\ &= \sigma^2 \end{aligned}\]
$\blacksquare$
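The result can also be checked by brute force on a tiny case. The sketch below assumes each $X_i$ is a fair coin taking the values 0 and 1 (so $\sigma^2 = 1/4$) with $n = 3$; these choices are only for illustration. It enumerates all equally likely samples and averages the sum of squared deviations divided by $n-1$ and by $n$, using exact fractions.

```python
from fractions import Fraction
from itertools import product

values = [0, 1]  # fair coin: each value has probability 1/2, so sigma^2 = 1/4
n = 3

def sum_sq_dev(xs):
    """Sum of squared deviations from the sample mean, computed exactly."""
    xbar = Fraction(sum(xs), len(xs))
    return sum((x - xbar) ** 2 for x in xs)

# All 2^n samples are equally likely, so an expectation is a plain average over them.
samples = list(product(values, repeat=n))
mean_dividing_by_n_minus_1 = sum(sum_sq_dev(s) / (n - 1) for s in samples) / len(samples)
mean_dividing_by_n = sum(sum_sq_dev(s) / n for s in samples) / len(samples)

print(mean_dividing_by_n_minus_1)  # 1/4, equal to sigma^2
print(mean_dividing_by_n)          # 1/6, which is (n-1)/n * sigma^2
```

Dividing by $n$ systematically underestimates $\sigma^2$ by the factor $(n-1)/n$, and dividing by $n-1$ removes that bias.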
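Definition. sample standard deviation
$S := \sqrt{S^2}$, the square root of the sample variance.
Definition. range
$R := \max \{ X_1, \dots, X_n \} - \min \{ X_1, \dots, X_n \}$, the difference between the largest and smallest values in the sample.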
Definition. sampling distribution
The probability distribution of a sample statistic is called a <sampling distribution>.
ex) distribution of the sample mean, distribution of the sample variance, ...
Here, a sample statistic is a representative value that describes a characteristic of the sample, such as the sample mean or the sample variance.
Sampling Distribution of Mean, and CLT
Sampling Distribution of Variance