Sampling Distribution
This post summarizes what I learned and studied in the "Probability and Statistics (MATH230)" course. The full list of posts is available at Probability and Statistics 🎲
Series: Sampling Distributions
Introduction
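Suppose we want to find the proportion of students who like the probability and statistics course, among all students taking it. There are too many students to survey everyone, so instead we survey $n$ students chosen from the whole class.
If $X$ is the RV "the number of students, among the $n$ surveyed, who answer that they like the course", then $X$ follows a hypergeometric (HyperGeo) distribution.
Moreover, if the total number of students is sufficiently large, the HyperGeo distribution can be approximated by a binomial (BIN) distribution.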
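As a rough numerical check of this approximation, the sketch below compares the two pmfs for a small and a large population. The population sizes (50 and 5000), the sample size $n = 10$, and the proportion 0.6 are illustrative assumptions, not values from the course.

```python
from math import comb

def hypergeom_pmf(k, N, K, n):
    """P(X = k) when n students are drawn without replacement
    from N students, K of whom like the course."""
    return comb(K, k) * comb(N - K, n - k) / comb(N, n)

def binom_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p)."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

n, p = 10, 0.6  # assumed sample size and true proportion

def max_gap(N):
    """Largest pointwise difference between the two pmfs for population size N."""
    K = round(N * p)  # number of students who like the course
    return max(abs(hypergeom_pmf(k, N, K, n) - binom_pmf(k, n, p))
               for k in range(n + 1))

gap_small = max_gap(50)    # small population: noticeable difference
gap_large = max_gap(5000)  # large population: the pmfs nearly coincide
print(gap_small, gap_large)
```

The gap shrinks as the population grows because drawing without replacement barely changes the remaining proportion when $N$ is much larger than $n$.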
Now, let's represent the preference of each student $i$ by an RV $X_i$:
\[X_i = \begin{cases} 1 & i\text{-th student likes it!} \\ 0 & \text{else} \end{cases}\]
Combining all the RVs $X_1, \dots, X_n$, we can derive a new RV $\overline{X}$:
\[\overline{X} := \frac{X_1 + \cdots + X_n}{n}\]
We call this $\overline{X}$ the <sample mean>!
Let's make the example above a bit more concrete.
Let $n=100$, and suppose 60 students said they like the course. Then, $\overline{x} = \frac{60}{100} = 0.6$
What we want to discuss about the <sample mean> $\overline{x}$ is how to compute a probability such as
\[P(\left| \overline{x} - 0.6 \right| < \epsilon)\]
The reason we want this is that by computing
\[P(\left| \overline{x} - \mu_0 \right| < \epsilon)\]
we can see how far our sample mean is from a proposed value $\mu_0$, and use that to test the hypothesis $\mu = \mu_0$. This is covered in more detail later, in the <Hypothesis Test> part.
To compute $P(\left| \overline{x} - \mu_0 \right| < \epsilon)$, we need to know the distribution of $\overline{x}$, and we call this distribution the <sampling distribution>! The formal definition of a sampling distribution is given at the very end of this article.
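To make this kind of probability concrete, here is a minimal sketch that computes it exactly under the binomial approximation. It assumes $n = 100$, a true proportion of 0.6, and $\epsilon = 0.05$; the value of $\epsilon$ and the helper name `prob_within` are illustrative choices, not part of the course material.

```python
from math import comb

def prob_within(n, p, low, high):
    """P(low < X < high) for X ~ Binomial(n, p), with strict integer bounds."""
    return sum(comb(n, k) * p**k * (1 - p)**(n - k)
               for k in range(low + 1, high))

# With n = 100, the event |x_bar - 0.6| < 0.05 is the same as 55 < X < 65,
# where X is the number of students who like the course.
# Integer bounds avoid floating-point trouble at the boundary.
prob = prob_within(100, 0.6, 55, 65)
print(round(prob, 3))
```

Working with the count $X$ rather than the mean is only a convenience: the sampling distribution of the sample mean is the distribution of $X / n$.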
Definition. population
A <population> is the totality of observations.
Definition. sample
A <sample> is a subset of a population.
Definition. random sample
RVs $X_1, \dots, X_n$ are said to be a <random sample> of size $n$ if they are independent and identically distributed with pmf or pdf $f(x)$.
That is,
\[f_{(X_1, \dots, X_n)} (x_1, \dots, x_n) = f_{X_1} (x_1) \cdots f_{X_n} (x_n)\]
The observed values $x_1, \dots, x_n$ of $X_1, \dots, X_n$ are called <sample points> or <observations>.
Definition. statistic
A <statistic> is a function of a random sample $X_1, \dots, X_n$ that does not depend on unknown parameters.
That is, a function of the form $f(X_1, \dots, X_n)$ is called a <statistic>. A <statistic> serves as a representative value of that set of RVs.
Example.
Suppose $X_1, \dots, X_n$ is a random sample from $N(\mu, 1)$, where $\mu$ is unknown.
Then,
1. $\dfrac{X_1 + \cdots + X_n}{n}$ is a statistic!
2. $\max \{ X_1, \dots, X_n \}$ is a statistic!
3. $\dfrac{X_1 + \cdots + X_n + \mu}{n}$ is not a statistic, because it depends on the unknown parameter $\mu$.
We can perform inference about the population only through <statistics>.
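One way to read the three examples is in terms of what a function needs as input. In the sketch below, the function names and the observed values are made up for illustration: the first two functions can be evaluated from the data alone, while the third cannot be evaluated without being told $\mu$.

```python
def sample_mean(xs):
    """A statistic: computable from the observations alone."""
    return sum(xs) / len(xs)

def sample_max(xs):
    """A statistic: computable from the observations alone."""
    return max(xs)

def shifted_mean(xs, mu):
    """Not a statistic: it requires the unknown parameter mu as an input."""
    return (sum(xs) + mu) / len(xs)

observed = [4.1, 5.3, 4.8, 6.0]  # made-up observations x_1, ..., x_4

print(sample_mean(observed))
print(sample_max(observed))
# shifted_mean(observed) would raise a TypeError: mu is missing,
# which mirrors the fact that mu is unknown in practice.
```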
Location Measures of a Sample
Let $X_1, \dots, X_n$ be a random sample.
Definition. sample mean
$\overline{X} = \dfrac{X_1 + \cdots + X_n}{n}$ is called a <sample mean>.
(1) $\overline{X}$ is also a random variable!
(2) If $E(X_1) = \mu$ and $\text{Var}(X_1) = \sigma^2$, then $E(\overline{X}) = \dfrac{n\mu}{n} = \mu$ and $\text{Var}(\overline{X}) = \dfrac{\sigma^2}{n}$
(3) $\overline{X}$ can be sensitive to outliers.
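Property (2) can be checked empirically with a small simulation. This is only a sketch: the normal population, the values $\mu = 5$, $\sigma = 2$, $n = 25$, and the number of repetitions are all assumptions chosen for illustration.

```python
import random
from statistics import fmean, pvariance

random.seed(0)  # fixed seed so the run is repeatable

mu, sigma, n = 5.0, 2.0, 25  # assumed population parameters and sample size
reps = 20_000                # number of simulated random samples

# Draw many random samples of size n and record each sample mean.
sample_means = [
    fmean([random.gauss(mu, sigma) for _ in range(n)])
    for _ in range(reps)
]

mean_of_means = fmean(sample_means)    # should be close to mu
var_of_means = pvariance(sample_means) # should be close to sigma^2 / n

print(mean_of_means, var_of_means, sigma**2 / n)
```

The simulated sample means cluster around $\mu$, and their spread shrinks like $\sigma^2 / n$ as $n$ grows.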
Definition. sample median
Simply the middle value of the sample.
Definition. sample mode
The most frequent value in the sample.
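The three location measures can be compared on a tiny data set using Python's standard `statistics` module. The numbers below are made up; the second list replaces one observation with an outlier to show how differently the measures react.

```python
from statistics import mean, median, mode

clean = [2, 3, 3, 4, 5]           # made-up observations
with_outlier = [2, 3, 3, 4, 100]  # same data with one extreme value

for name, xs in [("clean", clean), ("with outlier", with_outlier)]:
    # The mean moves a lot when the outlier is present;
    # the median and mode stay where they were.
    print(name, mean(xs), median(xs), mode(xs))
```

This is the sensitivity to outliers noted for the sample mean: a single extreme observation shifts the mean, while the median and mode are unchanged here.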
Variability Measures of a Sample
Definition. sample variance
Let $X_1, \dots, X_n$ be a random sample with $E[X_i] = \mu$ and $\text{Var}(X_i) = \sigma^2$.
\[S^2 := \frac{1}{n-1} \sum^n_{i=1} \left( X_i - \overline{X}\right)^2\]
Q. Why $(n-1)$ in the denominator?
A. Because dividing by $(n-1)$ is what makes the expectation of the sample variance, $E[S^2]$, equal to $\sigma^2$!
Proof.
w.l.o.g. we can assume that $E[X_i] = 0$. (For convenience we simply center each $X_i$; this does not change $S^2$, which depends only on the deviations $X_i - \overline{X}$.)
\[\begin{aligned} S^2 &= \frac{1}{n-1} \sum^n_{i=1} \left( X_i^2 - 2 X_i \overline{X} + (\overline{X})^2 \right) \\ &= \frac{1}{n-1} \left\{ \sum^n_{i=1} X_i^2 - 2 \overline{X} \sum^n_{i=1} X_i + n (\overline{X})^2 \right\} \\ \end{aligned}\]
Here, $\displaystyle\sum^n_{i=1} X_i$ equals $n\overline{X}$ by the definition of $\overline{X}$.
\[\begin{aligned} S^2 &= \frac{1}{n-1} \left\{ \sum^n_{i=1} X_i^2 - 2 \overline{X} \cdot n\overline{X} + n (\overline{X})^2 \right\} \\ &= \frac{1}{n-1} \left\{ \sum^n_{i=1} X_i^2 - n (\overline{X})^2 \right\} \\ \end{aligned}\]
Now take the expectation of both sides of the equation above.
\[\begin{aligned} E[S^2] &= \frac{1}{n-1} \left\{ \sum^n_{i=1} E[X_i^2] - n E\left[(\overline{X})^2\right] \right\} \\ &= \frac{1}{n-1} \left\{ n \cdot \sigma^2 - n \cdot \frac{1}{n^2} \cdot E \left[(X_1 + \cdots + X_n)^2 \right] \right\} \\ &= \frac{1}{n-1} \left\{ n \cdot \sigma^2 - \frac{1}{n} \cdot \left( \sum^n_{i=1} E[X_i^2] + \sum_{i \ne j} \cancelto{0}{E[X_i X_j]} \right) \right\} \quad (\text{independence}) \\ &= \frac{1}{n-1} \left\{ n \cdot \sigma^2 - \frac{1}{\cancel{n}} \cdot \cancel{n} \sigma^2 \right\} \\ &= \frac{1}{n-1} \left\{ n \cdot \sigma^2 - \sigma^2 \right\} \\ &= \sigma^2 \end{aligned}\]
Here $E[X_i^2] = \text{Var}(X_i) = \sigma^2$ because $E[X_i] = 0$, and $E[X_i X_j] = E[X_i] E[X_j] = 0$ for $i \ne j$ by independence.
$\blacksquare$
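The unbiasedness result can also be seen numerically. The sketch below assumes a normal population with $\sigma^2 = 4$ and a small sample size $n = 5$ (both illustrative choices) and averages the two candidate estimators over many simulated samples.

```python
import random

random.seed(1)  # fixed seed so the run is repeatable

sigma2, n, reps = 4.0, 5, 20_000  # assumed variance, sample size, repetitions

def sum_sq_dev(xs):
    """Sum of squared deviations from the sample mean."""
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs)

total_unbiased = 0.0  # accumulates S^2 with divisor n - 1
total_biased = 0.0    # accumulates the same sum with divisor n
for _ in range(reps):
    xs = [random.gauss(0.0, sigma2 ** 0.5) for _ in range(n)]
    ss = sum_sq_dev(xs)
    total_unbiased += ss / (n - 1)
    total_biased += ss / n

avg_unbiased = total_unbiased / reps  # close to sigma2
avg_biased = total_biased / reps      # close to sigma2 * (n - 1) / n

print(avg_unbiased, avg_biased)
```

Dividing by $n$ systematically underestimates $\sigma^2$ by the factor $(n-1)/n$, which is exactly what the proof predicts.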
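Definition. sample standard deviation
$S := \sqrt{S^2}$, the positive square root of the sample variance.
Definition. range
$R := \max \{ X_1, \dots, X_n \} - \min \{ X_1, \dots, X_n \}$, the difference between the largest and smallest observations.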
Definition. sampling distribution
The probability distribution of a sample statistic is called a <sampling distribution>.
ex) distribution of sample mean, distribution of sample variance, …
Here, a sample statistic is a representative value that describes a characteristic of the sample, such as the sample mean or the sample variance.
👉 Sampling Distribution of Mean, and CLT
👉 Sampling Distribution of Variance