This post summarizes what I learned and studied in the "Probability and Statistics (MATH230)" course. You can find the full series of posts at Probability and Statistics 🎲



Introduction to Estimation

โ€<Statistics> is the area of science which can make inferences from data set.โ€

โ€<Statistical Inference; ํ†ต๊ณ„์  ์ถ”๋ก > means making generalization about the population properties based on a random sample.โ€

Suppose someone gives you a data set $\{ x_1, \dots, x_n \}$, and it is known that this data set was taken from a normal random sample $X_i \sim N(\mu, 1)$.

Q. You are asked to estimate $\mu$. What can be a good estimate of $\mu$ from the sample?

A. $\bar{x}$, the sample mean.
Why? By the LLN (Law of Large Numbers), $\bar{x} \rightarrow \mu$ as $n \rightarrow \infty$.
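
This is easy to see numerically. Below is a minimal simulation sketch (assuming NumPy; the value `mu = 2.0` is an arbitrary choice for illustration) that draws samples of increasing size from $N(\mu, 1)$ and watches the sample mean settle near $\mu$.

```python
import numpy as np

rng = np.random.default_rng(0)
mu = 2.0  # "true" population mean -- unknown in practice, fixed here for the demo

for n in [10, 100, 10_000]:
    x = rng.normal(loc=mu, scale=1.0, size=n)  # data set from X_i ~ N(mu, 1)
    print(f"n = {n:6d}, sample mean = {x.mean():.4f}")
# As n grows, the printed sample means cluster ever closer to mu = 2.0.
```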


์œ„์˜ sample mean $\bar{x}$ ๊ฐ™์ด ๋ชจ์ง‘๋‹จ(population)์˜ ์„ฑ์งˆ์„ ์ถ”๋ก ํ•˜๋Š” ๊ฒƒ์„ <์ถ”์ •(Estimation)>์ด๋ผ๊ณ  ํ•œ๋‹ค.

์ถ”์ •์—๋Š” <Point Estimation>๊ณผ <Interval Estimation>, 2๊ฐ€์ง€ ๋ฐฉ์‹์ด ์กด์žฌํ•œ๋‹ค.

Using the sample mean $\bar{x}$ to estimate the population mean $\mu$ is an example of <Point Estimation>.

Presenting an interval $(a, b)$ with the claim "the population mean $\mu$ lies in the interval $(a, b)$ with high probability" is called <Interval Estimation>.

ex) $P\left( \mu \in (a, b) \right) \approx 0.99 \quad \text{or} \quad 0.95$.

💥 Note: For the true distribution $N(\mu, \sigma^2)$, $\mu$ and $\sigma$ are unknown, and not random!!


Point Estimation

Let $X_1, \dots, X_n$ be a random sample with $X_i \sim f(x; \theta)$ for some pdf (or pmf), and let $x_1, \dots, x_n$ be the sample points.

A <point estimate> of a population parameter $\theta$ is a single value $\hat{\theta}$ of a statistic1 $\hat{\Theta}$.

The statistic $\hat{\Theta}$ is then called the estimator, and the estimator $\hat{\Theta}$ is a Random Variable.

(An object written with a hat, like $\hat{x}$, is something derived from the random sample.)


Example.

Let $X_1, X_2, \dots, X_n$ be a random sample taken from $N(\mu, \sigma^2)$.

Q1. What can be a point estimator of $\mu$?

A1. sample mean, $\bar{X} = \dfrac{X_1 + \cdots + X_n}{n}$.


Q2. How about a point estimator of $\sigma^2$?

A2. sample variance, $\displaystyle S^2 = \dfrac{1}{n-1} \sum^n_i (X_i - \bar{X})^2$ where $E[S^2] = \sigma^2$

or $\displaystyle \hat{S}^2 = \dfrac{1}{n} \sum^n_i (X_i - \bar{X})^2$ where $E[\hat{S}^2] = \dfrac{n-1}{n} \sigma^2$.
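
Before comparing the two formally, here is a Monte Carlo sketch (assuming NumPy; the parameter values are arbitrary) that estimates $E[S^2]$ and $E[\hat{S}^2]$ by averaging each estimator over many samples of size $n$:

```python
import numpy as np

rng = np.random.default_rng(0)
mu, sigma, n, trials = 0.0, 2.0, 5, 200_000

x = rng.normal(mu, sigma, size=(trials, n))  # `trials` samples of size n
s2     = x.var(axis=1, ddof=1)  # S^2     : divide by n - 1
s2_hat = x.var(axis=1, ddof=0)  # S_hat^2 : divide by n

print("E[S^2]     ~", s2.mean())      # ~ sigma^2 = 4.0 (unbiased)
print("E[S_hat^2] ~", s2_hat.mean())  # ~ (n-1)/n * sigma^2 = 3.2 (biased low)
```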

Q3. ๋‘ estimator ์ค‘ ์–ด๋–ค ๊ฒƒ์ด ๋” ์ข‹์€๊ฐ€?

A3. ๋‘ estimator์˜ <bias>๋ฅผ ๋น„๊ตํ•œ๋‹ค!

Unbiased Estimator

Definition. unbiased estimator 🔥

A statistic $\hat{\Theta}$ is called an <unbiased estimator> if

\[E[\hat{\Theta}] = \theta \quad \text{for all} \quad \theta\]

That is, an unbiased estimator is one whose expectation recovers the population parameter $\theta$!

$E[\hat{\Theta} - \theta]$ is the "bias" of $\hat{\Theta}$ relative to $\theta$.

💥 If $E[\hat{\Theta} - \theta] = 0$, the estimator is unbiased!

Example.

Let $X_1, X_2, \dots, X_n$ be a random sample taken from $N(\mu, \sigma^2)$.

Then, $\bar{X}$ is an unbiased estimator of $\mu$, and $S^2$ is an unbiased estimator of $\sigma^2$.

Note that $E \left[ \frac{2X_1 + 0.5 X_2 + 0.5 X_3 + \cdots + X_n}{n}\right] = \mu$, so that one is also an unbiased estimator!

(Generalization) Let's consider a weighted average $\displaystyle\bar{X}_w = \sum^n_i w_i X_i$ with weights satisfying $\sum^n_i w_i = 1$. This estimator is also an unbiased estimator.

\[E\left[ \bar{X}_w \right] = \sum^n_i w_i E[X_i] = \cancelto{1}{\left( \sum^n_i w_i \right)} \mu = \mu\]

Q. Why do we use $\bar{X}$ instead of $\bar{X}_w$ as an estimator of $\mu$?

A. Because the "variance" of $\bar{X}$ is smaller than that of $\bar{X}_w$!

\[\text{Var}(\bar{X}) = E \left[ (\bar{X} - \mu)^2 \right] = \frac{\sigma^2}{n} \le \text{Var}(\bar{X}_w)\]

Variance of Estimator

Definition. variance of estimator 🔥

For an estimator $\hat{\Theta}$, its variance is

\[\text{Var}(\hat{\Theta}) = E \left[ (\hat{\Theta} - E[\hat{\Theta}])^2 \right]\]

* Variance์˜ ์ •์˜๋ฅผ ๊ทธ๋Œ€๋กœ ๋”ฐ๋ฅธ๋‹ค. ๊ทธ๋Ÿฌ๋‚˜ $\hat{\Theta}$๊ฐ€ statistic, ์ฆ‰ function of random samples $\hat{\Theta} = f(X_1, โ€ฆ, X_n)$์ด๊ธฐ ๋•Œ๋ฌธ์— ์‹ค์ œ ๊ณ„์‚ฐ์€ random sample์˜ distribution $X_i \sim g(x; \mu, \sigma)$๋ฅผ ํ™œ์šฉํ•˜๋ฉด ๋œ๋‹ค. $\text{Var}(\hat{\Theta}) = \text{Var}(g(X_1, โ€ฆ, X_n))$


Claim.

Among all weighted averages $\{ \bar{X}_w : w = (w_1, \dots, w_n), \sum w_i = 1\}$, $\bar{X}$ has the smallest variance.

We know that $\displaystyle\text{Var}(\bar{X}) = \frac{\sigma^2}{n}$.

\[\begin{aligned} \text{Var}(\bar{X}_w) &= \text{Var}\left( \sum^n_i w_i X_i \right) \\ &= \sum^n_i w_i^2 \cdot \text{Var}(X_i) \\ &= \sigma^2 \cdot \sum^n_i w_i^2 \end{aligned}\]

For $\sum w_i = 1$,

\[0 \le \sum^n_i \left(w_i - \frac{1}{n}\right)^2 = \sum w_i^2 - \frac{2}{n} \sum w_i + n \cdot \frac{1}{n^2} = \sum w_i^2 - \frac{1}{n}\]

๋”ฐ๋ผ์„œ,

\[\text{Var}(\bar{X}) = \frac{\sigma^2}{n} \le \sigma^2 \cdot \sum^n_i w_i^2 = \text{Var}(\bar{X}_w)\]

$\blacksquare$
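
The claim is also easy to check numerically. The sketch below (assuming NumPy; the parameters and the randomly drawn, normalized weights are arbitrary choices for illustration) shows that a weighted average is still unbiased but pays for it with extra variance:

```python
import numpy as np

rng = np.random.default_rng(0)
mu, sigma, n, trials = 1.0, 3.0, 10, 200_000

w = rng.random(n)
w /= w.sum()  # arbitrary weights with sum(w_i) = 1

x = rng.normal(mu, sigma, size=(trials, n))
xbar   = x.mean(axis=1)  # X_bar   : equal weights 1/n
xbar_w = x @ w           # X_bar_w : weighted average

print("means:", xbar.mean(), xbar_w.mean())  # both ~ mu = 1.0 (unbiased)
print("Var(X_bar)   ~", xbar.var(), " vs sigma^2/n =", sigma**2 / n)
print("Var(X_bar_w) ~", xbar_w.var(), " >= Var(X_bar)")
```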

The Most Efficient Estimator

โ€œbiasโ€์™€ โ€œvarianceโ€๋ฅผ ์ข…ํ•ฉํ•ด ์–ด๋–ค estimator๊ฐ€ ์ข‹์€ estimator์ธ์ง€ ํŒ๋‹จํ•  ์ˆ˜ ์žˆ๋‹ค.

Definition. the most efficient estimator of $\theta$ 🔥

Among all unbiased estimators of parameter $\theta$, the one with the smallest variance is called <the most efficient estimator of $\theta$>.


Remark.

When the $X_i$'s are iid $N(\mu, \sigma^2)$, it is known that $\bar{X}$ is the most efficient estimator of $\mu$.


Q. Why is the most efficient estimator chosen only among unbiased estimators? Couldn't some biased estimator have an even smaller variance?

A. Yes, it is possible for a biased estimator to have smaller variance than an unbiased one. (A trivial example: the constant estimator $\hat{\Theta} \equiv 0$ has zero variance but is badly biased.) Restricting to unbiased estimators rules out such degenerate choices.


Exercise.

Let $X_1, \dots, X_n$ be iid $N(\mu, \sigma^2)$.

Let $\displaystyle S^2 := \frac{1}{n-1} \sum^n_i (X_i - \bar{X})^2$ and $\displaystyle \hat{S}^2 := \frac{1}{n} \sum^n_i (X_i - \bar{X})^2$

Show that $\text{Var}(S^2) > \text{Var}(\hat{S}^2)$.

(Homework🎈)
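
Before attempting the algebra, a quick Monte Carlo sanity check can make the claim plausible. This is a sketch only (assuming NumPy; not a proof), with arbitrary parameter values:

```python
import numpy as np

rng = np.random.default_rng(0)
sigma, n, trials = 1.0, 5, 500_000

x = rng.normal(0.0, sigma, size=(trials, n))
var_s2     = x.var(axis=1, ddof=1).var()  # sample estimate of Var(S^2)
var_s2_hat = x.var(axis=1, ddof=0).var()  # sample estimate of Var(S_hat^2)

print("Var(S^2)     ~", var_s2)      # consistently the larger of the two
print("Var(S_hat^2) ~", var_s2_hat)
```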

Mean Squared Error

The <MSE; Mean Squared Error> can also be used as a criterion for evaluating a point estimator!

Definition. MSE; Mean Squared Error 🔥

The <MSE; Mean Squared Error> of an estimator is defined as

\[\text{MSE} := E \left[ \left( \hat{\Theta} - \theta \right)^2 \right]\]

Claim.

\[\text{MSE} := E \left[ \left( \hat{\Theta} - \theta \right)^2 \right] = \text{Var}(\hat{\Theta}) + \left[ \text{Bias} \right]^2\]

where $\text{Bias} := E \left[ \hat{\Theta} - \theta \right]$.

Proof.

(Homework🎈) / (Solution)

์ผ๋‹จ ์œ„์˜ ๋ช…์ œ๋Š” ์ฐธ์ด๋ผ๊ณ  ๋ฐ›์•„๋“ค์ด๊ณ , ์ด ๋ช…์ œ๊ฐ€ ์™œ ์ค‘์š”ํ•œ์ง€๋ฅผ ์„ค๋ช…ํ•ด๋ณด๊ฒ ๋‹ค.

Do you remember that the estimator $\hat{\Theta}$ is a statistic? The estimator $\hat{\Theta}$ is expressed as a function of the random sample $X_i$:

\[\hat{\Theta} = f(X_1, X_2, \dots, X_n)\]

๊ทธ๋ž˜์„œ ์ด $\hat{\Theta}$์˜ mean, variance๋Š” ๋ชจ๋‘ random sample $X_i$์˜ ๋ถ„ํฌ๋ฅผ ์‚ฌ์šฉํ•ด ์•„์ฃผ ์‰ฝ๊ฒŒ ์œ ๋„ํ•  ์ˆ˜ ์žˆ๋‹ค. ์˜ˆ๋ฅผ ๋“ค์–ด, unbiased estimator ๋ฌธ๋‹จ์—์„œ ๋“ค์—ˆ๋˜ sample mean $\bar{X}$์˜ ์‚ฌ๋ก€๋ฅผ ๋‹ค์‹œ ๋ณด๋ฉดโ€ฆ

Let the random sample $X_i$ be taken from $N(\mu, \sigma^2)$. Then $E(\bar{X})$ is

\[E(\bar{X}) = E \left( \frac{\sum^n_i X_i}{n} \right) = \frac{1}{n} \sum^n_i E[X_i] = \frac{1}{n} \cdot n \mu = \mu\]

Likewise, the variance of the estimator $\hat{\Theta}$ is easily derived from the distribution of the random sample.

๊ทธ๋Ÿฐ๋ฐ, Estimator์˜ MSE๋Š” ๊ทธ๋ ‡์ง€ ์•Š๋‹ค. mean๊ณผ variance์™€๋Š” ๋‹ฌ๋ฆฌ random sample์˜ ๋ถ„ํฌ์—์„œ ์œ ๋„ํ•˜๋Š” ๋ฐฉ๋ฒ•์ด straight ํ•˜๊ฒŒ ๋– ์˜ค๋ฅด์ง€ ์•Š์„ ๊ฒƒ์ด๋‹ค. ๊ทธ๋ž˜์„œ ์œ„์˜ โ€œMSE๋Š” Estimator์˜ ๋ถ„์‚ฐ๊ณผ bias์˜ ์ œ๊ณฑ์˜ ํ•ฉ์ด๋‹คโ€๋ผ๋Š” ๋ช…์ œ๋ฅผ ํ™œ์šฉํ•ด Estimator์˜ MSE๋ฅผ ๊ตฌํ•˜๋Š” ๊ฒƒ์ด ํ›จ์”ฌํ›จ์”ฌ ์‰ฝ๋‹ค.

๋งŒ์•ฝ ์ด๋Ÿฐ ๋ฐฐ๊ฒฝ์„ ๋ชจ๋ฅด๊ณ , MSE๋ฅผ ๋งˆ์ฃผํ•œ๋‹ค๋ฉด ๊ฝค ํ˜ผ๋ž€์Šค๋Ÿฝ๋‹ค. ๋ณธ์ธ์€ ๋จธ์‹  ๋Ÿฌ๋‹์ด๋‚˜ ๋ฐ์ดํ„ฐ ๋ถ„์„์„ ํ•˜๋ฉด์„œ ๋ชจ๋ธ์˜ MSE๋ฅผ ๋จผ์ € ์ ‘ํ–ˆ๋Š”๋ฐ, Estimator์˜ MSE๋ฅผ ๊ตฌํ•˜๋Š” ๊ฒƒ์ด ๊ฝค ๋œฌ๊ธˆ์—†๋‹ค๊ณ  ๋Š๊ผˆ๋‹ค. ๋ชจ๋ธ์˜ MSE๋Š” 300.5์™€ ๊ฐ™์ด ๊ฐ’์œผ๋กœ ์–ป์–ด์ง„๋‹ค. ๊ทธ๋Ÿฌ๋‚˜ Estimator์˜ MSE๋Š” ๋ฐฐ๊ฒฝ์„ ์ดํ•ดํ•˜๊ณ  ์œ„์˜ ๋ช…์ œ๋ฅผ ๋ฐ›์•„ ๋“ค์—ฌ์•ผ ํ•œ๋‹ค.


In the follow-up post, we will look at the other estimation approach, <Interval Estimation>. There, the measure of how good a given interval is will be the <confidence level> $1 - \alpha$!

👉 Interval Estimation

The HW problems presented in this post are collected separately in the post below.

👉 Statistics - PS1


  1. A <statistic> is a function $f(X_1, \dots, X_n)$ of the random sample $X_1, \dots, X_n$; see the Sampling Distribution post.