A post summarizing what I learned and studied in the “Probability and Statistics (MATH230)” course. All posts in the series can be found in Probability and Statistics 🎲



Interval Estimation ํฌ์ŠคํŠธ์—์„œ ๋‹ค๋ฃฌ <Interval Estimation>์„ ํŠน์ • ์ƒํ™ฉ์— ์–ด๋–ป๊ฒŒ ์ ์šฉํ•  ์ˆ˜ ์žˆ๋Š”์ง€๋ฅผ ๋‹ค๋ฃจ๋Š” ํฌ์ŠคํŠธ์ž…๋‹ˆ๋‹ค.

Prediction Interval

Suppose the data points $x_1, x_2, \dots, x_n$ are drawn from $N(\mu, \sigma^2)$ with known $\sigma^2$. Now we draw one more data point $x_0$. Can we estimate where this new data point $x_0$ will fall?


Q. Using the data points $x_1, \dots, x_n$, find an interval that contains the new observation $x_0$ with probability $1 - \alpha$.

(Assumption) Here, assume $X_1, \dots, X_n$ are iid $N(\mu, \sigma^2)$, the new observation $X_0 \sim N(\mu, \sigma^2)$, and $X_0 \perp X_i$ for every $i$.

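First, we can consider the following distribution. Since $X_0 \sim N(\mu, \sigma^2)$ and $\bar{X} \sim N(\mu, \sigma^2/n)$ are independent, their difference is normal with mean $\mu - \mu = 0$ and variance $\sigma^2 + \sigma^2/n$: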

\[(X_0 - \bar{X}) \; \sim \; N \left(0, \; \sigma^2 + \frac{\sigma^2}{n} \right)\]
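As a quick sanity check of the variance $\sigma^2 + \sigma^2/n$, here is a minimal Python simulation sketch using only the standard library. The values $\mu = 10$, $\sigma = 2$, $n = 5$ and the helper name `simulate_diff_variance` are arbitrary, hypothetical choices for illustration.

```python
import random
import statistics


def simulate_diff_variance(mu: float, sigma: float, n: int, trials: int, seed: int = 0) -> float:
    """Empirical variance of X_0 - X_bar when X_0, X_1, ..., X_n are iid N(mu, sigma^2)."""
    rng = random.Random(seed)
    diffs = []
    for _ in range(trials):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]  # X_1, ..., X_n
        x0 = rng.gauss(mu, sigma)                          # independent new observation X_0
        diffs.append(x0 - statistics.fmean(sample))
    return statistics.variance(diffs)


sigma, n = 2.0, 5
empirical = simulate_diff_variance(mu=10.0, sigma=sigma, n=n, trials=20000)
theoretical = sigma**2 + sigma**2 / n  # sigma^2 (1 + 1/n)
print(f"empirical={empirical:.3f}, theoretical={theoretical:.3f}")
```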

์œ„์˜ ๋ถ„ํฌ๋ฅผ ๋ฐ”ํƒ•์œผ๋กœ Confidence Interval์„ ๊ตฌํ•˜๋ฉด,

\[\begin{aligned} 1 - \alpha &= P \left(-z_{\alpha/2} \le \frac{X_0 - \bar{X}}{\sqrt{\sigma^2 + \frac{\sigma^2}{n}}} \le z_{\alpha/2} \right) \\ &= P \left(\bar{X} - z_{\alpha/2} \cdot \sqrt{\sigma^2 + \frac{\sigma^2}{n}} \le X_0 \le \bar{X} + z_{\alpha/2} \cdot \sqrt{\sigma^2 + \frac{\sigma^2}{n}} \right) \end{aligned}\]
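A minimal sketch of this known-$\sigma$ prediction interval in Python, plus a Monte Carlo check that a fresh $X_0$ lands inside it about $100(1-\alpha)\%$ of the time. The function names and simulation settings are hypothetical, chosen only for illustration.

```python
import math
import random
from statistics import NormalDist, fmean


def prediction_interval_known_sigma(data, sigma, alpha=0.05):
    """x_bar ± z_{alpha/2} * sqrt(sigma^2 + sigma^2 / n), for known sigma."""
    n = len(data)
    z = NormalDist().inv_cdf(1 - alpha / 2)  # upper alpha/2 point, about 1.96 for alpha = 0.05
    half_width = z * math.sqrt(sigma**2 + sigma**2 / n)
    x_bar = fmean(data)
    return x_bar - half_width, x_bar + half_width


def coverage(mu, sigma, n, alpha=0.05, trials=20000, seed=1):
    """Fraction of simulated runs in which a new X_0 falls inside the interval."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        data = [rng.gauss(mu, sigma) for _ in range(n)]
        lo, hi = prediction_interval_known_sigma(data, sigma, alpha)
        x0 = rng.gauss(mu, sigma)  # the new observation, independent of data
        hits += lo <= x0 <= hi
    return hits / trials


cov = coverage(mu=0.0, sigma=1.0, n=8)
print(f"coverage ≈ {cov:.3f}")  # should be close to 1 - alpha = 0.95
```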

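💥 If $\sigma^2$ is unknown, replace $\sigma$ with the sample standard deviation $s$ and $z_{\alpha/2}$ with $t_{\alpha/2}$ (with $n-1$ degrees of freedom) in the interval for $X_0$, which gives $\bar{x} \pm t_{\alpha/2} \cdot s\sqrt{1 + \frac{1}{n}}$!!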
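The unknown-$\sigma$ version can be sketched the same way. Python's standard library has no $t$ quantile function, so this sketch assumes the critical value is read from a $t$-table ($t_{0.025, 9} \approx 2.262$ for $n = 10$, $\alpha = 0.05$) and passed in. `prediction_interval_unknown_sigma` is a hypothetical helper name, and the population $N(50, 3^2)$ is an arbitrary simulation choice.

```python
import math
import random
import statistics


def prediction_interval_unknown_sigma(data, t_crit):
    """x_bar ± t_{alpha/2, n-1} * s * sqrt(1 + 1/n).

    t_crit is assumed to come from a t-table (the stdlib has no t quantile).
    """
    n = len(data)
    x_bar = statistics.fmean(data)
    s = statistics.stdev(data)  # sample standard deviation, n - 1 denominator
    half_width = t_crit * s * math.sqrt(1 + 1 / n)
    return x_bar - half_width, x_bar + half_width


T_0025_DF9 = 2.262  # t_{0.025, 9} from a t-table: n = 10, alpha = 0.05

rng = random.Random(2)
trials, hits = 20000, 0
for _ in range(trials):
    data = [rng.gauss(50.0, 3.0) for _ in range(10)]
    lo, hi = prediction_interval_unknown_sigma(data, T_0025_DF9)
    x0 = rng.gauss(50.0, 3.0)  # new observation from the same population
    hits += lo <= x0 <= hi
print(f"coverage ≈ {hits / trials:.3f}")  # should be close to 0.95
```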


Tolerance Interval

<Prediction Interval>์—์„œ๋Š” โ€œthe next observationโ€์ด๋ผ๋Š” single observation์— ๊ด€์‹ฌ์„ ๊ฐ€์กŒ๋‹ค. ๋ฐ˜๋ฉด์—, ๋•Œ๋กœ๋Š” population์˜ ๊ฐ’์„ ์–ผ๋งˆ๋‚˜ ์ปค๋ฒ„ํ•˜๋Š”์ง€ ๊ทธ bound๋ฅผ ๊ตฌํ•ด์•ผ ํ•  ๋•Œ๋„ ์žˆ๋‹ค. <Tolerance Interval> ๋˜๋Š” <Tolerance Limits>๋Š” ์ด๋Ÿฐ bound๋ฅผ estimationํ•˜๋Š” ๊ณผ์ •์„ ๋งํ•œ๋‹ค!

Now, our interest is the proportion of the distribution where is the large bulk of our distribution.

Q. Let $X \sim N(\mu, \sigma^2)$, can you find interval which contains 95% of the population distribution?

[Both $\mu$ and $\sigma^2$ are known]

\[\mu \pm 1.96 \sigma\]

์œ„์˜ ๋ฒ”์œ„๋Š” โ€˜์ •ํ™•ํžˆโ€™ population distribution์˜ 95%๋ฅผ ์ปค๋ฒ„ํ•œ๋‹ค! ์šฐ๋ฆฌ๋Š” ์ด๊ฒƒ์„ <Tolerance Interval>์ด๋ผ๊ณ  ๋ถ€๋ฅธ๋‹ค!!

[Both $\mu$ and $\sigma^2$ are unknown]

In most cases, we do not know the two parameters $\mu$ and $\sigma^2$. In that case, all we can do is build an interval from the sample mean $\bar{x}$ and the sample variance $s^2$, as follows.

\[\bar{x} \pm k s\]

์œ„์˜ interval์„ ๊ตฌ์„ฑํ•˜๋Š” ๋…€์„์ด ๋ชจ๋‘ RV์ด๊ธฐ ๋•Œ๋ฌธ์—, ์œ„์˜ interval ์—ญ์‹œ RV์ด๋ฉฐ population distribution์„ ์ปค๋ฒ„ํ•˜๋Š” ๋น„์œจ(proportion) ์—ญ์‹œ ์ •ํ™•ํžˆ ๊ฒฐ์ •๋˜์ง€ ์•Š๋Š”๋‹ค. ์šฐ๋ฆฌ์˜ ๋ชฉํ‘œ๋Š” ์œ„์˜ sample parameter์—์„œ ์ถ”์ •ํ•œ ์œ„์˜ interval์„ ๊ตฌํ•˜๋Š” ๊ฒƒ์ด๋ฉฐ, $\bar{x} \pm k s$์„ <Tolerance Limits>๋ผ๊ณ  ํ•œ๋‹ค!


To find the <Tolerance Limits>, we must decide on two values.

1. How much of the population distribution the interval should cover: $1 - \alpha$

This determines how much of the distribution $\bar{x} \pm k s$ covers. For example, if $\alpha=0.05$, we say that $\bar{x} \pm k s$ covers 95% of the population distribution.

2. The confidence level of the interval: $1 - \gamma$

This states how confident we are in the random interval $\bar{x} \pm k s$. Each time we repeat the sampling, $\bar{x}$ and $s^2$ change, so the $\bar{x} \pm k s$ we obtain is not a fixed value but an RV. We therefore have to state the confidence level attached to the reported $\bar{x} \pm k s$. If $\gamma=0.05$, we say that $\bar{x} \pm k s$ has 95% confidence.


์ˆ˜์‹์œผ๋กœ ํ‘œํ˜„ํ•ด๋ณด๋ฉด ์•„๋ž˜์™€ ๊ฐ™๋‹ค.

$L(X_1, \dots, X_n)$, $U(X_1, \dots, X_n)$๋ฅผ ๊ฐ๊ฐ <Tolerance Limits>์˜ ์–‘๋ bound๋ผ๊ณ  ํ•ด๋ณด์ž. ๋‘˜์€ RV์ด๋‹ค.

์šฐ๋ฆฌ๋Š” $L(X_1, \dots, X_n)$ and $U(X_1, \dots, X_n)$ s.t. $(L, U)$ contains $95\% = (1-\alpha)\%$ of population with $100(1-\gamma)\%$ confidence๋ผ๋Š” ๋‘ <statistics>๋ฅผ ์ถ”์ •ํ•ด์ค˜์•ผ ํ•œ๋‹ค!! ๐Ÿ˜ฒ

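\[P \left( F(U) - F(L) \ge 1 - \alpha \right) = 1 - \gamma\]

where $F$ is the CDF of the population distribution $N(\mu, \sigma^2)$, so $F(U) - F(L)$ is the proportion of the population inside $(L, U)$. With $\alpha = 0.05$, the required coverage is $1 - \alpha = 0.95$.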
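For a candidate $k$, the probability $P(F(U) - F(L) \ge 1 - \alpha)$ can be estimated by simulation. This is a Monte Carlo sketch, not how tolerance tables are actually computed. It uses $N(0, 1)$ without loss of generality, since the standardized coverage of $\bar{x} \pm ks$ does not depend on $\mu$ or $\sigma$. The value $k \approx 3.379$ ($n = 10$, 95% coverage, 95% confidence) is an assumed table value, and `tolerance_confidence` is a hypothetical helper.

```python
import random
import statistics
from statistics import NormalDist


def tolerance_confidence(k, n, coverage=0.95, trials=5000, seed=4):
    """Monte Carlo estimate of P(F(U) - F(L) >= coverage), with (L, U) = x_bar ∓ k*s."""
    pop = NormalDist(0.0, 1.0)  # WLOG mu = 0, sigma = 1
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        data = [rng.gauss(0.0, 1.0) for _ in range(n)]
        x_bar, s = statistics.fmean(data), statistics.stdev(data)
        lo, hi = x_bar - k * s, x_bar + k * s
        hits += (pop.cdf(hi) - pop.cdf(lo)) >= coverage  # did this sample cover enough?
    return hits / trials


c_low = tolerance_confidence(1.96, n=10)   # naive k: confidence well below 0.95
c_high = tolerance_confidence(3.379, n=10)  # assumed table k: confidence near 0.95
print(f"k=1.96  -> confidence ≈ {c_low:.3f}")
print(f"k=3.379 -> confidence ≈ {c_high:.3f}")
```

Plugging in $k = 1.96$ (the known-parameter value) gives much less than 95% confidence, which is why a larger $k$ is needed once $\mu$ and $\sigma$ are estimated.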


In the end, what we need to determine in order to estimate the <Tolerance Limits> is $k$. This value can be read from a <Tolerance Table>. Below is an example of such a table.

๐Ÿ‘‰ Tolerance Table

๊ฐ’์€ ์•„๋ž˜์˜ 3๊ฐ€์ง€ ํŒŒ๋ผ๋ฏธํ„ฐ๋กœ ๊ตฌํ•˜๋ฉด ๋œ๋‹ค.

  • Confidence Level of interval: $1-\gamma$
  • Percent Coverage: $1-\alpha$
  • sample size: $n$
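If no table is at hand, $k$ can also be approximated. The sketch below uses Howe's approximation, $k \approx z_{(1+p)/2}\sqrt{(n-1)(1 + 1/n)/\chi^2_{\gamma, n-1}}$, where $p = 1 - \alpha$ is the coverage and $\chi^2_{\gamma, n-1}$ is the lower $\gamma$ quantile of the chi-square distribution. Because the standard library has no chi-square quantile, that quantile is itself approximated with the Wilson-Hilferty formula. Both are approximations swapped in for the table lookup, and the function names are hypothetical.

```python
import math
from statistics import NormalDist


def chi2_quantile_wh(q, df):
    """Wilson-Hilferty approximation to the q-quantile of chi-square(df)."""
    z = NormalDist().inv_cdf(q)
    c = 2.0 / (9.0 * df)
    return df * (1.0 - c + z * math.sqrt(c)) ** 3


def tolerance_factor_howe(n, coverage=0.95, confidence=0.95):
    """Howe's approximation to the two-sided normal tolerance factor k."""
    df = n - 1
    z = NormalDist().inv_cdf((1.0 + coverage) / 2.0)   # z_{(1+p)/2}
    chi2 = chi2_quantile_wh(1.0 - confidence, df)       # lower gamma quantile
    return z * math.sqrt(df * (1.0 + 1.0 / n) / chi2)


for n in (10, 20, 50):
    print(n, round(tolerance_factor_howe(n), 3))  # k shrinks as n grows
```

For precise work, a tolerance table or a dedicated statistics library is the safer source of $k$.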

Let's work through an example to get familiar with <Confidence Interval>, <Prediction Interval>, and <Tolerance Interval>, and how they differ.
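As a hypothetical illustration (simulated data, not an example from the course), the sketch below computes all three intervals from one sample of size 10. The critical values are assumed table values: $t_{0.025, 9} \approx 2.262$ and the tolerance factor $k \approx 3.379$ for 95% coverage with 95% confidence.

```python
import math
import random
import statistics

# Assumed table values for n = 10
T_0025_DF9 = 2.262  # t_{0.025, 9}
K_TOL = 3.379       # two-sided tolerance factor: 95% coverage, 95% confidence

rng = random.Random(5)
data = [rng.gauss(20.0, 2.0) for _ in range(10)]  # hypothetical sample
n = len(data)
x_bar, s = statistics.fmean(data), statistics.stdev(data)

# Half-widths of the three intervals, all centered at x_bar
intervals = {
    "confidence (mu)": T_0025_DF9 * s / math.sqrt(n),
    "prediction (x_0)": T_0025_DF9 * s * math.sqrt(1 + 1 / n),
    "tolerance (95% of pop.)": K_TOL * s,
}
for name, half in intervals.items():
    print(f"{name:>24}: {x_bar - half:.2f} .. {x_bar + half:.2f}  (width {2 * half:.2f})")
```

The confidence interval targets the fixed $\mu$ and is the narrowest; the prediction interval must also absorb the spread of one new observation; the tolerance interval must cover 95% of the whole population with 95% confidence, so it is the widest.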


The next post covers the Two Samples setting, where “two samples” exist. It mainly deals with estimating the difference of the two means $(\mu_1 - \mu_2)$ or the ratio of the two variances $\sigma_1^2 / \sigma_2^2$.