This post summarizes what I learned and studied in the "Probability and Statistics (MATH230)" course. The full list of posts is available at Probability and Statistics 🎲

This post covers how to apply the <Interval Estimation> introduced in the Interval Estimation post to one specific situation: paired samples.

Let's look at the problem below!

We have before-and-after TOEIC scores for 30 students, student 1 through student 30. To find out how the MATH230 course affects the students' TOEIC scores, we want to estimate $\mu_1 - \mu_2$!!

Q. Can we find a 95% confidence interval for the true mean of the differences between the scores before and after taking MATH230?

Suppose $X_1, \dots, X_n$ and $Y_1, \dots, Y_n$ are random samples, and $\sigma_1^2$ and $\sigma_2^2$ are known.

In the previous post, "Two Samples Estimation: Diff Btw Two Means", we saw that if the variances of the two samples are known exactly, the interval can be estimated as follows.

\[\left| (\bar{x} - \bar{y}) - (\mu_1 - \mu_2) \right| \le z_{\alpha/2} \cdot \sqrt{\frac{\sigma_1^2}{n} + \frac{\sigma_2^2}{n}}\]

💥 However! The method above is not the right approach here! The reason is that, in the samples we have, $X_i$ and $Y_i$ are dependent on each other, because they are the scores of the same student!! The approach above is valid only when $X_i$ and $Y_i$ are independent!!
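To see concretely why independence matters, here is a short derivation. It assumes the pairs $(X_i, Y_i)$ are i.i.d. across students and writes $\sigma_{12} := \text{Cov}(X_i, Y_i)$, a symbol introduced here for illustration that does not appear elsewhere in this post.

\[\text{Var}(\bar{X} - \bar{Y}) = \frac{\sigma_1^2}{n} + \frac{\sigma_2^2}{n} - \frac{2 \sigma_{12}}{n}\]

The known-variance interval uses only the first two terms, so it is correct only when $\sigma_{12} = 0$. For before-and-after scores of the same student, $\sigma_{12}$ is usually positive, in which case that interval would be wider than it needs to be.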

So, instead of treating each $X_i$ and $Y_i$ separately, we work with the difference obtained by pairing them: $D_i = X_i - Y_i$!

With this approach, the differences $D_i$ coming from different pairs are independent of one another!

Assume that $D_1, \dots, D_n$ are normal random samples: $D_i \sim N(\mu_D, \sigma_D^2)$

To find the confidence interval for $\mu_1 - \mu_2$, we use $\bar{D} := \bar{X} - \bar{Y}$.

Then, since the $D_i$ are normal (or, for large $n$, approximately by the CLT),

\[\frac{\bar{D} - \mu_D}{\sigma_D / \sqrt{n}} \; \sim \; N(0, 1)\]

Here we do not know $\sigma_D^2$, so we replace it with the sample variance $s_D^2$, and the distribution becomes the following.

\[\frac{\bar{D} - \mu_D}{s_D / \sqrt{n}} \; \sim \; t(n-1)\]
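Solving the $t(n-1)$ statement for $\mu_D$ gives the $100(1-\alpha)\%$ confidence interval, where $t_{\alpha/2}$ is the upper $\alpha/2$ critical value of $t(n-1)$ and $\bar{d}$ is the observed mean difference:

\[\bar{d} - t_{\alpha/2} \cdot \frac{s_D}{\sqrt{n}} \; \le \; \mu_D \; \le \; \bar{d} + t_{\alpha/2} \cdot \frac{s_D}{\sqrt{n}}\]

Below is a minimal Python sketch of that computation. The scores are made up for illustration (10 hypothetical students rather than the 30 in the problem), the function name `paired_t_interval` is my own, and the critical value $t_{0.025}(9) = 2.262$ is taken from a standard t-table so that only the standard library is needed. If SciPy is available, `scipy.stats.t.ppf(0.975, df=n - 1)` should give the same critical value.

```python
import math
import statistics


def paired_t_interval(x, y, t_crit):
    """Return (lower, upper) for mu_D = mu_1 - mu_2 from paired samples.

    t_crit is the upper alpha/2 critical value of t(n - 1),
    supplied by the caller (for example, from a t-table).
    """
    if len(x) != len(y):
        raise ValueError("paired samples must have the same length")
    n = len(x)
    if n < 2:
        raise ValueError("need at least two pairs to estimate s_D")

    # Work with the paired differences D_i = X_i - Y_i only.
    d = [xi - yi for xi, yi in zip(x, y)]
    d_bar = statistics.mean(d)
    s_d = statistics.stdev(d)  # sample standard deviation (n - 1 denominator)

    margin = t_crit * s_d / math.sqrt(n)
    return d_bar - margin, d_bar + margin


# Hypothetical before/after scores for 10 students (not real data).
before = [610, 720, 655, 580, 800, 690, 745, 630, 705, 660]
after = [640, 735, 650, 615, 810, 720, 760, 625, 740, 690]

T_CRIT_DF9 = 2.262  # t_{0.025}(9) from a standard t-table

lower, upper = paired_t_interval(before, after, T_CRIT_DF9)
print(f"95% CI for mu_D: ({lower:.2f}, {upper:.2f})")
```

Because $D_i$ is defined as before minus after, an interval lying entirely below zero would suggest that the mean score went up.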

So far, we have carried out estimation on random samples drawn from a <Normal Distribution>. In the next post, we look at <Proportion Estimation>, which is estimation performed on a <Bernoulli Distribution>!! (The mean of a Bernoulli distribution is exactly the proportion!!)

👉 Proportion Estimation on Bernoulli Distribution