This post summarizes what I learned and studied in the “Probability and Statistics (MATH230)” course. The full series of posts is available under Probability and Statistics 🎲

Test on Proportion

This post covers both the single-sample case and the two-sample case.


Test on One Proportion

Consider a $p$-coin, that is, a coin that lands heads with probability $p$, where $p$ is unknown.

We want to test

  • $H_0: p=1/3$
  • $H_1: p>1/3$

We toss the coin $n$ times independently, and let $x$ be the number of heads observed in these $n$ trials. Under $H_0$, the head count $X$ follows $\text{Binomial}(n, 1/3)$.

Q1. What is the p-value?

A1. Since $H_1$ has the form $p > 1/3$, we reject $H_0$ when $x$ is at least some critical value $C$. The probability of that rejection region under $H_0$ is

\[P( X \ge C \mid p = 1/3)\]

So the p-value is obtained by substituting the observed $x$ for $C$!

\[P(X \ge x \mid p = 1/3) = \text{p-value}\]

Q2. What if the alternative is $H_1: p < 1/3$?

A2. Just flip the direction of the inequality in the p-value expression above.

\[P(X \le x \mid p = 1/3)\]
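The two one-sided p-values above can be computed exactly by summing binomial probabilities. The following is a minimal sketch, assuming $X \sim \text{Binomial}(n, p_0)$ under $H_0$; the function names `upper_tail_p_value` and `lower_tail_p_value` and the numbers in the usage lines are made up for illustration.

```python
from math import comb


def binom_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def upper_tail_p_value(x, n, p0):
    """p-value for H1: p > p0, i.e. P(X >= x | p = p0)."""
    return sum(binom_pmf(k, n, p0) for k in range(x, n + 1))


def lower_tail_p_value(x, n, p0):
    """p-value for H1: p < p0, i.e. P(X <= x | p = p0)."""
    return sum(binom_pmf(k, n, p0) for k in range(0, x + 1))


if __name__ == "__main__":
    # Hypothetical data: 30 tosses, 15 heads, testing p0 = 1/3.
    print(upper_tail_p_value(15, 30, 1 / 3))
    print(lower_tail_p_value(15, 30, 1 / 3))
```

The exact sum is fine for moderate $n$; for very large $n$ a normal approximation would likely be the more practical choice.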

Q3. What if the alternative is $H_1: p \ne 1/3$? (two-sided test)

A3. We would reject $H_0$ when $X \le C_1$ or $X \ge C_2$. The corresponding probability is

\[P(X \le C_1 \;\; \text{or} \;\; X \ge C_2 \mid p = 1/3)\]

However, the experiment gives us only a single value $x$, and the procedure above substitutes that value for $C$. Applying the same substitution to both $C_1$ and $C_2$ here gives

\[P(X \le x \;\; \text{or} \;\; X \ge x \mid p = 1/3) = 1\]

which is always 1! 😲 Since we have only one observed $x$, the sensible approach is to carry out a one-sided test in a single direction.

To decide which of the two directions, $X \le C_1$ or $X \ge C_2$, to take, use the expected value $E[X]$ as the reference point. Under $H_0$ this value is $np$ with $p = 1/3$.

  • If $x < np$, take $X \le C_1$
  • If $x > np$, take $X \ge C_2$

Suppose $x < np$, so that we compute the p-value from the $X \le C_1$ side. Because the test is two-sided, the one-sided tail probability is multiplied by $2$:

\[2 \cdot P(X \le x \mid p = 1/3)\]

If the p-value is smaller than the significance level $\alpha$, we reject $H_0$!
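The two-sided recipe (pick the tail on the side of $np$ where $x$ falls, then double it) can be sketched as follows. The function names are hypothetical. Two details are assumptions not stated in the text: when $x = np$ exactly the upper tail is used, and the doubled value is capped at 1 because doubling a tail probability can exceed 1.

```python
from math import comb


def two_sided_p_value(x, n, p0):
    """Doubled one-sided tail probability for H1: p != p0."""

    def pmf(k):
        # P(X = k) for X ~ Binomial(n, p0)
        return comb(n, k) * p0**k * (1 - p0) ** (n - k)

    expected = n * p0  # E[X] under H0
    if x < expected:
        tail = sum(pmf(k) for k in range(0, x + 1))  # P(X <= x)
    else:
        # Assumption: ties (x == expected) fall back to the upper tail.
        tail = sum(pmf(k) for k in range(x, n + 1))  # P(X >= x)
    # Assumption: cap at 1 so the result is still a valid probability.
    return min(1.0, 2.0 * tail)


def reject_h0(p_value, alpha=0.05):
    """Reject H0 when the p-value is below the significance level."""
    return p_value < alpha


if __name__ == "__main__":
    # Hypothetical data: 30 tosses, 4 heads, testing p0 = 1/3.
    p_val = two_sided_p_value(4, 30, 1 / 3)
    print(p_val, reject_h0(p_val, alpha=0.05))
```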


Test on Two Proportions

The problem is to test whether the proportions of two populations are equal, that is, $p_1 = p_2$. Applying the CLT as in <Proportion Estimation>, the test statistic is as follows.

\[\frac{(\hat{p}_1 - \hat{p}_2) - (p_1 - p_2)}{\sqrt{\dfrac{p_1 q_1}{n_1} + \dfrac{p_2 q_2}{n_2}}} \sim N(0, 1)\]

If the hypothesis “the two groups have the same proportion” is true, then $p = p_1 = p_2$ and the expression can be rewritten as

\[\frac{(\hat{p}_1 - \hat{p}_2)}{\sqrt{pq (1/n_1 + 1/n_2)}}\]

However, we only know that the population proportions satisfy $p_1 = p_2$; we do not know their actual values. So, as in <Proportion Estimation>, we have to use a sample proportion $\hat{p}$ instead!

But which of the sample proportions $\hat{p}_1$ and $\hat{p}_2$ should we use? Use the pooled proportion $\hat{p}$, which combines both samples!

\[\hat{p} = \frac{x_1 + x_2}{n_1 + n_2}\]

Rewriting the statistic with $\hat{q} = 1 - \hat{p}$,

\[\frac{(\hat{p}_1 - \hat{p}_2)}{\sqrt{\hat{p}\hat{q} \left(1/n_1 + 1/n_2\right)}}\]

Compute the p-value from this statistic, and if the p-value is smaller than $\alpha$, reject $H_0$!
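The pooled statistic can be computed with the standard library alone. This is a minimal sketch under assumptions: the function name `two_proportion_z_test` is made up, and since the text does not state the alternative, the p-value here is for the two-sided alternative $H_1: p_1 \ne p_2$, using the standard normal identity $2(1 - \Phi(|z|)) = \operatorname{erfc}(|z| / \sqrt{2})$.

```python
from math import erfc, sqrt


def two_proportion_z_test(x1, n1, x2, n2):
    """Return (z, two-sided p-value) for H0: p1 == p2 using the pooled proportion."""
    p1_hat = x1 / n1
    p2_hat = x2 / n2
    p_pool = (x1 + x2) / (n1 + n2)  # pooled proportion p-hat
    q_pool = 1.0 - p_pool
    # Standard error under H0; this is zero if p_pool is 0 or 1,
    # in which case the statistic is undefined.
    se = sqrt(p_pool * q_pool * (1.0 / n1 + 1.0 / n2))
    z = (p1_hat - p2_hat) / se
    # Assumption: two-sided alternative, p-value = 2 * (1 - Phi(|z|)).
    p_value = erfc(abs(z) / sqrt(2.0))
    return z, p_value


if __name__ == "__main__":
    # Hypothetical data: 60 successes out of 100 vs. 40 out of 100.
    z, p_value = two_proportion_z_test(60, 100, 40, 100)
    print(z, p_value, p_value < 0.05)
```

For a one-sided alternative, the tail would be `0.5 * erfc(z / sqrt(2))` (for $p_1 > p_2$) rather than the doubled version used here.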


Closing Remarks

The next post looks at the <Chi-square Goodness-of-fit test>, a generalization of the <proportion test>. It performs the test using the <chi-square distribution> $\chi^2$, which also makes it possible to test samples for independence and homogeneity!

👉 Chi-square Goodness-of-fit test