A generalization of the <Proportion Test>. A test for categorical variables.


This post summarizes what I learned and studied in the “Probability and Statistics (MATH230)” course. The full series of posts is available at Probability and Statistics 🎲

<Proportion Test>์˜ ๋‚ด์šฉ์„ ๋จผ์ € ์‚ดํŽด๋ณด๊ณ  ์˜ค๋Š” ๊ฒƒ์„ ์ถ”์ฒœํ•œ๋‹ค. <Proportion Test>๋ฅผ ์ผ๋ฐ˜ํ™”ํ•œ ๊ฒƒ์ด <Goodness-of-fit Test>์ด๊ธฐ ๋•Œ๋ฌธ์ด๋‹ค!

Introduction to Goodness-of-fit Test

<Goodness-of-fit Test; ์ ํ•ฉ๋„ ๊ฒ€์ •>์€ population distribution์ด categorical variable์„ ๊ฐ€์ง€๋Š” ๊ฒฝ์šฐ, ์˜ˆ๋ฅผ ๋“ค์–ด Head-Tail์˜ ๋™์ „ ๋˜์ง€๊ธฐ, ์ฃผ์‚ฌ์œ„ ๋˜์ง€๊ธฐ ๋“ฑ์—์„œ ์‚ฌ์šฉํ•˜๋Š” ๊ฒ€์ • ๊ธฐ๋ฒ•์ด๋‹ค. <Goodness-of-fit Test>๋Š” ์นดํ…Œ๊ณ ๋ฆฌ ๋ณ€์ˆ˜์˜ Sample Distribution (๋˜๋Š” Observed Distribution)์ด ๊ฐ€์ •ํ•œ Expected Distribution๊ณผ ์ผ์น˜ํ•˜๋Š”์ง€๋ฅผ ๊ฒฐ์ •ํ•œ๋‹ค.

๋จผ์ € ์•„๋ž˜์˜ ์˜ˆ์ œ๋ฅผ ํ’€๋ฉด์„œ, <Goodness-of-fit Test; ์ ํ•ฉ๋„ ๊ฒ€์ •>์— ๋Œ€ํ•ด ์‚ดํŽด๋ณด์ž.

1. ๊ฒ€์ •์˜ ๋ชฉํ‘œ

  • $H_0: p=0.8$
  • $H_1: p \ne 0.8$
  • significance level $\alpha$

2. Sampling situation

|                      | made | missed | total |
|----------------------|------|--------|-------|
| observed             | 70   | 30     | 100   |
| expected under $H_0$ | 80   | 20     | 100   |

3. Test Statistic

์ด์ œ ๊ฒ€์ •์„ ์ˆ˜ํ–‰ํ•˜๊ธฐ ์œ„ํ•œ <Test Statistic>์„ ๊ฒฐ์ •ํ•˜์ž. sample proportion $\hat{p}$๋ฅผ ์‚ฌ์šฉํ•œ๋‹ค.

Applying the CLT to $\hat{p}$ gives, approximately and under $H_0$, the following.

\[\frac{\hat{p} - p}{\sqrt{p(1-p) / n}} \sim N(0, 1)\]

์ด์ „์˜ <Proportion Test>์—์„  ์ด๊ฑธ ๊ทธ๋Œ€๋กœ ์‚ฌ์šฉํ–ˆ๋‹ค.

\[\text{reject} \; H_0, \quad \text{if} \quad \left| \frac{\hat{p} - p}{\sqrt{p(1-p) / n}} \right| > z_{\alpha/2}\]

chi-square test์—์„  z-value์— ์ œ๊ณฑ์„ ์ทจํ•œ๋‹ค.

\[\text{reject} \; H_0, \quad \text{if} \quad \left| \frac{\hat{p} - p}{\sqrt{p(1-p) / n}}\right|^2 > \left| z_{\alpha/2}\right|^2 = \chi^2_{\alpha}(1)\]
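As a quick numeric check on the made/missed example (70 made out of 100, with $H_0: p = 0.8$), the sketch below computes the z-value, squares it, and compares both against their critical values. The significance level $\alpha = 0.05$ is an assumption made only for illustration, since the post leaves $\alpha$ unspecified, and the variable names are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

n, x, p0 = 100, 70, 0.8      # sample size, observed "made", p under H0
alpha = 0.05                 # ASSUMED significance level (not given in the post)

p_hat = x / n
z = (p_hat - p0) / sqrt(p0 * (1 - p0) / n)

z_crit = NormalDist().inv_cdf(1 - alpha / 2)   # z_{alpha/2}
chi2_crit = z_crit ** 2                        # chi^2_alpha(1) = z_{alpha/2}^2

# The two rejection rules are the same rule written in two ways.
reject_by_z = abs(z) > z_crit
reject_by_chi2 = z ** 2 > chi2_crit
print(z, z ** 2, z_crit, chi2_crit, reject_by_z, reject_by_chi2)
```

Because squaring is monotone on non-negative numbers, the two rules always agree; here both reject $H_0$ under the assumed $\alpha$.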


As mentioned when introducing it, the <Goodness-of-fit Test> is a test for categorical variables. The expression above holds only when there are exactly 2 categories. So let's transform it slightly to derive the expression for the <GOF Test>.

์ผ๋‹จ์€ 2๊ฐœ ์นดํ…Œ๊ณ ๋ฆฌ์—์„œ ์‹œ์ž‘ํ•ด๋ณด์ž.

\[\begin{aligned} \left| \frac{\hat{p} - p}{\sqrt{p(1-p) / n}}\right|^2 &= \frac{(\hat{p} - p)^2}{p(1-p)/n} \\ &= \frac{(x/n - p)^2}{p(1-p)/n} \\ &= \frac{(x/n - p)^2 \times n^2}{p(1-p)/n \times n^2} \\ &= \frac{(x - np)^2}{np(1-p)} \end{aligned}\]

Using $\dfrac{1}{y(1-y)} = \dfrac{1}{y} + \dfrac{1}{1-y}$, we decompose the expression as follows.

\[\begin{aligned} \frac{(x - np)^2}{np(1-p)} &= \frac{(x-np)^2}{np} + \frac{(x-np)^2}{n(1-p)} \end{aligned}\]

Here, $np$ is the expected value of the first category, $e_1 = 80$, and $n(1-p)$ is that of the second category, $e_2 = 20$. Likewise, the numerator $(x-np)^2$ is the squared difference between the observed value and the expected value of the first category.

\[(x-np)^2 = (o_1 - e_1)^2\]

๊ทธ๋Ÿฐ๋ฐ $(x-np)^2$๋ฅผ ์•„๋ž˜์™€ ๊ฐ™์ด ํ‘œํ˜„ํ•˜๋ฉด, ๋‘๋ฒˆ์งธ observed value์™€ expected value์˜ ์ฐจ์ด ๊ฐ’์œผ๋กœ ํ‘œํ˜„ํ•  ์ˆ˜๋„ ์žˆ๋‹ค!

\[(x-np)^2 = \left( (x-n) + (n-np) \right)^2 = (o_2 - e_2)^2\]


Putting the pieces together gives

\[\left| \frac{\hat{p} - p}{\sqrt{p(1-p) / n}}\right|^2 = \frac{(o_1 - e_1)^2}{e_1} + \frac{(o_2 - e_2)^2}{e_2}\]

and the rejection criterion can be rewritten as

\[\text{reject} \; H_0, \quad \text{if} \quad \sum_{i=1}^2 \frac{(o_i - e_i)^2}{e_i} > \chi^2_{\alpha}(1)\]
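The identity derived above can be checked numerically on the made/missed example. The sketch below is only an illustration; the helper name `chi_square_statistic` is hypothetical.

```python
def chi_square_statistic(observed, expected):
    """Sum of (o_i - e_i)^2 / e_i over all categories."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

observed = [70, 30]   # made, missed
expected = [80, 20]   # n*p and n*(1-p) with n = 100, p = 0.8

chi2_value = chi_square_statistic(observed, expected)

# Squared z-value written as (x - np)^2 / (np(1-p)).
z_squared = (70 - 80) ** 2 / (100 * 0.8 * 0.2)
print(chi2_value, z_squared)
```

Both numbers come out as 6.25 (up to floating-point rounding), which is the two-category case of the sum in the rejection criterion.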


Let's generalize the 2-category example to $k$ categories.

Definition. Test Statistic for Goodness-of-fit

<Goodness-of-fit>์˜ Test Statistic์€

\[\chi^2 := \sum_{i=1}^k \frac{(o_i - e_i)^2}{e_i}\]

where $o_i$ and $e_i$ are the observed and expected occurrences respectively.

💥 NOTE: all expected occurrences must be at least 5. If any category has an expected frequency below 5, pool it by merging it into another category!
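The pooling rule in the note can be sketched as follows. This is only one possible strategy, assuming the categories have a natural order so that merging a small category into its neighbor is meaningful; the function name `pool_small_categories`, the `min_expected` argument, and the counts are all hypothetical.

```python
def pool_small_categories(observed, expected, min_expected=5.0):
    """Greedily merge adjacent categories until each pooled expected count
    reaches min_expected. A leftover tail is merged into the last group."""
    pooled_o, pooled_e = [], []
    acc_o, acc_e = 0, 0.0
    for o, e in zip(observed, expected):
        acc_o += o
        acc_e += e
        if acc_e >= min_expected:
            pooled_o.append(acc_o)
            pooled_e.append(acc_e)
            acc_o, acc_e = 0, 0.0
    if acc_e > 0:                      # leftover categories at the end
        if pooled_e:
            pooled_o[-1] += acc_o
            pooled_e[-1] += acc_e
        else:                          # everything together is still small
            pooled_o.append(acc_o)
            pooled_e.append(acc_e)
    return pooled_o, pooled_e

# Hypothetical data: the first and last categories have expected counts below 5.
observed = [3, 12, 20, 11, 4]
expected = [2.5, 12.5, 20.0, 12.5, 2.5]
pooled_observed, pooled_expected = pool_small_categories(observed, expected)
print(pooled_observed, pooled_expected)
```

The number of categories left after pooling (3 in this made-up example) is the one that enters the degrees of freedom.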

์œ„์˜ ์˜ˆ์ œ์—์„œ๋Š” ์นดํ…Œ๊ณ ๋ฆฌ๊ฐ€ ๋‹จ 2๊ฐœ์ธ ์ƒํ™ฉ์ด์—ˆ๋‹ค. ํ•˜์ง€๋งŒ, ์ฃผ์‚ฌ์œ„ ๊ตด๋ฆฌ๊ธฐ์™€ ๊ฐ™์ด ์นดํ…Œ๊ณ ๋ฆฌ๊ฐ€ ์—ฌ๋Ÿฌ ๊ฐœ์ธ ๊ฒฝ์šฐ๋Š” $\chi^2$ ๋ถ„ํฌ์˜ DOF๊ฐ€ ๋‹ฌ๋ผ์ง„๋‹ค. ๊ทธ ๊ณต์‹์€ ์•„๋ž˜์™€ ๊ฐ™๋‹ค.

Definition. Degree of Freedom for Goodness-of-fit

The degree of freedom $\nu$ = (#. of categories after pooling - 1) - #. of parameters estimated

The reason for subtracting $1$ from (#. of categories) is that the total $n$ is given: once the other counts are known, the value of the last category is determined deterministically!
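A minimal sketch of the DOF formula, together with the "last category is determined" argument. The function name and the die counts are hypothetical and only serve as an illustration.

```python
def gof_degrees_of_freedom(num_categories_after_pooling, num_params_estimated=0):
    """nu = (#categories after pooling - 1) - #parameters estimated."""
    return (num_categories_after_pooling - 1) - num_params_estimated

coin_dof = gof_degrees_of_freedom(2)   # made/missed or head/tail
die_dof = gof_degrees_of_freedom(6)    # six faces of a die

# Why "-1": with the total n fixed, the last count follows from the others.
n = 60                                  # hypothetical number of die rolls
first_five_counts = [8, 12, 9, 11, 10]  # hypothetical counts for faces 1..5
last_count = n - sum(first_five_counts)
print(coin_dof, die_dof, last_count)
```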

If you are curious about DOF in statistics, read the post below!

👉 Degree of Freedom in Statistics


Test for Independence

Let's apply the <Chi-squared goodness-of-fit Test> to test whether two categorical variables are independent.

Let's test whether ‘income’ and ‘political’ are independent. We set $H_0$ and $H_1$ as follows.

  • $H_0$: ‘income’ and ‘political’ are independent
  • $H_1$: they are not independent

Written as a formula, $H_0$ says the following for the cell (‘party 1’, ‘low’), and likewise for every other cell.

\[P(\text{party } 1 \; \And \; \text{low}) = P(\text{party } 1) \cdot P(\text{low})\]

Using the formula above, which follows from the assumption $H_0$ that the two variables are independent, we can obtain the expected value $e_{ij}$ of each cell.

For example, $e_{11}$ is

\[\begin{aligned} e_{11} &= 1500 \times P(\text{P1} \; \And \; \text{Low}) \\ &= 1500 \times \frac{677}{1500} \times \frac{499}{1500} \\ &= \frac{677 \cdot 499}{1500} \approx 225.22 \end{aligned}\]

In this way we compute the expected value $e_{ij}$ for every entry.
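The expected value of a cell under independence only needs the two marginal totals and the grand total. The sketch below reproduces $e_{11}$ from the totals quoted in the text (677, 499, and 1500); the function name `expected_cell` is hypothetical.

```python
def expected_cell(total_a, total_b, n):
    """Expected count under independence: n * (total_a / n) * (total_b / n)."""
    return total_a * total_b / n

# Totals quoted in the text: 677 for party 1, 499 for low income, 1500 overall.
e_11 = expected_cell(677, 499, 1500)
print(round(e_11, 2))
```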

๋‹ค์Œ์œผ๋ก  chi-square test์˜ ๊ณต์‹์— $o_{ij}$, $e_{ij}$๋ฅผ ๋Œ€์ž…ํ•ด $\chi^2$-value๋ฅผ ๊ตฌํ•œ๋‹ค.

\[\chi^2 = \sum_{i=1}^3 \sum_{j=1}^3 \frac{(o_{ij} - e_{ij})^2}{e_{ij}}\]

$\chi^2$ ๋ถ„ํฌ์˜ DOF๋„ ๊ตฌํ•ด๋ณด๋ฉด,

\[\begin{aligned} \nu &= (9-1) - \left((3-1) + (3-1)\right) \\ &= 8 - (2 + 2) = 4 \end{aligned}\]

The reason that “(#. of parameters estimated) = $4$” here is as follows.

To obtain the parameters for ‘party’, we need the probabilities of its three cases. But since probabilities sum to 1, only two of the three need to be estimated. So two parameters are estimated for ‘party’, and likewise two parameters for ‘income’. Hence (#. of parameters estimated) is 4.


์ด๊ฒƒ์„ ๊ณต์‹์œผ๋กœ ์ž‘์„ฑํ•˜๋ฉด ์•„๋ž˜์™€ ๊ฐ™๋‹ค.

\[\begin{aligned} \nu &= r \cdot c - 1 - \left((r -1) + (c-1)\right) \\ &= r(c-1) - (c-1) \\ &= (r-1)(c-1) \end{aligned}\]

Once the $\chi^2$-value and the DOF $\nu$ are known, we can carry out the test.

Reject $H_0$, if $\chi^2 > \chi^2_{\alpha} (\nu)$.
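The whole procedure (expected counts, $\chi^2$-value, DOF, decision) can be sketched as below. The 3x3 table is hypothetical and is NOT the data of the post, and all function names are made up for illustration. To stay within the standard library, the p-value uses a closed form of the $\chi^2$ tail probability that is valid only for even DOF, which happens to cover $\nu = 4$; for general DOF a statistics library would be needed.

```python
from math import exp, factorial

def expected_counts(table):
    """e_ij = (row total i) * (column total j) / n."""
    n = sum(sum(row) for row in table)
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    return [[r * c / n for c in col_totals] for r in row_totals]

def chi_square_independence(table):
    """Return the chi-square statistic and its DOF (r-1)(c-1)."""
    expected = expected_counts(table)
    stat = sum((o - e) ** 2 / e
               for o_row, e_row in zip(table, expected)
               for o, e in zip(o_row, e_row))
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return stat, dof

def chi2_sf_even_dof(x, dof):
    """P(chi^2_dof > x); this closed form holds only for even dof."""
    if dof % 2 != 0:
        raise ValueError("closed form only holds for even dof")
    half = x / 2
    return exp(-half) * sum(half ** k / factorial(k) for k in range(dof // 2))

# HYPOTHETICAL 3x3 table of counts (rows: income level, columns: party).
table = [[30, 20, 10],
         [20, 30, 20],
         [10, 20, 40]]

stat, dof = chi_square_independence(table)
p_value = chi2_sf_even_dof(stat, dof)
alpha = 0.05                      # assumed significance level
print(stat, dof, p_value, p_value < alpha)
```

Rejecting when the p-value is below $\alpha$ is the same decision as rejecting when $\chi^2 > \chi^2_{\alpha}(\nu)$.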


Test for Homogeneity

์ด๋ฒˆ์—๋Š” <Goodness-of-fit Test>๋ฅผ ์‘์šฉํ•ด ๊ฐ ์นดํ…Œ๊ณ ๋ฆฌ์—์„œ์˜ ๋ถ„ํฌ๊ฐ€ ๊ท ์ผ(homogeneous)ํ•œ์ง€ ๊ฒ€์ •ํ•ด๋ณด์ž. ์˜ˆ๋ฅผ ๋“ค๋ฉด, โ€œ์ธ์ข… ๋ณ„๋กœ ํก์—ฐ์ž์™€ ๋น„ํก์—ฐ์ž ๋น„์œจ์ด ๋™์ผํ•œ๊ฐ€?โ€์™€ ๊ฐ™์€ ์งˆ๋ฌธ์„ ๊ฒ€์ฆํ•˜๋Š” ๊ฒƒ์ด๋‹ค.

๋จผ์ € ๋ฌด์—‡์„ ๊ฒ€์ •ํ•˜๊ณ ์ž ํ•˜๋Š”์ง€ ๋ช…ํ™•ํžˆ ์ •์˜ํ•ด๋ณด์ž.

“Is the party preference homogeneous among various regions?”

์ด๊ฒƒ์„ ํ™•์ธํ•˜๋ ค๋ฉด, โ€˜part $i$โ€™์„ ์„ ํ˜ธํ•˜๋Š” ๋น„์œจ์ด ๊ฐ ์ง€์—ญ๋งˆ๋‹ค ๋ชจ๋‘ ๋™์ผํ•œ์ง€ ํ™•์ธํ•ด์•ผ ํ•œ๋‹ค. ์ด๊ฒƒ์€ ์•„๋ž˜์˜ ๋“ฑ์‹ ์„ฑ๋ฆฝํ•จ์„ ๋งํ•œ๋‹ค.

\[P(\text{party } i \mid \text{Seoul}) = P(\text{party } i \mid \text{Daejeon}) = P(\text{party } i \mid \text{Gwangju}) = P(\text{party } i \mid \text{Daegu})\]

์ด ๋“ฑ์‹์„ null hypothesis $H_0$๋กœ ์‚ผ์•„ ๊ฒ€์ •์„ ์ˆ˜ํ–‰ํ•˜์ž!

์œ„์˜ ํ‘œ๋ฅผ ๊ธฐ์ค€์œผ๋กœ $e_{11}$๋ฅผ ๊ตฌํ•ด๋ณด์ž. ๋จผ์ € โ€˜Seoulโ€™์˜ ์ด ์ธ๊ตฌ๋Š” 500์ด๋‹ค. ๊ทธ๋ฆฌ๊ณ  ์ „์ฒด ์‚ฌ๋žŒ ์ˆ˜ ์ค‘ โ€˜party 1โ€™์„ ์„ ํ˜ธํ•˜๋Š” ์‚ฌ๋žŒ์˜ ๋น„์œจ์€ 391/1000์ด๋‹ค. ๋”ฐ๋ผ์„œ, $e_{11}$์€

\[e_{11} = 500 \times \frac{391}{1000}\]

Similarly, $e_{12} = 100 \times 391 / 1000$ and $e_{21} = 500 \times 537 / 1000$.
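The three expected values above follow one pattern: (region total) times (overall share of the party). The sketch below uses only the totals quoted in the text (500 and 100 for the first two regions, 391 and 537 for the first two parties, 1000 overall); the dictionary layout and names are hypothetical.

```python
n_total = 1000
region_totals = {1: 500, 2: 100}   # region j -> number of people in region j
party_totals = {1: 391, 2: 537}    # party i  -> number of people preferring party i

def expected_count(party, region):
    """e_{party, region} = region total * (party total / n)."""
    return region_totals[region] * party_totals[party] / n_total

e_11 = expected_count(party=1, region=1)
e_12 = expected_count(party=1, region=2)
e_21 = expected_count(party=2, region=1)
print(e_11, e_12, e_21)
```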


✨ Homogeneity Test is Equivalent to Independence Test ✨

In fact, the Homogeneity Test is equivalent to the Independence Test performed earlier. The $H_0$ of the Homogeneity Test does not state Independence directly, but with a small transformation Independence can be derived from it.

ํŽธ์˜๋ฅผ ์œ„ํ•ด $\text{party } i = B_i$, $\text{region } j = A_j$๋กœ ํ‘œ์‹œํ•˜๊ฒ ๋‹ค.

\[\begin{aligned} P(B_i \mid A_1) &= P(B_i \mid A_2) = x \\ \frac{P(B_i \cap A_1)}{P(A_1)} &= \frac{P(B_i \cap A_2)}{P(A_2)} = x \end{aligned}\]

์ขŒ๋ณ€์˜ ๋ถ„๋ชจ๋ฅผ ์šฐ๋ณ€์œผ๋กœ ๋„˜๊ธฐ๋ฉด,

\[P(B_i \cap A_j) = x P(A_j)\]

๊ฐ€ ๋˜๋Š”๋ฐ, ์ด $P(B_i \cap A_j)$๋ฅผ ์ „๋ถ€ ๋ชจ์œผ๋ฉด โ€œLaw of Total Probabilityโ€์— ์˜ํ•ด

\[P(B_i) = \sum_{j=1}^4 P(B_i \cap A_j) = x \cdot \cancelto{1}{\sum_{j=1}^4 P(A_j)} = x\]

That is, $x = P(B_i)$. Substituting this into the first equation,

\[P(B_i \mid A_j) = x = P(B_i) \quad \text{for every region } j\]

This means that $B_i$ and $A_j$ are independent!!! $\blacksquare$

์œ„์˜ ์ฆ๋ช…์„ ํ†ตํ•ด <Homogeneity Test>๊ฐ€ <Independence Test>์™€ ๋™์น˜์ž„์„ ํ™•์ธํ–ˆ๋‹ค. ๊ทธ๋ž˜์„œ <Independence Test>์—์„œ ์ผ๋˜ ๊ฒ€์ • ๋ฐฉ์‹์„ ๊ทธ๋Œ€๋กœ ์“ฐ๋ฉด ๋œ๋‹ค!!

DOF๋„ <Independence Test>์˜ ๊ณต์‹์œผ๋กœ ๊ตฌํ•ด๋ณด๋ฉด,

\[\nu = (r-1) (c-1) = (3 - 1) (4 - 1) = 6\]

and the test is carried out as

Reject $H_0$, if $\chi^2 > \chi^2_{\alpha}(\nu)$


Proportion Test and Chi-square Test

Here we look, with actual numbers, at the claim that the <chi-square test> is “a generalization of the <proportion test>”.

One Proportion Case

์•ž๋ฉด์˜ ํ™•๋ฅ ์ด $p$์ธ p-coin์ด ์žˆ๋‹ค. ์•„๋ž˜์˜ ๊ฐ€์„ค์„ ๊ฒ€์ •ํ•˜๊ณ ์ž ํ•œ๋‹ค.

  • $H_0$: $p = 1/3$
  • $H_1$: $p \ne 1/3$

$20$๋ฒˆ์˜ ์‹คํ—˜์œผ๋กœ ์–ป์€ sample proportion์€ $\hat{p} = 1/4$์˜€๋‹ค.

One Proportion Test์˜ Statistic์€ ์•„๋ž˜์™€ ๊ฐ™๋‹ค.

\[\frac{p - \hat{p}}{\sqrt{p(1-p) / n}}\]

์ด๊ฒƒ์— ๋Œ€์ž…ํ•ด z-value๋ฅผ ๊ณ„์‚ฐํ•˜๋ฉด, $z = 0.791$์ด๋‹ค. Alternative Hypothesis $H_1$ ์ด ์–‘์ธก ๊ฒ€์ •์˜ ํ˜•ํƒœ์ด๋ฏ€๋กœ p-value๋ฅผ ๊ตฌํ•˜๋ฉด, $0.428$์ด๋‹ค.


์ด๋ฒˆ์—๋Š” chi-square GOF test๋ฅผ ํ•ด๋ณด์ž. Test Statistic์€ ์•„๋ž˜์™€ ๊ฐ™๋‹ค.

\[\sum^2_{i=1} \frac{(o_i - e_i)^2}{e_i}\]

์ด๊ฒƒ์— ๋Œ€์ž…ํ•ด $\chi^2$-value๋ฅผ ๊ณ„์‚ฐํ•˜๋ฉด, $\chi^2 = 0.625$์ด๋‹ค. DOF $\nu = 1$์ด๋ฏ€๋กœ p-value๋ฅผ ๊ตฌํ•˜๋ฉด, $0.429$์ด๋‹ค.

Wow! Both approaches give the same p-value!!
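The one-proportion numbers can be reproduced with the standard library alone. This is a sketch under the setup stated in the text ($n = 20$, $\hat{p} = 1/4$, $H_0: p = 1/3$); the variable names are hypothetical. The sign of $z$ depends on the order of subtraction, and only $|z|$ matters for the two-sided p-value. For 1 degree of freedom the $\chi^2$ tail probability equals `erfc(sqrt(x / 2))`.

```python
from math import erfc, sqrt
from statistics import NormalDist

n, p0 = 20, 1 / 3
x = 5                      # 20 trials with p_hat = 1/4 means 5 heads
p_hat = x / n

# One Proportion Test (two-sided).
z = (p_hat - p0) / sqrt(p0 * (1 - p0) / n)
p_value_z = 2 * (1 - NormalDist().cdf(abs(z)))

# Chi-square GOF test with two categories (heads, tails).
observed = [x, n - x]
expected = [n * p0, n * (1 - p0)]
chi2_value = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
p_value_chi2 = erfc(sqrt(chi2_value / 2))   # P(chi^2 with 1 dof > chi2_value)

print(z, p_value_z, chi2_value, p_value_chi2)
```

The two p-values agree because $\chi^2 = z^2$ exactly, so any difference in the third decimal place can only come from rounding.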

Two Proportion Case

๋‘ ์ง‘ํ•ฉ์˜ ๋น„์œจ์ด ๋™์ผํ•œ์ง€, $p_1 = p_2$์ธ์ง€๋ฅผ ๊ฒ€์ •ํ•˜๊ณ ์ž ํ•œ๋‹ค. Test Statistic์€ ์•„๋ž˜์™€ ๊ฐ™๋‹ค.

\[\frac{\hat{p}_1 - \hat{p}_2}{\sqrt{\hat{p}\hat{q}(1/n_1 + 1/n_2)}}\]

Here, $\hat{p}$ is the pooled proportion and $\hat{q} = 1 - \hat{p}$.

\[\hat{p} = \frac{x_1 + x_2}{n_1 + n_2}\]

To compute the z-value, let's pick arbitrary experimental values: $n_1 = 20$, $x_1 = 18$, $n_2 = 100$, $x_2 = 84$.

์ด๊ฒƒ์— ๋Œ€์ž…ํ•ด z-value๋ฅผ ๊ณ„์‚ฐํ•˜๋ฉด, $z = 0.686$์ด๋‹ค. ์–‘์ธก ๊ฒ€์ •์— ๋Œ€ํ•œ p-value๋ฅผ ๊ตฌํ•˜๋ฉด, $0.493$์ด๋‹ค.


์ด๋ฒˆ์—๋Š” <Homogeneity Test>๋กœ ์ ‘๊ทผํ•ด๋ณด์ž. Test Statistic์€ ์•„๋ž˜์™€ ๊ฐ™๋‹ค.

\[\sum^2_{i=1} \sum^2_{j=1} \frac{(o_{ij} - e_{ij})^2}{e_{ij}}\]

Plugging in gives $\chi^2 = 0.471$. An online Independence Test calculator computes this quickly! Since the DOF is $\nu = 1$, the p-value is $0.493$!

์™€์šฐ! ์ด๋ฒˆ์—๋„ ๋‘ ๊ฐ€์ง€ ์ ‘๊ทผ ๋ชจ๋‘ ๋™์ผํ•œ p-value๋ฅผ ์–ป์—ˆ๋‹ค!


๋งบ์Œ๋ง

That's all on Testing!! With this, we have covered all the basics of “Statistics”!! 😆

๋‹ค์Œ ํฌ์ŠคํŠธ๋ถ€ํ„ฐ <Simple Linear Regression>์ด๋ผ๋Š” ์ƒˆ๋กœ์šด ์ฑ•ํ„ฐ๋ฅผ ์‚ดํŽด๋ณธ๋‹ค. ์ฃผ์–ด์ง„ ๋ฐ์ดํ„ฐ์—์„œ โ€œLinear Regressionโ€์˜ ๊ณ„์ˆ˜ $\beta_i$๋“ค์„ ์–ด๋–ป๊ฒŒ ์ฐพ์„ ์ˆ˜ ์žˆ์„์ง€๋ฅผ ๋‹ค๋ฃจ๋Š” ์ฑ•ํ„ฐ๋‹ค!

👉 Introduction to Linear Regression

