What Is a Degree of Freedom in Statistics? Why Is $(n-1)$ Usually Used as the Degrees of Freedom?


ํ†ต๊ณ„ํ•™์„ ๊ณต๋ถ€ํ•˜๋ฉด์„œ ๋“ค์—ˆ๋˜ ์˜๋ฌธ๊ณผ ์ƒ๊ฐ๋“ค์„ ์—์„ธ์ด๋กœ ์ ์–ด๋ณด์•˜์Šต๋‹ˆ๋‹ค ๐Ÿ™ ์ „์ฒด ํฌ์ŠคํŠธ๋Š” Probability and Statistics์—์„œ ํ™•์ธํ•˜์‹ค ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค๐ŸŽฒ

์ด๋ฒˆ ํฌ์ŠคํŠธ๋Š” ํ†ต๊ณ„ํ•™์—์„œ ๋‚˜์˜ค๋Š” โ€œ์ž์œ ๋„(Degree of Freedom)โ€์™€ โ€œ์™œ ํ†ต๊ณ„ํ•™์—์„  DOF๋ฅผ $n-1$๋กœ ์„ค์ •ํ•˜๋Š”์ง€โ€์— ๋Œ€ํ•œ ์ƒ๊ฐ์„ ๋‹ค๋ฃน๋‹ˆ๋‹ค. ๐Ÿ™Œ


ํ†ต๊ณ„ํ•™์—์„œ ์ž์œ ๋„(Degree of Freedom)๋ž€?

ํ†ต๊ณ„ํ•™์—์„œ <์ž์œ ๋„; Degree of Freedom>๋Š” ์•„๋ž˜์˜ ์˜๋ฏธ๋กœ ํ†ตํ•œ๋‹ค.

Definition. Degree of Freedom

The number of independent variates which make up the statistic.

In other words, the number of independent variates needed to define a <Statistic> is its <DOF>. It is also sometimes described as the "total number of observations."

A more precise definition, one that accounts for constraints, is the following.

Definition. Degree of Freedom

\[\text{DOF} = (\text{# of independent variates}) - (\text{# of constraints})\]

That is, the degrees of freedom of a <Statistic> equal the number of independent variates minus the number of constraints! 👏

์™œ ์ด๋Ÿฐ ์„ค๋ช…์ด ๋‚˜์˜ค๊ฒŒ ๋˜์—ˆ๋Š”์ง€ ์ข€๋” ์‚ดํŽด๋ณด์ž.

The Probability and Statistics final exam at POSTECH has a rule: the course average must come out to $80$ points.

A total of 5 students are taking the course this semester. The bald Professor Beulhon has graded the final exams of 4 of them.

์–ด๋ผ? ๊ทธ๋Ÿฐ๋ฐ ๋’ค๋Šฆ๊ฒŒ ๊ณผ๋ชฉ ํ‰๊ท  $80$์ ์„ ๋งž์ถฐ์•ผ ํ•œ๋‹ค๋Š” ์‚ฌ์‹ค์ด ๊ธฐ์–ต์ด ๋‚œ ๋ธ”ํ˜ผ ๊ต์ˆ˜๋Š” ๋‚จ์€ ํ•™์ƒ ํ•œ ๋ช…์˜ ์ ์ˆ˜๋ฅผ $80$์ด ๋˜๋„๋ก ๋ฐ˜.๋“œ.์‹œ ๋งž์ถฐ์•ผ ํ•œ๋‹ค!

With no other choice, Professor Beulhon ends up giving the last student $400$ points! The other 4 had all scored zero…

์œ„์˜ ์ƒํ™ฉ์—์„  $5$๋ช…์˜ ํ•™์ƒ์˜ ์‹œํ—˜์ ์ˆ˜๋ผ๋Š” $5$๊ฐœ์˜ Variates๊ฐ€ ์žˆ์ง€๋งŒ, ๊ณผ๋ชฉ ํ‰๊ท  $80$์ ์ด๋ผ๋Š” Constraint๊ฐ€ ํ•˜๋‚˜ ์žˆ๊ธฐ ๋•Œ๋ฌธ์— ์˜ค์ง $4$์˜ DOF๋งŒ ๊ฐ€๋Šฅํ–ˆ๋‹ค. ์ฆ‰, ์ œ์•ฝ(Constraint)์ด <Statistic>์˜ ์ž์œ ๋„๋ฅผ ๋‚ฎ์ถ”๋Š” ๊ฒƒ์ด๋‹ค!

Degrees of Freedom of a Probability Distribution

์•ž์—์„œ ์ž์œ ๋„๋Š” ํ†ต๊ณ„๋Ÿ‰(Statistic)์— ๋Œ€ํ•ด์„œ ์ •์˜๋˜๋Š” ๊ฒƒ์ด๋ผ๊ณ  ๋งํ–ˆ๋‹ค. ๊ทธ๋Ÿฐ๋ฐ ์™œ ํ™•๋ฅ  ๋ถ„ํฌ์— ์ž์œ ๋„๋ผ๋Š” ๊ฐœ๋…์ด ์กด์žฌํ•˜๋Š” ๊ฒƒ์ผ๊นŒ? ์ด๊ฒƒ์— ๋Œ€ํ•œ ๋Œ€๋‹ต์€ ํ™•๋ฅ  ๋ถ„ํฌ์—์„œ DOF๋Š” ๋‹จ์ˆœํžˆ ํ•จ์ˆ˜ ๊ฐœํ˜•์„ ๊ฒฐ์ •ํ•˜๋Š” ์ธ์ž์— ๋ถˆ๊ณผํ•˜๋‹ค. ์šฐ๋ฆฌ๊ฐ€ ์•„๋Š” DOF๋Š” ๋ชจ๋‘ Positive Integer์ด๋‹ค. ๊ทธ๋Ÿฌ๋‚˜ ํ™•๋ฅ  ๋ถ„ํฌ์˜ DOF๋Š” ์–ด๋–ค ๊ฐ’์ด๋“  ๋„ฃ์–ด๋„ ์ƒ๊ด€์—†๋‹ค! ์‹ฌ์ง€์–ด $\pi$ ๊ฐ™์€ ๊ฐ’์„ DOF๋กœ ๋„ฃ์–ด๋„ ๋œ๋‹ค! ์•„๋ฌด ์˜๋ฏธ๋„ ์—†์ง€๋งŒ


์ž์œ ๋„๋ฅผ ์ธ์ž๋กœ ๋ฐ›๋Š” ํ™•๋ฅ  ๋ถ„ํฌ๋ฅผ ์‚ดํŽด๋ณด์ž.

\[\chi^2(n) = \text{Gamma}\left(\frac{n}{2}, 2\right)\] \[T := \frac{Z}{\sqrt{V / n}} \sim t(n) \quad (Z \sim N(0, 1), V \sim \chi^2(n), Z \perp V)\] \[F := \frac{V_1 / n_1}{V_2 / n_2} \sim F(n_1, n_2) \quad (V_1 \sim \chi^2(n_1), V_2 \sim \chi^2(n_2), V_1 \perp V_2)\]
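To see that the DOF of a distribution is just a shape parameter, here is a minimal sketch that writes the chi-square density as $\text{Gamma}(n/2, 2)$ and checks numerically that it still integrates to roughly 1 for a non-integer DOF such as $\pi$. The helper names `chi2_pdf` and `integrate`, the integration range, and the step count are my own choices for illustration.

```python
import math


def chi2_pdf(x, df):
    """Density of chi-square(df), written as Gamma(shape=df/2, scale=2)."""
    if x <= 0:
        return 0.0
    k = df / 2.0
    return x ** (k - 1.0) * math.exp(-x / 2.0) / (2.0 ** k * math.gamma(k))


def integrate(f, a, b, steps=100_000):
    """Midpoint-rule approximation of the integral of f over [a, b]."""
    h = (b - a) / steps
    return h * sum(f(a + (i + 0.5) * h) for i in range(steps))


# Integer and non-integer DOF values are treated the same way by the formula.
for df in (3, math.pi, 7.5):
    area = integrate(lambda x: chi2_pdf(x, df), 0.0, 100.0)
    print(f"df = {df:.4f}: total probability is about {area:.4f}")
```

Each total should come out close to 1, since the formula only requires the DOF to be positive.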

Statistics and Degrees of Freedom

์ž์œ ๋„ ๊ฐœ๋…์˜ ๋ณธ์งˆ์ธ ํ†ต๊ณ„๋Ÿ‰(Statistic)์œผ๋กœ ๋Œ์•„์˜ค์ž.

์ž์œ ๋„๋ฅผ ๊ฐœ๋…์€ ํ†ต๊ณ„๋Ÿ‰(Statistic)์—์„œ ์กด์žฌํ•˜๋Š” ๊ฐœ๋…์ด๊ณ , ํ†ต๊ณ„๋Ÿ‰์€ ํ†ต๊ณ„์  ์‹คํ—˜(Statistics Experiment)์—์„œ ๋“ฑ์žฅํ•œ๋‹ค. ํ†ต๊ณ„๋Ÿ‰์˜ ๋Œ€ํ‘œ์ ์ธ ์˜ˆ๊ฐ€ sample variance $s^2$์ด๋‹ค.

\[s^2 = \frac{1}{n-1} \sum_{i=1}^n \left( x_i - \bar{x} \right)^2\]

Some statistics carry degrees of freedom. The ones that follow the three distributions above, the "chi-square value", the "t-value", and the "f-value", all have degrees of freedom. Each is used in estimation and in hypothesis testing.

\[\begin{aligned} \chi^2 &:= \sum_{i=1}^k \frac{(o_i - e_i)^2}{e_i} \\ t &:= \frac{\bar{x} - \mu}{s / \sqrt{n}} \\ f &:= \frac{s_1^2 / \sigma_1^2}{s_2^2 / \sigma_2^2} \end{aligned}\]
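As a small illustration of the first statistic, the sketch below computes $\sum_i (o_i - e_i)^2 / e_i$ for made-up die-roll counts. The data and the function name `chi_square_statistic` are hypothetical. I also assume the usual goodness-of-fit setting with fully specified expected counts, where the one constraint "observed and expected totals agree" leaves $k - 1$ degrees of freedom.

```python
def chi_square_statistic(observed, expected):
    """Sum of (o_i - e_i)^2 / e_i over all k categories."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Hypothetical data: 120 rolls of a die, 20 expected per face if it is fair.
observed = [18, 22, 20, 25, 15, 20]
expected = [20] * 6

stat = chi_square_statistic(observed, expected)
# k categories, 1 constraint (the totals must match) -> k - 1 degrees of freedom.
dof = len(observed) - 1
print(f"chi-square value = {stat:.2f}, DOF = {dof}")
```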


So what on earth is a degree of freedom (DOF), then? How should we interpret it, and how should we make sense of it? 🤔

\[s^2 = \frac{1}{n-1} \sum_{i=1}^n \left( x_i - \bar{x} \right)^2\]

Do you remember why the denominator of the sample variance $S^2$ is $n-1$ rather than $n$? In the post on the sample variance, I explained with formulas that it is so that $E[S^2] = \sigma^2$ holds. The intuitive explanation, in terms of degrees of freedom, is "because the sample variance has $n-1$ degrees of freedom."
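The claim $E[S^2] = \sigma^2$ can be checked with a quick simulation. This is only a sketch under assumed settings (normal data, $n = 5$, $\sigma = 2$, a fixed seed, 50,000 trials), and `sample_variance` is a hypothetical helper: it averages the variance estimate over many samples, once dividing by $n$ and once by $n - 1$.

```python
import random


def sample_variance(xs, ddof):
    """Sum of squared deviations from the sample mean, divided by n - ddof."""
    n = len(xs)
    mean = sum(xs) / n
    return sum((x - mean) ** 2 for x in xs) / (n - ddof)


rng = random.Random(0)  # fixed seed so the run is repeatable
n, sigma, trials = 5, 2.0, 50_000

avg_div_n = 0.0          # running average of the estimate that divides by n
avg_div_n_minus_1 = 0.0  # running average of the estimate that divides by n - 1
for _ in range(trials):
    xs = [rng.gauss(0.0, sigma) for _ in range(n)]
    avg_div_n += sample_variance(xs, ddof=0) / trials
    avg_div_n_minus_1 += sample_variance(xs, ddof=1) / trials

print(f"true variance       : {sigma ** 2:.3f}")
print(f"average, divide by n: {avg_div_n:.3f}")
print(f"average, divide n-1 : {avg_div_n_minus_1:.3f}")
```

Dividing by $n$ should land near $\frac{n-1}{n}\sigma^2$, while dividing by $n-1$ should land near $\sigma^2$ itself.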

The reason we define a statistic is to combine the values drawn from many samples into a single value that represents them. The DOF that accompanies a statistic then expresses how many genuinely independent pieces go into that representative value: "How many numbers in your statistic are actually independent."

Consider the sample variance $S^2$ again. $S^2$ is computed from $n$ samples. However, if the value of the sample mean $\bar{X}$ is fixed at $\bar{x}$, this acts as a constraint on computing the sample variance: once the values of $n-1$ samples are set, the value of the last sample is determined. Therefore, by the DOF formula from the beginning of the post, the degrees of freedom of $S^2$ are

\[\begin{aligned} \text{DOF} &= (\text{# of independent variates}) - (\text{# of constraints}) \\ &= n - 1 \end{aligned}\]
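The constraint can be made concrete with a tiny example. The five data values below are made up for illustration. Because the deviations $x_i - \bar{x}$ always sum to zero, the last deviation can be recovered from the other $n - 1$, so only $n - 1$ of the terms inside $S^2$ vary freely.

```python
# Hypothetical sample of n = 5 values.
xs = [4.0, 7.0, 1.0, 10.0, 3.0]
mean = sum(xs) / len(xs)

# Deviations from the sample mean: these are the terms squared inside s^2.
deviations = [x - mean for x in xs]

# The deviations sum to zero, so the last one is fixed by the first n - 1.
last_from_rest = -sum(deviations[:-1])

print("deviations      :", deviations)
print("sum of all      :", sum(deviations))
print("last, recovered :", last_from_rest)
```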


As this shows, for a sampling statistic, the number of samples $n$ used to compute the statistic and the amount of independent variability the statistic actually has can differ.

์ƒํ™ฉ๋ณ„๋กœ ์‚ดํŽด๋ณด๋ฉด,

  • Single Sample
    • $n$ observations & $1$ constraint: the mean → $n - 1$ variability
  • Two Samples
    • $n_1 + n_2$ observations & $2$ constraints: each mean → $n_1 + n_2 - 2$ variability
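For the two-sample case, a common statistic that uses $n_1 + n_2 - 2$ is the pooled variance. The sketch below is illustrative only: the function name `pooled_variance` and the data are assumptions, and the pooled estimator itself is a standard technique that this post does not discuss. Each sample's deviations are taken from its own mean, which costs one constraint per sample.

```python
def pooled_variance(a, b):
    """Pooled variance of two samples and its degrees of freedom."""
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    # Squared deviations are measured from each sample's own mean,
    # so each sample contributes one constraint.
    ss_a = sum((x - mean_a) ** 2 for x in a)
    ss_b = sum((x - mean_b) ** 2 for x in b)
    dof = len(a) + len(b) - 2
    return (ss_a + ss_b) / dof, dof


# Hypothetical samples of sizes 3 and 4.
var, dof = pooled_variance([1.0, 2.0, 3.0], [2.0, 4.0, 6.0, 8.0])
print(f"pooled variance = {var:.2f}, DOF = {dof}")
```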


Pop quiz! Why does the z-value have no notion of degrees of freedom? 🤔

\[z := \frac{\bar{x} - \mu}{\sigma / \sqrt{n}}\]

๊ทธ ์ด์œ ๋Š” ์• ์ดˆ์— z-value๊ฐ€ ๋”ฐ๋ฅด๋Š” ๋ถ„ํฌ์ธ Normal Distribution์ด sample size $n$์— ์˜์กดํ•˜์ง€ ์•Š๊ธฐ ๋•Œ๋ฌธ์ด๋‹ค. ๋ฐ˜๋ฉด์— z-value์—์„œ population variance $\sigma^2$๊ฐ€ sample variance $s^2$๋กœ ๋ฐ”๋€ t-value๋Š” ์ž์œ ๋„ $n-1$๋ฅผ ๊ฐ–๋Š”๋ฐ,

\[t := \frac{\bar{x} - \mu}{s / \sqrt{n}}\]

์ด๊ฒƒ์€ t-value ์ž์ฒด๊ฐ€ ์ž์œ ๋„ ๊ฐœ๋…์ด ์žˆ๋Š” t-distribution์„ ๋”ฐ๋ฅด๊ธฐ ๋•Œ๋ฌธ์ด๊ธฐ๋„ ํ•˜๊ณ , ๋ถ„๋ชจ์— ์‚ฌ์šฉํ•œ sample variance $s^2$๊ฐ€ sample size $n$์— ์˜์กดํ•˜๋Š” ํ†ต๊ณ„๋Ÿ‰(Statistic)์ด๊ธฐ ๋•Œ๋ฌธ์ด๊ธฐ๋„ ํ•˜๋‹ค.


๋งบ์Œ๋ง

์ด ๊ธ€์„ ์ž‘์„ฑํ•˜๊ธฐ ์ „์—๋Š” ๋ฌด์ง€์„ฑ์œผ๋กœ Sample Size $n$์— $-1$ํ•œ ๊ฐ’์„ ์‚ฌ์šฉํ–ˆ๋‹ค. ๊ทธ๋Ÿฌ๋‚˜ ์ด๋ฒˆ์— ๋‚ด์šฉ์„ ์ •๋ฆฌํ•˜๋ฉด์„œ, ์ž์œ ๋„(DOF)๊ฐ€ ๋„๋Œ€์ฒด ๋ฌด์Šจ ์˜๋ฏธ์ธ์ง€, ๊ทธ๋ฆฌ๊ณ  ์™œ $-1$์„ ๋นผ์ค„ ์ˆ˜ ๋ฐ–์— ์—†๋Š”์ง€๋ฅผ ์ดํ•ดํ•  ์ˆ˜ ์žˆ์—ˆ๋‹ค. ๐Ÿ‘

์ž์œ ๋„(DOF) ๊ฐœ๋…์ด ์ค‘์š”ํ•œ ์˜์—ญ์€ ์ถ”์ •(Estimation)๊ณผ ๊ฒ€์ •(Test)์ด๋‹ค. Sample Statistic์˜ ์ž์œ ๋„์— ๋”ฐ๋ผ ์ถ”์ •์—์„  <significance>๊ฐ€ ๊ฒ€์ •์—์„  <p-value>๊ฐ€ ๋‹ฌ๋ผ์ง€๊ธฐ ๋•Œ๋ฌธ์ด๋‹ค.

์ž์œ ๋„ ๊ฐœ๋…์ด ์žˆ๋Š” ๋Œ€ํ‘œ์ ์ธ ์ถ”์ •๊ณผ ๊ฒ€์ •์˜ ์˜ˆ์‹œ๋“ค์ด๋‹ค.

