This post organizes what I learned and studied in the “Probability and Statistics (MATH230)” course. The full list of posts is available at Probability and Statistics 🎲



Definition. F-distribution

If $V_1 \sim \chi^2(n_1)$ and $V_2 \sim \chi^2(n_2)$ are independent,

then $F := \dfrac{V_1/n_1}{V_2/n_2}$ is said to follow <Snedecor’s F-distribution>¹ with degrees of freedom $n_1$ and $n_2$, denoted as

\[F \sim F(n_1, n_2)\]

ps) In general, $F(n_1, n_2) \ne F(n_2, n_1)$. In other words, the F-distribution is not symmetric in its two degrees of freedom.

Image from Wikipedia
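As a quick sanity check of the definition, and of the note that the order of the degrees of freedom matters, here is a minimal simulation sketch. It assumes only the Python standard library. The helper names `sample_f` and `mean_of_f` and the pair $(5, 20)$ versus $(20, 5)$ are my own choices for illustration, not part of the lecture. It relies on two standard facts: a $\chi^2(n)$ variable is a Gamma variable with shape $n/2$ and scale $2$, and $E[F(n_1, n_2)] = n_2 / (n_2 - 2)$ for $n_2 > 2$.

```python
import random


def sample_f(n1, n2, rng):
    """Draw one value of F = (V1/n1) / (V2/n2) with independent chi-square V1, V2."""
    # chi-square(n) is the same as Gamma(shape=n/2, scale=2).
    v1 = rng.gammavariate(n1 / 2, 2.0)
    v2 = rng.gammavariate(n2 / 2, 2.0)
    return (v1 / n1) / (v2 / n2)


def mean_of_f(n1, n2, draws=100_000, seed=0):
    """Monte Carlo estimate of E[F(n1, n2)] (illustrative helper)."""
    rng = random.Random(seed)
    return sum(sample_f(n1, n2, rng) for _ in range(draws)) / draws


mean_5_20 = mean_of_f(5, 20)  # theory: 20 / 18
mean_20_5 = mean_of_f(20, 5)  # theory: 5 / 3
print("E[F(5, 20)] ~", mean_5_20)
print("E[F(20, 5)] ~", mean_20_5)
```

The two estimates should come out clearly different, which is one concrete way to see that $F(n_1, n_2)$ and $F(n_2, n_1)$ are different distributions.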


Remark.

1. The order of $n_1$ and $n_2$ is very important.

In fact we have $F(n_1, n_2) \overset{D}{=} \dfrac{1}{F(n_2, n_1)}$.

2. Let $f_\alpha (n_1, n_2)$ be the number $x$ such that $\alpha = P\left(F(n_1, n_2) \ge x\right)$.

Here, we have $f_{1-\alpha}(n_1, n_2) = \dfrac{1}{f_{\alpha}(n_2, n_1)}$

Quick Proof.

\[\begin{aligned} 1 - \alpha &= P \left( F(n_1, n_2) \ge f_{1-\alpha}(n_1, n_2) \right) \\ &= P \left( \frac{1}{f_{1-\alpha}(n_1, n_2)} \ge \frac{1}{F(n_1, n_2)}\right) \\ &= P \left( \frac{1}{f_{1-\alpha}(n_1, n_2)} \ge F(n_2, n_1) \right) \\ &= 1 - P \left( F(n_2, n_1) > \frac{1}{f_{1-\alpha}(n_1, n_2)} \right) \end{aligned}\]

Therefore,

\[\begin{aligned} \alpha &= P \left( F(n_2, n_1) > \frac{1}{f_{1-\alpha}(n_1, n_2)} \right) \\ &= P \left( F(n_2, n_1) > f_{\alpha}(n_2, n_1) \right) \end{aligned}\]

Comparing the two thresholds gives $\dfrac{1}{f_{1-\alpha}(n_1, n_2)} = f_{\alpha}(n_2, n_1)$, that is,

\[f_{1-\alpha} (n_1, n_2) = \frac{1}{f_{\alpha}(n_2, n_1)}\]

$\blacksquare$
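The reciprocal relation between critical values can also be checked numerically. The sketch below is only an illustration under my own assumptions: it integrates the F density with Simpson's rule and solves for the critical value by bisection, using the standard library only. The names `f_pdf`, `upper_tail`, and `f_alpha` are hypothetical helpers, and the integration assumes $n_1 > 2$ so that the density is $0$ at $x = 0$.

```python
import math


def f_pdf(x, d1, d2):
    """Density of F(d1, d2) at x, computed on the log scale (assumes d1 > 2)."""
    if x <= 0:
        return 0.0
    log_beta = math.lgamma(d1 / 2) + math.lgamma(d2 / 2) - math.lgamma((d1 + d2) / 2)
    log_pdf = (
        (d1 / 2) * math.log(d1 / d2)
        + (d1 / 2 - 1) * math.log(x)
        - ((d1 + d2) / 2) * math.log1p(d1 * x / d2)
        - log_beta
    )
    return math.exp(log_pdf)


def upper_tail(x, d1, d2, steps=2000):
    """P(F(d1, d2) >= x), using composite Simpson's rule on [0, x] (steps is even)."""
    if x <= 0:
        return 1.0
    h = x / steps
    total = f_pdf(0.0, d1, d2) + f_pdf(x, d1, d2)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * f_pdf(i * h, d1, d2)
    return 1.0 - total * h / 3


def f_alpha(alpha, d1, d2):
    """Critical value x with P(F(d1, d2) >= x) = alpha, found by bisection."""
    lo, hi = 0.0, 1.0
    while upper_tail(hi, d1, d2) > alpha:  # widen until the root is bracketed
        hi *= 2
    for _ in range(50):
        mid = (lo + hi) / 2
        if upper_tail(mid, d1, d2) > alpha:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


lhs = f_alpha(0.95, 20, 30)       # f_{0.95}(20, 30)
rhs = 1 / f_alpha(0.05, 30, 20)   # 1 / f_{0.05}(30, 20)
print(lhs, rhs)
```

This is why F tables usually list only small upper-tail levels such as $0.05$ and $0.01$: the lower-tail critical values follow from the same table with the degrees of freedom swapped.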


Theorem.

Suppose we have two independent random samples $X_1, \dots, X_{n_1}$ from $N(\mu_1, \sigma_1^2)$ and $Y_1, \dots, Y_{n_2}$ from $N(\mu_2, \sigma_2^2)$.

Let $S_1^2 = \dfrac{\sum^{n_1}_{i=1} (X_i - \bar{X})^2}{n_1 - 1}$ and \(S_2^2 = \dfrac{\sum^{n_2}_{i=1} (Y_i - \bar{Y})^2}{n_2 - 1}\).

Note that $(n_1 - 1)S_1^2/\sigma_1^2 \sim \chi^2 (n_1 - 1)$ and $(n_2 - 1)S_2^2/\sigma_2^2 \sim \chi^2 (n_2 - 1)$.

Then,

\[F := \frac{S_1^2 / \sigma_1^2}{S_2^2 / \sigma_2^2} \sim F(n_1 - 1, n_2 - 1)\]

Proof.

\[F := \frac{S_1^2 / \sigma_1^2}{S_2^2 / \sigma_2^2} = \frac{\frac{(n_1 - 1) S_1^2}{\sigma_1^2} / (n_1 - 1)}{\frac{(n_2 - 1) S_2^2}{\sigma_2^2} / (n_2 - 1)}\]

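Here, $V_1 := \dfrac{(n_1 - 1) S_1^2}{\sigma_1^2} \sim \chi^2 (n_1 - 1)$ and $V_2 := \dfrac{(n_2 - 1) S_2^2}{\sigma_2^2} \sim \chi^2 (n_2 - 1)$ are independent because the two samples are independent, so by the definition of the <F-distribution>,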

\[F := \frac{S_1^2 / \sigma_1^2}{S_2^2 / \sigma_2^2} = \frac{V_1 / (n_1 - 1)}{V_2 / (n_2 - 1)} \sim F(n_1 - 1, n_2 - 1)\]
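To see the theorem in action, the following sketch simulates the statistic built from sample variances and compares it with a direct ratio of scaled chi-square draws. Everything specific here is an assumption made for illustration: the sample sizes $n_1 = 10$, $n_2 = 15$, the means and standard deviations, the cutoff $2$, and the helper names `sample_variance` and `compare_tails`. If the theorem holds, both tail estimates should agree up to Monte Carlo noise.

```python
import random


def sample_variance(xs):
    """Unbiased sample variance S^2 (divisor n - 1)."""
    mean = sum(xs) / len(xs)
    return sum((x - mean) ** 2 for x in xs) / (len(xs) - 1)


def compare_tails(n1=10, n2=15, sigma1=3.0, sigma2=0.5,
                  cutoff=2.0, reps=20_000, seed=1):
    """Estimate P(statistic >= cutoff) in two ways (illustrative helper).

    Returns (estimate from normal samples, estimate from chi-square ratios).
    """
    rng = random.Random(seed)
    hits_stat = 0
    hits_chi2 = 0
    for _ in range(reps):
        # Statistic from the theorem: (S1^2 / sigma1^2) / (S2^2 / sigma2^2).
        xs = [rng.gauss(1.0, sigma1) for _ in range(n1)]
        ys = [rng.gauss(-2.0, sigma2) for _ in range(n2)]
        stat = (sample_variance(xs) / sigma1 ** 2) / (sample_variance(ys) / sigma2 ** 2)
        hits_stat += stat >= cutoff

        # Direct draw from F(n1 - 1, n2 - 1) via its chi-square definition.
        v1 = rng.gammavariate((n1 - 1) / 2, 2.0)
        v2 = rng.gammavariate((n2 - 1) / 2, 2.0)
        hits_chi2 += (v1 / (n1 - 1)) / (v2 / (n2 - 1)) >= cutoff
    return hits_stat / reps, hits_chi2 / reps


p_stat, p_chi2 = compare_tails()
print(p_stat, p_chi2)
```

Note that the means $\mu_1, \mu_2$ play no role in the result, which matches the theorem: the statistic depends only on the variances.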

Examples

$n_1 = 21$, $n_2 = 31$

Claim: $\sigma_1^2/\sigma_2^2 = 2$ but, for sample variances, $S_1^2/S_2^2 = 4 > 2$.

\[\begin{aligned} &P\left(S_1^2/S_2^2 \ge 4 \quad \text{when} \quad \sigma_1^2/\sigma_2^2 = 2\right) \\ &= P \left( \frac{S_1^2 / \sigma_1^2}{S_2^2 / \sigma_2^2} \ge 4 \cdot \frac{1}{2} = 2 \right) \\ &= P(F(20, 30) \ge 2) \end{aligned}\]

Here, $f_{0.05}(20, 30)=1.93$ and $f_{0.01}(20, 30) = 2.55$.

The value $2$ lies between $1.93$ and $2.55$.

Therefore,

\[P(F(20, 30) \ge 2) \in [0.01, 0.05]\]

This means that, if the population variance ratio really were $2$, observing a sample variance ratio of $4$ or more would be quite unlikely (probability at most $0.05$). Since we did observe it, we reject our assumption $H_0: \sigma_1^2 / \sigma_2^2 = 2$ at the 5% significance level and accept the alternative hypothesis $H_1: \sigma_1^2 / \sigma_2^2 > 2$, which says the ratio of the two population variances must be larger. $\blacksquare$
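Instead of bracketing the probability with table values, one can approximate $P(F(20, 30) \ge 2)$ directly. The sketch below is a rough numerical check, not a replacement for a statistics library: it integrates the F density with Simpson's rule using only the standard library. The helper names `f_pdf` and `upper_tail` are my own, and the code assumes the first degree of freedom is larger than $2$.

```python
import math


def f_pdf(x, d1, d2):
    """Density of F(d1, d2) at x, computed on the log scale (assumes d1 > 2)."""
    if x <= 0:
        return 0.0
    log_beta = math.lgamma(d1 / 2) + math.lgamma(d2 / 2) - math.lgamma((d1 + d2) / 2)
    log_pdf = (
        (d1 / 2) * math.log(d1 / d2)
        + (d1 / 2 - 1) * math.log(x)
        - ((d1 + d2) / 2) * math.log1p(d1 * x / d2)
        - log_beta
    )
    return math.exp(log_pdf)


def upper_tail(x, d1, d2, steps=2000):
    """P(F(d1, d2) >= x), using composite Simpson's rule on [0, x] (steps is even)."""
    h = x / steps
    total = f_pdf(0.0, d1, d2) + f_pdf(x, d1, d2)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * f_pdf(i * h, d1, d2)
    return 1.0 - total * h / 3


p_value = upper_tail(2.0, 20, 30)        # P(F(20, 30) >= 2)
tail_at_193 = upper_tail(1.93, 20, 30)   # should be close to 0.05
tail_at_255 = upper_tail(2.55, 20, 30)   # should be close to 0.01
print(p_value, tail_at_193, tail_at_255)
```

The computed `p_value` should land inside $[0.01, 0.05]$, consistent with the table-based argument above.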


So far, we have estimated the “mean” and the “variance”, which are parameters of the population distribution. In the next post, we estimate the population distribution itself from the <EDF; Empirical Distribution Function>, the distribution obtained from a sample. The tool used in that process is the <Quantile>!

👉 EDF and Quantile


  1. In Korean, this seems to be read as “세네데커 (Snedecor) F-분포”.