This post summarizes what I learned and studied in the “Probability and Statistics (MATH230)” course. The full list of posts is available at Probability and Statistics 🎲


EDF; Empirical Distribution Function

For given samples $X_1, \dots, X_n$,

  • $\bar{X}$ describes the “location of the sample”
  • $S^2$ describes the “variability of the sample”

In the same way, from the sample points $X_1, X_2, \dots, X_n$ we can derive a distribution function, as follows.


Definition. EDF; Empirical Distribution Function

Let $X_1, \dots, X_n$ be a random sample.

Define $\hat{F}(x)$ as

\[\hat{F}(x) := \frac{\left| \{ i : X_i \le x \} \right|}{n} = \frac{1}{n} \sum^n_{i=1} I(X_i \le x) = \frac{\text{# of $X_i$'s that are less than or equal to $x$}}{n}\]

A distribution function derived from a sample in this way is called the <Empirical Distribution Function>.
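The definition translates almost directly into code. The following is a minimal sketch, not a reference implementation; the helper name `edf` and the example sample are made up for illustration.

```python
from bisect import bisect_right


def edf(sample):
    """Return the empirical distribution function F_hat of `sample`."""
    xs = sorted(sample)
    n = len(xs)
    if n == 0:
        raise ValueError("sample must not be empty")

    def f_hat(x):
        # bisect_right gives the number of sorted values that are <= x,
        # which is exactly |{i : X_i <= x}|.
        return bisect_right(xs, x) / n

    return f_hat


# Hypothetical sample, chosen only to show the step-function behaviour.
f_hat = edf([3, 1, 4, 1, 5])
print(f_hat(0), f_hat(1), f_hat(3.5), f_hat(5))  # 0.0 0.4 0.6 1.0
```

Sorting once and using binary search keeps each evaluation at $O(\log n)$; a direct count over the sample would also match the definition.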


Remark.

1. For each fixed $x$, $\hat{F}(x)$ is a random variable.

2. Let $F(x) := P(X \le x)$ where $X \overset{D}{=} X_i$,

then $\hat{F}(x) \rightarrow F(x)$ in probability as $n \rightarrow \infty$.

\[\begin{aligned} \hat{F}(x) &= \frac{1}{n} \sum^n_{i=1} I(X_i \le x) \\ \end{aligned}\]

Let $Y_i = I(X_i \le x)$. The $Y_i$ are i.i.d. Bernoulli random variables, and

\[\begin{aligned} \hat{F}(x) &= \frac{Y_1 + \cdots + Y_n}{n} \end{aligned}\]

By WLLN,

\[\begin{aligned} \hat{F}(x) &= \frac{Y_1 + \cdots + Y_n}{n} \\ &\overset{\text{WLLN}}{\longrightarrow} E(Y_i) = E(I(X_i \le x)) \\ &= 1 \cdot P(X_i \le x) = F(x) \end{aligned}\]

$\blacksquare$
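The convergence can also be observed numerically. The sketch below assumes $X_i \sim \text{Unif}(0, 1)$, so that $F(x) = x$; the seed, the evaluation point $x = 0.3$, and the sample sizes are arbitrary choices, not values from the lecture.

```python
import random


def edf_at(sample, x):
    """Fraction of sample values that are <= x, i.e. F_hat(x)."""
    return sum(1 for v in sample if v <= x) / len(sample)


rng = random.Random(0)  # fixed seed (arbitrary) so the run is reproducible
x = 0.3                 # for Unif(0, 1), F(x) = x = 0.3

errors = {}
for n in (100, 10_000, 100_000):
    sample = [rng.random() for _ in range(n)]
    # |F_hat(x) - F(x)| should tend to shrink as n grows (WLLN).
    errors[n] = abs(edf_at(sample, x) - x)
    print(n, errors[n])
```

A single run only illustrates the statement; the WLLN is about the probability of a large error going to zero, not about any one sample.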

Therefore, we can estimate the CDF using the EDF. In addition, by looking at the <Quantile> of the EDF, we can judge the “distribution of population”!


Quantile

Definition. Quantile

The <Quantile> of the distribution function $F$ is the (generalized) inverse of $F$.

Informally, a <Quantile> $q(f)$ is a value for which a specified fraction $f$ of the values is less than or equal to $q(f)$. Formally,

\[q(f) := \inf \left\{ x \in \mathbb{R} : F(x) \ge f \right\}\]

That is, $q(f)$ is the smallest value of $x$ for which $F(x) \ge f$.

The <Quantile> has several variants, such as <Quartiles>, <Percentiles>, and <Deciles>. See the post below for details on these variants.

👉 Post on ‘Statistics How To’


If $F$ is continuous and strictly increasing, then $F(q(f)) = f$ for $f \in (0, 1)$.

Examples.

1. Let $X \sim \text{Unif}(0, 1)$

then, since $F(x) = x$ on $[0, 1]$, $q(f) = f$ for $f \in (0, 1]$.

2. Let $X \sim N(0, 1)$

(omitted)
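For the standard normal case, $q(f)$ has no closed form, but it is the inverse of the standard normal CDF and can be evaluated numerically. A small sketch using `statistics.NormalDist` from the Python standard library (3.8+); the wrapper name `q_normal` is introduced here only for illustration.

```python
from statistics import NormalDist

std_normal = NormalDist(mu=0.0, sigma=1.0)  # N(0, 1)


def q_normal(f):
    """Quantile q(f) of N(0, 1); inv_cdf requires 0 < f < 1."""
    return std_normal.inv_cdf(f)


for f in (0.025, 0.5, 0.975):
    q = q_normal(f)
    # Because the normal CDF is continuous and strictly increasing,
    # F(q(f)) recovers f up to floating-point error.
    print(f, q, std_normal.cdf(q))
```

The median comes out as $q(0.5) = 0$, and $q(0.975) \approx 1.96$ is the familiar value used for 95% intervals.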


Definition. Quantile of a sample, $\hat{q}(f)$

The (generalized) inverse of the EDF $\hat{F}(x)$,

\[\hat{q}(f) = \inf \left\{ x : \hat{F}(x) \ge f \right\}\]

Example.

Let $\{ 1, 2, 3, 4, 5, 6, 7, 8, 9, 10 \}$ be sample points.

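The EDF is the step function $\hat{F}(x) = \dfrac{\lfloor x \rfloor}{10}$ for $0 \le x \le 10$, so $\hat{F}(k) = \dfrac{k}{10}$ at each sample point $k$. The smallest $x$ with $\hat{F}(x) \ge 0.7$ is $7$; thus, $\hat{q}(0.7) = 7$.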
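The example can be checked with a direct implementation of $\hat{q}(f) = \inf \{ x : \hat{F}(x) \ge f \}$. This is a sketch under the definition above; the function name `sample_quantile` is hypothetical, and library routines often use interpolating conventions that give different answers.

```python
def sample_quantile(sample, f):
    """Return inf{x : F_hat(x) >= f} for 0 < f <= 1."""
    xs = sorted(sample)
    n = len(xs)
    if n == 0:
        raise ValueError("sample must not be empty")
    if not 0 < f <= 1:
        raise ValueError("f must satisfy 0 < f <= 1")
    # At the k-th smallest value, F_hat is at least k / n, so the first k
    # with k / n >= f gives the smallest x satisfying F_hat(x) >= f.
    for k, x in enumerate(xs, start=1):
        if k / n >= f:
            return x


points = range(1, 11)  # the sample {1, 2, ..., 10} from the example
print(sample_quantile(points, 0.7))   # 7
print(sample_quantile(points, 0.71))  # 8
```

Because $\hat{F}$ is a step function, $\hat{q}$ always returns one of the sample points and jumps to the next one as soon as $f$ exceeds $k/n$.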


Normal Q-Q Plot

Q. What can we do with the <Quantile>?

A. It can be used to check the assumption that “the population follows a normal distribution”!!

Definition. Normal Quantile-Quantile plot; Q-Q plot

A plot of the quantiles of $X$ against $q_{0, 1}(f)$, where $q_{0, 1}(f)$ is the quantile of $N(0, 1)$.

Image from Wikipedia

If the distribution of $X$ is close to a normal distribution, then a <Normal Quantile-Quantile plot> should show roughly a straight line. If it is close to $N(0, 1)$ in particular, the points lie near the line $y = x$.
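The points of a normal Q-Q plot can be computed without any plotting library. The sketch below makes two assumptions that are not in the lecture: the fractions are taken as $f_i = (i - 0.5)/n$ (one common convention among several), and "straightness" is summarized by the Pearson correlation of the points. The function names and the simulated samples are illustrative only.

```python
import random
from statistics import NormalDist


def normal_qq_points(sample):
    """Pair each sorted sample value with the matching N(0, 1) quantile."""
    xs = sorted(sample)
    n = len(xs)
    std_normal = NormalDist()  # N(0, 1)
    # Assumed plotting positions f_i = (i - 0.5) / n keep f strictly in (0, 1).
    return [(std_normal.inv_cdf((i - 0.5) / n), x)
            for i, x in enumerate(xs, start=1)]


def pearson_r(points):
    """Pearson correlation of (theoretical, sample) pairs; near 1 means near-linear."""
    n = len(points)
    mean_t = sum(t for t, _ in points) / n
    mean_x = sum(x for _, x in points) / n
    cov = sum((t - mean_t) * (x - mean_x) for t, x in points)
    var_t = sum((t - mean_t) ** 2 for t, _ in points)
    var_x = sum((x - mean_x) ** 2 for _, x in points)
    return cov / (var_t * var_x) ** 0.5


rng = random.Random(0)  # arbitrary fixed seed
normal_sample = [rng.gauss(2.0, 3.0) for _ in range(2000)]     # normal, but not N(0, 1)
skewed_sample = [rng.expovariate(1.0) for _ in range(2000)]    # clearly non-normal

r_normal = pearson_r(normal_qq_points(normal_sample))
r_skewed = pearson_r(normal_qq_points(skewed_sample))
print(r_normal, r_skewed)
```

Passing the returned pairs to any scatter-plot routine gives the Q-Q plot itself; the normal sample should trace a line with slope near its standard deviation, while the skewed sample bends away from a line.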


So far, we have covered the foundations needed to carry out “Statistical Inference”! 🙌

Starting with the next post, we cover <Estimation>, one of the approaches to “statistical inference”. We look at the <bias> and <variance> of an estimator, and perform <Interval Estimation> to obtain confidence intervals 😁

👉 Point Estimation

👉 Interval Estimation