EDF and Quantile
This post summarizes what I learned and studied in the “Probability and Statistics (MATH230)” course. The full list of posts is available at Probability and Statistics 🎲
Series: Sampling Distributions
EDF; Empirical Distribution Function
For given samples $X_1, \dots, X_n$,
- $\bar{X}$ describes the “location of the sample”
- $S^2$ describes the “variability of the sample”
In the same way, from the sample points $X_1, X_2, \dots, X_n$ we can derive a distribution function, as follows.
Definition. EDF; Empirical Distribution Function
Let $X_1, \dots, X_n$ be a random sample,
Let’s define $\hat{F}(x)$ as
\[\hat{F}(x) := \frac{\left| \{ i : X_i \le x \} \right|}{n} = \frac{1}{n} \sum_{i=1}^{n} I(X_i \le x) = \frac{\text{# of $X_i$'s that are less than or equal to $x$}}{n}\]
The distribution function derived from the sample in this way is called the <Empirical Distribution Function>.
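The definition translates almost directly into code. The following is a minimal sketch, not a reference implementation: the helper name `edf` and the five sample values are made up for illustration.

```python
from bisect import bisect_right

def edf(sample):
    """Return the empirical distribution function of `sample` as a callable."""
    xs = sorted(sample)
    n = len(xs)

    def F_hat(x):
        # bisect_right gives the number of sorted values that are <= x
        return bisect_right(xs, x) / n

    return F_hat

# Hypothetical sample points, chosen only for illustration
F_hat = edf([3.1, 0.5, 2.2, 0.5, 4.0])
print(F_hat(-1.0))  # no point is <= -1.0
print(F_hat(0.5))   # 2 of the 5 points are <= 0.5
print(F_hat(4.0))   # all points are <= 4.0
```

Sorting once and using binary search keeps each evaluation at $O(\log n)$; a plain count over the sample would work equally well for small $n$.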
Remark.
1. For each fixed $x$, $\hat{F}(x)$ is a random variable.
2. Let $F(x) := P(X \le x)$ where $X \overset{D}{=} X_i$,
then $\hat{F}(x) \rightarrow F(x)$ in probability as $n \rightarrow \infty$.
Fix $x$ and let $Y_i = I(X_i \le x)$, so that the $Y_i$ are i.i.d. Bernoulli random variables. Then
\[\hat{F}(x) = \frac{Y_1 + \cdots + Y_n}{n}\]
By WLLN,
\[\begin{aligned} \hat{F}(x) &= \frac{Y_1 + \cdots + Y_n}{n} \\ &\overset{\text{WLLN}}{\longrightarrow} E(Y_i) = E(I(X_i \le x)) \\ &= 1 \cdot P(X_i \le x) = F(x) \end{aligned}\]
$\blacksquare$
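A small simulation can make this convergence concrete. This is only a sketch under stated assumptions: the population is taken to be $\text{Unif}(0, 1)$ (so that $F(x) = x$), the evaluation point $x = 0.3$, the sample sizes, and the seed are arbitrary choices, and `edf_at` is a hypothetical helper name.

```python
import random

def edf_at(sample, x):
    """Empirical distribution function of `sample`, evaluated at x."""
    return sum(1 for xi in sample if xi <= x) / len(sample)

rng = random.Random(0)   # fixed seed so the run is reproducible
x = 0.3                  # for Unif(0, 1), the true CDF value is F(x) = x

errors = {}
for n in (10, 1_000, 100_000):
    sample = [rng.random() for _ in range(n)]
    # |F_hat(x) - F(x)| for this particular sample of size n
    errors[n] = abs(edf_at(sample, x) - x)
    print(n, errors[n])
```

The error for a single run is itself random, so it need not shrink at every step, but for large $n$ it is small with high probability, which is what convergence in probability asserts.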
Therefore, we can estimate the CDF through the EDF. Moreover, by looking at the <Quantile> of the EDF, we can assess the “distribution of population”!
Quantile
Definition. Quantile
The <Quantile> of the distribution function $F$ is the (generalized) inverse of $F$.
Informally, $q(f)$ is a value such that a specified fraction $f$ of the values is less than or equal to $q(f)$.
\[q(f) := \inf \left\{ x \in \mathbb{R} : F(x) \ge f \right\}\]
That is, $q(f)$ is the smallest value of $x$ for which $F(x) \ge f$.
There are several variants of the <Quantile>, such as <Quartiles>, <Percentiles>, and <Deciles>. See the post below for these variants.
👉 Post on “Statistics How To”
If $F$ is continuous and strictly increasing, then $F(q(f)) = f$ for $f \in (0, 1)$.
Examples.
1. Let $X \sim \text{Unif}(0, 1)$
then, $q(f) = f$ for $f \in (0, 1]$.
2. Let $X \sim N(0, 1)$
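then $q(f) = \Phi^{-1}(f)$, where $\Phi$ is the CDF of $N(0, 1)$; it has no closed form. (details omitted)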
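Both examples can be evaluated numerically. The sketch below uses `NormalDist.inv_cdf` from Python's standard `statistics` module for the $N(0, 1)$ quantile; the function names `q_uniform` and `q_normal` and the chosen values of $f$ are illustrative assumptions.

```python
from statistics import NormalDist

std_normal = NormalDist(mu=0.0, sigma=1.0)

def q_uniform(f):
    """Quantile of Unif(0, 1): q(f) = f."""
    return f

def q_normal(f):
    """Quantile of N(0, 1), computed numerically; requires 0 < f < 1."""
    return std_normal.inv_cdf(f)

for f in (0.025, 0.5, 0.975):
    z = q_normal(f)
    # The last column checks F(q(f)) = f for the standard normal CDF
    print(f, q_uniform(f), round(z, 4), round(std_normal.cdf(z), 4))
```

Since the normal CDF is continuous and strictly increasing, applying `cdf` to `q_normal(f)` recovers `f` up to floating-point error.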
Definition. Quantile of a sample, $\hat{q}(f)$
The <Quantile> of a sample is the inverse of the EDF $\hat{F}(x)$,
\[\hat{q}(f) = \inf \left\{ x : \hat{F}(x) \ge f \right\}\]
Example.
Let $\{ 1, 2, 3, 4, 5, 6, 7, 8, 9, 10 \}$ be sample points.
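At each sample point $k$, the EDF is $\hat{F}(k) = \dfrac{k}{10}$ (in general, $\hat{F}(x) = \dfrac{\lfloor x \rfloor}{10}$ for $0 \le x \le 10$). Thus, $\hat{q}(0.7) = 7$, since $7$ is the smallest $x$ with $\hat{F}(x) \ge 0.7$.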
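The example can be checked with a short sketch of $\hat{q}(f)$. The function name `sample_quantile` is hypothetical, and the code follows the infimum definition given here rather than the interpolating conventions that statistical libraries often use by default.

```python
def sample_quantile(sample, f):
    """q_hat(f) = inf{x : F_hat(x) >= f}, for 0 < f <= 1."""
    if not 0 < f <= 1:
        raise ValueError("f must satisfy 0 < f <= 1")
    xs = sorted(sample)
    n = len(xs)
    for k, x in enumerate(xs, start=1):
        # At least k of the n points are <= x, so F_hat(x) >= k / n.
        # The first k with k / n >= f therefore gives the infimum.
        if k / n >= f:
            return x

points = range(1, 11)                 # the sample {1, 2, ..., 10}
print(sample_quantile(points, 0.7))   # 7
print(sample_quantile(points, 0.71))  # 8
```

Because $\hat{F}$ is a step function, $\hat{q}(f)$ jumps to the next sample point as soon as $f$ exceeds $0.7$.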
Normal Q-Q Plot
Q. What can we do with the <Quantile>?
A. We can use it to check the assumption that “the population follows a normal distribution”!!
Definition. Normal Quantile-Quantile plot; Q-Q plot
A plot of the sample quantiles of $X$ against $q_{0, 1}(f)$, where $q_{0, 1}(f)$ is the quantile of $N(0, 1)$.
Image from Wikipedia
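If the distribution of $X$ is close to a normal distribution, then the <Normal Quantile-Quantile plot> should show an approximately straight line; if it is close to $N(0, 1)$ in particular, the points lie near the line $y = x$.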
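The points of a normal Q-Q plot can be computed without any plotting library. The sketch below rests on a few assumptions that are not part of the definition above: the plotting positions $f_i = (i - 0.5)/n$ are one common convention, the helper names are hypothetical, the two simulated samples are arbitrary, and the Pearson correlation is used only as a rough numerical stand-in for "looks like a straight line".

```python
import random
from statistics import NormalDist, mean

def normal_qq_points(sample):
    """Return (N(0, 1) quantile, sample quantile) pairs for a normal Q-Q plot."""
    xs = sorted(sample)
    n = len(xs)
    std_normal = NormalDist()
    # Assumed convention: f_i = (i - 0.5) / n keeps f strictly inside (0, 1)
    return [(std_normal.inv_cdf((i - 0.5) / n), x)
            for i, x in enumerate(xs, start=1)]

def straightness(points):
    """Pearson correlation of the Q-Q points; near 1 means nearly a straight line."""
    ts = [t for t, _ in points]
    xs = [x for _, x in points]
    mt, mx = mean(ts), mean(xs)
    cov = sum((t - mt) * (x - mx) for t, x in points)
    var_t = sum((t - mt) ** 2 for t in ts)
    var_x = sum((x - mx) ** 2 for x in xs)
    return cov / (var_t * var_x) ** 0.5

rng = random.Random(0)  # fixed seed for reproducibility
normal_sample = [rng.gauss(5.0, 2.0) for _ in range(500)]       # normal population
skewed_sample = [rng.expovariate(1.0) for _ in range(500)]      # non-normal population

r_normal = straightness(normal_qq_points(normal_sample))
r_skewed = straightness(normal_qq_points(skewed_sample))
print(r_normal, r_skewed)
```

Passing the pairs from `normal_qq_points` to any scatter-plot routine gives the Q-Q plot itself; the normal sample should trace a nearly straight line, while the exponential sample bends away from it.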
So far, we have covered the foundations needed to perform “Statistical Inference”!
Starting with the next post, we cover <Estimation>, one of the approaches to statistical inference. We will look at the <bias> and <variance> of an estimator, and perform <Interval Estimation> to obtain confidence intervals.
👉 Point Estimation
👉 Interval Estimation