This post summarizes what I learned and studied in the "Probability and Statistics (MATH230)" course. You can find the full series of posts at Probability and Statistics 🎲


Mean


Definition.

The <expectation> or <mean> of an RV $X$ is defined as

\[\mu := E[X] := \begin{cases} \displaystyle \sum_x x f(x) && X \; \text{is discrete with pmf} \; f(x) \\ \displaystyle \int^{\infty}_{-\infty} x f(x) \; dx && X \; \text{is continuous with pdf} \; f(x) \end{cases}\]
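To make this concrete, here is a minimal numerical sketch (the fair die and the Exponential(1) density are my own illustrative choices):

```python
import math
from scipy.integrate import quad

# Discrete case: E[X] = sum_x x * f(x), here a fair die with f(x) = 1/6.
pmf = {x: 1 / 6 for x in range(1, 7)}
mean_discrete = sum(x * p for x, p in pmf.items())
print(mean_discrete)  # 3.5

# Continuous case: E[X] = integral of x * f(x) dx, here f(x) = e^{-x} on [0, inf).
pdf = lambda x: math.exp(-x)
mean_continuous, _err = quad(lambda x: x * pdf(x), 0, math.inf)
print(mean_continuous)  # ~1.0
```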

If we apply a function $g(x)$ to the RV $X$, the <Expectation> can be computed as follows.

Theorem.

Let $X$ be a random variable with probability distribution $f(x)$. The expected value of the random variable $g(X)$ is

\[\mu_{g(X)} = E\left[g(X)\right] = \sum_x g(x) f(x) \quad \text{if } X \text{ is a discrete RV}\]

and

\[\mu_{g(X)} = E\left[g(X)\right] = \int^{\infty}_{-\infty} g(x) f(x) \; dx \quad \text{if } X \text{ is a continuous RV}\]

(Applying $g(x)$ still preserves the domain of $x$, so weighting $g(x)$ by the same $f(x)$ as above is valid.)
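A quick numerical check of the theorem, with $g(x) = x^2$ on the same toy fair die as before:

```python
# E[g(X)] = sum_x g(x) f(x) for g(x) = x^2 on a fair die.
pmf = {x: 1 / 6 for x in range(1, 7)}
e_x2 = sum(x ** 2 * p for x, p in pmf.items())
print(e_x2)  # 15.1666... = 91/6
```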

ps) In lecture, the professor said the proof for the discrete RV case is easy, but the proof for the continuous RV case is a bit tricky.


์ด๋ฒˆ์—๋Š” joint distributions์— ๋Œ€ํ•œ <Expectation>์„ ์‚ดํŽด๋ณด์ž.

Definition.

Let $X$ and $Y$ be RVs with joint probability distribution $f(x, y)$. The expected value of the RV $g(X, Y)$ is

\[\mu_{g(X, Y)} = E\left[g(X, Y)\right] = \sum_x \sum_y g(x, y) f(x, y) \quad \text{if } X \text{ and } Y \text{ are discrete RVs}\]

\[\mu_{g(X, Y)} = E\left[g(X, Y)\right] = \int^{\infty}_{-\infty} \int^{\infty}_{-\infty} g(x, y) f(x, y) \; dx \, dy \quad \text{if } X \text{ and } Y \text{ are continuous RVs}\]
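A sketch of the discrete double sum, assuming two independent fair dice for the joint pmf and $g(x, y) = xy$:

```python
# E[g(X, Y)] = sum_x sum_y g(x, y) f(x, y) for two independent fair dice.
pmf_xy = {(x, y): 1 / 36 for x in range(1, 7) for y in range(1, 7)}
e_xy = sum(x * y * p for (x, y), p in pmf_xy.items())
print(e_xy)  # 12.25 = 3.5 * 3.5
```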


We can also consider the <Expectation> of a Conditional Distribution.

Definition.

\[E\left[ X \mid Y = y \right] = \begin{cases} \displaystyle \sum_x x f(x \mid y) && X \; \text{is discrete with joint pmf} \; f(x, y) \\ \displaystyle \int^{\infty}_{-\infty} x f(x \mid y) \; dx && X \; \text{is continuous with joint pdf} \; f(x, y) \end{cases}\]
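A sketch with a small made-up joint pmf: since $f(x \mid y) = f(x, y) / f_Y(y)$, we first compute the marginal $f_Y(y)$ and then the conditional mean.

```python
# E[X | Y = y] from a small joint pmf (the numbers are made up for illustration).
pmf_xy = {(0, 0): 0.1, (1, 0): 0.3, (0, 1): 0.4, (1, 1): 0.2}

def cond_mean_x(y):
    f_y = sum(p for (_, yy), p in pmf_xy.items() if yy == y)  # marginal f_Y(y)
    return sum(x * p / f_y for (x, yy), p in pmf_xy.items() if yy == y)

print(cond_mean_x(0))  # 0.75
print(cond_mean_x(1))  # 0.3333...
```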

Linearity of Expectation

<Expectation>์€ <Linearity>๋ผ๋Š” ์•„์ฃผ ์ข‹์€ ์„ฑ์งˆ์„ ๊ฐ€์ง„๋‹ค.

Theorem.

Let $a, b \in \mathbb{R}$, then $E\left[aX + b\right] = aE[X] + b$.

์œ„์˜ ์ •๋ฆฌ๊ฐ€ ๋งํ•ด์ฃผ๋Š” ๊ฒƒ์€ <Expectation>์ด Linear Operator์ž„์„ ๋งํ•ด์ค€๋‹ค!! ๐Ÿคฉ

์ข€๋” ํ™•์žฅํ•ด์„œ ๊ธฐ์ˆ ํ•ด๋ณด๋ฉด,

Theorem.

\[E\left[g(X) + h(X)\right] = E\left[g(X)\right] + E\left[h(X)\right]\]
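A numerical sanity check of this, on the fair die with arbitrary choices $g(x) = x^2$ and $h(x) = 3x + 1$:

```python
# E[g(X) + h(X)] equals E[g(X)] + E[h(X)] for any g, h.
pmf = {x: 1 / 6 for x in range(1, 7)}
E = lambda fn: sum(fn(x) * p for x, p in pmf.items())
g, h = (lambda x: x ** 2), (lambda x: 3 * x + 1)
print(E(lambda x: g(x) + h(x)))  # 26.6666...
print(E(g) + E(h))               # 26.6666...
```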


Theorem.

\[E\left[g(X, Y) + h(X, Y)\right] = E\left[g(X, Y)\right] + E\left[h(X, Y)\right]\]

Expectation with Independence

๋งŒ์•ฝ ๋‘ RV $X$, $Y$๊ฐ€ ์„œ๋กœ <๋…๋ฆฝ>์ด๋ผ๋ฉด, ๋‘ RV์˜ ๊ณฑ์— ๋Œ€ํ•œ <Expectation>์„ ์‰ฝ๊ฒŒ ๊ตฌํ•  ์ˆ˜ ์žˆ๋‹ค.

Theorem.

If $X$ and $Y$ are independent, then

\[E[XY] = E[X]E[Y]\]

Variance and Covariance

Even if two RVs $X$ and $Y$ have the same mean, $E[X] = \mu = E[Y]$, the degree to which their individual values deviate from the mean $\mu$ can differ. <Variance> is a measure of this spread around the mean, defined as follows.


Definition.

The <variance> of an RV $X$ is defined as

\[\text{Var}(X) = E[(X-\mu)^2]\]

and $\sigma = \sqrt{\text{Var}(X)}$ is called the <standard deviation> of $X$.

์•„๋ž˜์˜ ๊ณต์‹์„ ์‚ฌ์šฉํ•˜๋ฉด, $\text{Var}(X)$๋ฅผ ์ข€๋” ์‰ฝ๊ฒŒ ๊ตฌํ•  ์ˆ˜ ์žˆ๋‹ค.


Theorem.

\[\begin{aligned} \text{Var}(X) &= E[(X-\mu)^2] = E\left[ X^2 - 2 \mu X + \mu^2 \right] \\ &= E[X^2] - 2 \mu E[X] + \mu^2 \\ &= E[X^2] - 2 \mu \cdot \mu + \mu^2 \\ &= E[X^2] - \mu^2 = E[X^2] - \left(E[X]\right)^2 \end{aligned}\]

โ€œ๋ถ„์‚ฐ = ์ œํ‰ - ํ‰์ œโ€, ๊ณ ๋“ฑํ•™๊ต ๋•Œ ๋ฐฐ์šด ๊ณต์‹์ด๋‹ค!


<Expectation>์€ Linearity๋ผ๋Š” ์ข‹์€ ์„ฑ์งˆ์„ ๊ฐ€์ง€๊ณ  ์žˆ์—ˆ๋‹ค. <๋ถ„์‚ฐ Variance>์—์„œ๋Š” ์–ด๋–ป๊ฒŒ ๋˜๋Š”์ง€ ์‚ดํŽด๋ณด์ž.

Theorem.

For any $a, b \in \mathbb{R}$,

\[\text{Var}(aX + b) = a^2 \text{Var}(X)\]
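For completeness, a one-line derivation (the standard argument, using $E[aX + b] = a\mu + b$):

\[\text{Var}(aX + b) = E\left[\left(aX + b - (a\mu + b)\right)^2\right] = E\left[a^2 (X - \mu)^2\right] = a^2 \, \text{Var}(X)\]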

Covariance

<Covariance> is a measure that examines what kind of <relation> exists between two RVs. It is defined as follows.

Definition.

The <covariance> of $X$ and $Y$ is defined as

\[\begin{aligned} \sigma_{XY} := \text{Cov}(X, Y) &= E \left[ (X - \mu_X) (Y - \mu_Y) \right] \\ &= E(XY) - E(X)E(Y) \end{aligned}\]
  • $\text{Cov}(X, X) = \text{Var}(X)$
  • $\text{Cov}(aX + b, Y) = a \cdot \text{Cov}(X, Y)$
  • $\text{Cov}(X, c) = 0$
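Computing the covariance on the same small made-up joint pmf as in the conditional-expectation sketch above:

```python
# Cov(X, Y) = E[XY] - E[X]E[Y] on a small illustrative joint pmf.
pmf_xy = {(0, 0): 0.1, (1, 0): 0.3, (0, 1): 0.4, (1, 1): 0.2}
E = lambda fn: sum(fn(x, y) * p for (x, y), p in pmf_xy.items())
cov = E(lambda x, y: x * y) - E(lambda x, y: x) * E(lambda x, y: y)
print(cov)  # 0.2 - 0.5 * 0.6 = -0.1
```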

์•ž์—์„œ ์‚ดํŽด๋ดค์„ ๋•Œ, ๋‘ RV $X$, $Y$๊ฐ€ ๋…๋ฆฝ์ด๋ผ๋ฉด, $E(XY) = E(X)E(Y)$๊ฐ€ ๋˜์—ˆ๋‹ค. ๋”ฐ๋ผ์„œ ๋‘ RV๊ฐ€ ๋…๋ฆฝ์ผ ๋•Œ๋Š” $\text{Cov}(X, Y) = 0$์ด ๋œ๋‹ค! ๊ทธ๋Ÿฌ๋‚˜ ์ฃผ์˜ํ•  ์ ์€ ๋ช…์ œ์˜ ์—ญ(ๆ˜“)์ธ $\text{Cov}(X, Y) = 0$์ผ ๋•Œ, ๋‘ RV๊ฐ€ ํ•ญ์ƒ ๋…๋ฆฝ์ž„์„ ๋ณด์žฅํ•˜์ง€๋Š” ์•Š๋Š”๋‹ค!

<Covariance>์€ ๋‘ RV์˜ Linear Combination์— ๋Œ€ํ•œ ๋ถ„์‚ฐ์„ ๊ตฌํ•  ๋•Œ๋„ ์‚ฌ์šฉํ•œ๋‹ค.

Theorem.

Let $a, b, c \in \mathbb{R}$, then

\[\text{Var}(aX + bY + c) = a^2 \text{Var}(X) + b^2 \text{Var}(Y) + 2ab \, \text{Cov}(X, Y)\]

์ฆ๋ช…์€ $\text{Var}(aX + bY + c)$์˜ ์˜๋ฏธ๋ฅผ ๊ทธ๋Œ€๋กœ ์ „๊ฐœํ•˜๋ฉด ์‰ฝ๊ฒŒ ์œ ๋„ํ•  ์ˆ˜ ์žˆ๋‹ค.

\[\text{Var}(aX + bY + c) = E\left[ \left( (aX + bY) - (a\mu_X + b\mu_Y) \right)^2 \right]\]
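Expanding the square and applying the Linearity of <Expectation>:

\[\begin{aligned} \text{Var}(aX + bY + c) &= E\left[ a^2 (X - \mu_X)^2 + b^2 (Y - \mu_Y)^2 + 2ab (X - \mu_X)(Y - \mu_Y) \right] \\ &= a^2 \text{Var}(X) + b^2 \text{Var}(Y) + 2ab \, \text{Cov}(X, Y) \end{aligned}\]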

Correlation

<๊ณต๋ถ„์‚ฐ>์„ ์ข€๋” ๋ณด๊ธฐ ์‰ฝ๊ฒŒ Normalize ํ•œ ๊ฒƒ์ด <Correlation>์ด๋‹ค.

Definition.

The <correlation> of $X$ and $Y$ is defined as

\[\rho_{XY} := \text{Corr}(X, Y) = \frac{\text{Cov}(X, Y)}{\sqrt{\text{Var}(X)} \sqrt{\text{Var}(Y)}}\]
  • if $\rho_{XY} > 0$, $X$ and $Y$ are positively correlated.
  • if $\rho_{XY} < 0$, $X$ and $Y$ are negatively correlated.
  • if $\rho_{XY} = 0$, $X$ and $Y$ are uncorrelated.
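On the same toy joint pmf as before (the marginals are Bernoulli, so $\text{Var}(X) = 0.25$ and $\text{Var}(Y) = 0.24$):

```python
# rho = Cov(X, Y) / (sd(X) * sd(Y)) on the small illustrative joint pmf.
import math
pmf_xy = {(0, 0): 0.1, (1, 0): 0.3, (0, 1): 0.4, (1, 1): 0.2}
E = lambda fn: sum(fn(x, y) * p for (x, y), p in pmf_xy.items())
mx, my = E(lambda x, y: x), E(lambda x, y: y)
cov = E(lambda x, y: (x - mx) * (y - my))
sd_x = math.sqrt(E(lambda x, y: (x - mx) ** 2))
sd_y = math.sqrt(E(lambda x, y: (y - my) ** 2))
print(cov / (sd_x * sd_y))  # ~ -0.408, negatively correlated
```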

๋งŒ์•ฝ ๋‘ RV๊ฐ€ ์™„๋ฒฝํ•œ ์„ ํ˜•์„ฑ์„ ๋ณด์ธ๋‹ค๋ฉด, $\rho_{XY}$๊ฐ€ ์•„๋ž˜์™€ ๊ฐ™๋‹ค.

  • if $Y = aX + b$ for $a > 0$, then $\text{Corr}(X, Y) = 1$
  • if $Y = aX + b$ for $a < 0$, then $\text{Corr}(X, Y) = -1$

์œ„์˜ ๋ช…์ œ๋Š” ๊ทธ ์—ญ๋„ ์„ฑ๋ฆฝํ•œ๋‹ค. ์ฆ๋ช…์€ ์•„๋ž˜์˜ Exercise์—์„œ ์ง„ํ–‰ํ•˜๊ฒ ๋‹ค.

<Correlation>์€ $[-1, 1]$์˜ ๊ฐ’์„ ๊ฐ–๋Š”๋‹ค. ์ด๋Š” <์ฝ”์‹œ-์Šˆ๋ฐ”๋ฅดํŠธ ๋ถ€๋“ฑ์‹>์„ ํ†ตํ•ด ์œ ๋„ํ•  ์ˆ˜ ์žˆ๋‹ค!

Cauchy-Schwarz inequality:

\[\left( \sum a_i b_i \right)^2 \le \sum a_i^2 \sum b_i^2\]

Correlation ์‹์„ ์˜๋ฏธ์— ๋งก๊ฒŒ ํ’€์–ด์“ฐ๋ฉด ์•„๋ž˜์™€ ๊ฐ™๋‹ค.

\[\begin{aligned} \text{Corr}(X, Y) &= \frac{\text{Cov}(X, Y)}{\sqrt{\text{Var}(X)} \sqrt{\text{Var}(Y)}} = \frac{E[(X-\mu_X)(Y - \mu_Y)]}{\sqrt{E[(X-\mu_X)^2]} \sqrt{E[(Y-\mu_Y)^2]}} \\ &= \frac{\sum (X-\mu_X)(Y - \mu_Y)}{\sqrt{\sum (X-\mu_X)^2} \sqrt{\sum (Y-\mu_Y)^2}} \end{aligned}\]

์ด์ œ ์œ„์˜ ์‹์„ ์ œ๊ณฑํ•ด์„œ ์‚ดํŽด๋ณด๋ฉด

\[(\rho_{XY})^2 = \left( \frac{\sum (X-\mu_X)(Y - \mu_Y)}{\sqrt{\sum (X-\mu_X)^2} \sqrt{\sum (Y-\mu_Y)^2}} \right)^2 = \frac{\left( \sum (X-\mu_X)(Y - \mu_Y) \right)^2 }{\sum (X-\mu_X)^2 \sum (Y-\mu_Y)^2}\]

Moving the right-hand side of the <Cauchy-Schwarz inequality> over to the left, the following inequality holds.

\[\frac{\left( \sum a_i b_i \right)^2}{\sum a_i^2 \sum b_i^2} \le 1\]

Applying this to the squared <Correlation> expression gives

\[(\rho_{XY})^2 = \frac{\left( \sum (X-\mu_X)(Y - \mu_Y) \right)^2 }{\sum (X-\mu_X)^2 \sum (Y-\mu_Y)^2} \le 1\]

๋”ฐ๋ผ์„œ $(\rho_{XY})^2 \le 1$์ด๋ฏ€๋กœ

\[-1 \le \rho_{XY} \le 1\]

$\blacksquare$

Additionally, <Correlation> can also be interpreted as the covariance of "standardized" RVs.

If we standardize $Z = \dfrac{X-\mu_X}{\sigma_X}$ and $W = \dfrac{Y-\mu_Y}{\sigma_Y}$, then the covariance of these two equals the Correlation of $X$ and $Y$.

\[\text{Cov}(Z, W) = \text{Corr}(X, Y)\]

It looks provable at a glance, so I won't derive it separately.


Q1. What does $\text{Var}(X) = 0$ mean?



Q2. Give an example where $\text{Cov}(X, Y) = 0$ but the two RVs are not independent.


Q3. Prove that $-1 \le \text{Corr}(X, Y) \le 1$.


Q4. Prove that if $\text{Corr}(X, Y) = 1$, then there exist $a>0$ and $b\in\mathbb{R}$ s.t. $Y = aX + b$.

ํŽผ์ณ๋ณด๊ธฐ

A1. It means that $p(x)$ is a delta function; that is, $X$ takes the single value $\mu$ with probability 1.


A2. Setting $Y = X^2$ makes this easy to show. To show that the two are not independent, you may need the joint $p(x, y)$, which can likewise be designed quite reasonably with an appropriate setup.
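A concrete instance of this answer (choosing $X$ uniform on $\{-1, 0, 1\}$ is my own setup):

```python
# Cov(X, Y) = 0 even though Y = X^2 is completely determined by X.
pmf_x = {-1: 1 / 3, 0: 1 / 3, 1: 1 / 3}
E = lambda fn: sum(fn(x) * p for x, p in pmf_x.items())
cov = E(lambda x: x * x ** 2) - E(lambda x: x) * E(lambda x: x ** 2)
print(cov)  # 0.0
# Not independent: P(X = 1, Y = 0) = 0, but P(X = 1) * P(Y = 0) = 1/9.
```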


A3. & A4. Q3 was already proved above. But it can be proved in other ways too! 👉 See pp. 2-3 of this document.


The follow-up post covers a few additional topics on the <mean> and <variance>.

๐Ÿ‘‰ Chebyshevโ€™s Inequality

It also covers the basic Probability Distributions for Discrete RVs.