This post summarizes what I learned and studied in the “Probability and Statistics (MATH230)” course. The full series of posts can be found under Probability and Statistics 🎲


Some distributions are especially useful because they model and explain real-world phenomena well. In this post we look at well-known distributions of discrete RVs. Each distribution motivates the ones that follow, and each is important in its own right, so it is worth reflecting on what each one means and practicing them thoroughly.


Discrete Uniform Distributions

This distribution describes the case where the pmf $f(x)$ of a discrete RV $X$ takes the same value at every sample point $x$.

Definition.

Let $X$ take values $x_1, \dots, x_N$. We say that $X$ has a <discrete uniform distribution> if

\[f(x_i) = P(X=x_i) = \frac{1}{N}, \quad i = 1, \dots, N\]

<uniform distribution>์˜ ๊ฒฝ์šฐ, ํ‰๊ท ๊ณผ ๋ถ„์‚ฐ์€ ์•„๋ž˜์˜ ๊ฐ’์„ ๊ฐ–๋Š”๋‹ค.

  • $E[X]= \dfrac{\sum x_i}{N}$
  • $\text{Var}(X) = \dfrac{\sum x_i^2}{N} - \dfrac{(\sum x_i)^2}{N^2}$ (this is simply $E[X^2] - (E[X])^2$, the mean of the square minus the square of the mean)
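As a quick sanity check of these two formulas, here is a minimal Python sketch using exact fractions. The helper name `discrete_uniform_moments` and the fair-die example are illustrative choices, not part of the course material.

```python
from fractions import Fraction

def discrete_uniform_moments(values):
    """Mean and variance of a discrete uniform RV on `values` (each value has probability 1/N)."""
    n = len(values)
    mean = Fraction(sum(values), n)                           # E[X] = (sum x_i) / N
    second_moment = Fraction(sum(x * x for x in values), n)   # E[X^2] = (sum x_i^2) / N
    var = second_moment - mean ** 2                           # Var(X) = E[X^2] - (E[X])^2
    return mean, var

# A fair six-sided die: X takes 1, ..., 6, each with probability 1/6
mean, var = discrete_uniform_moments([1, 2, 3, 4, 5, 6])
print(mean, var)  # 7/2 35/12
```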

Bernoulli Distribution

<Bernolli Distribution>์€ ๋™์ „ ๋˜์ง€๊ธฐ์— ๋Œ€ํ•œ Distribution์ด๋‹ค. ์ข€๋” ์ผ๋ฐ˜ํ™”ํ•ด์„œ ๋งํ•˜๋ฉด, Sample space์—์„œ ๋‹จ ๋‘๊ฐœ์˜ sample point๋ฅผ ๊ฐ€์งˆ ๋•Œ, Bernoulli Distribution์ด๋ผ๊ณ  ํ•œ๋‹ค.

Definition.

(1) A <Bernoulli trial> is an experiment whose outcomes are only success or failure.

(2) A RV $X$ is said to have a <Bernoulli distribution> if its pmf is given by

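\[f(x) = p^x \cdot (1-p)^{1-x}, \quad x = 0, 1\]

where $p$ is the probability of success (coded as $x = 1$) and failure is coded as $x = 0$.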

We denote it as

\[X \sim \text{Bernoulli}(p)\]

์—ฌ๊ธฐ์„œ ์ฃผ์˜ํ•  ์ ์€ <Bernoulli Trial>์€ ๋”ฑ ํ•œ๋ฒˆ๋งŒ ์‹œํ–‰ํ•˜๋Š” ๊ฒƒ์ด๋‹ค! Trial์„ ์—ฌ๋Ÿฌ๋ฒˆ ํ•œ๋‹ค๋ฉด, ๋’ค์— ๋‚˜์˜ฌ <Binomial Distribution>์ด ๋œ๋‹ค.


Theorem.

If $X$ is a Bernoulli RV, then

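  • $\displaystyle E[X] = \sum x f(x) = 0 \cdot f(0) + 1 \cdot f(1) = p$
  • $\displaystyle \text{Var}(X) = E[X^2] - (E[X])^2 = p - p^2 = p (1-p) = pq$, where $q = 1-p$ (note that $E[X^2] = p$ because $X^2 = X$ when $X \in \{0, 1\}$)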
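The theorem can be checked directly from the pmf. The sketch below evaluates $\sum x f(x)$ and $\sum x^2 f(x)$ over the support $\{0, 1\}$ with exact `Fraction` arithmetic; `bernoulli_pmf`, `bernoulli_moments` and the value $p = 3/10$ are hypothetical, illustrative choices.

```python
from fractions import Fraction

def bernoulli_pmf(x, p):
    """f(x) = p^x (1-p)^(1-x) for x in {0, 1}."""
    return p ** x * (1 - p) ** (1 - x)

def bernoulli_moments(p):
    support = (0, 1)
    mean = sum(x * bernoulli_pmf(x, p) for x in support)         # E[X] = 0*f(0) + 1*f(1)
    second = sum(x ** 2 * bernoulli_pmf(x, p) for x in support)  # E[X^2] = p, since x^2 = x on {0, 1}
    return mean, second - mean ** 2                              # (E[X], Var(X))

p = Fraction(3, 10)
print(bernoulli_moments(p))  # (Fraction(3, 10), Fraction(21, 100)), i.e. p and p(1-p)
```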

Binomial Distribution

<Bernoulli Trial>์€ ๋™์ „์„ ๋”ฑ ํ•œ๋ฒˆ ๋˜์ง€๋Š” ์‹œํ–‰์ด์—ˆ๋‹ค. ๋งŒ์•ฝ ๋™์ „์„ $n$๋ฒˆ ๋งŒํผ ์—ฌ๋Ÿฌ๋ฒˆ ๋˜์ง„๋‹ค๋ฉด, ๋ช‡๋ฒˆ ์„ฑ๊ณต(success) ํ–ˆ๋Š”์ง€ ์„ธ์–ด ๋ณผ ์ˆ˜ ์žˆ๋‹ค. ๋งŒ์•ฝ ์„ฑ๊ณต์˜ ํšŸ์ˆ˜๋ฅผ RV $X$๋กœ ๋‘”๋‹ค๋ฉด, ์šฐ๋ฆฌ๋Š” <Binomial Distribution>๋ผ๋Š” ์ƒˆ๋กœ์šด ๋ถ„ํฌ๋ฅผ ์–ป๊ฒŒ ๋œ๋‹ค.

Definition.

When a RV $X$ has a pmf

\[f(x) = b(x;n, p) = \binom{n}{x} p^x q^{n-x}, \quad x = 0, 1, \dots, n, \quad q = 1 - p\]

we call $X$ a <binomial random variable> and denote it as

\[X \sim \text{Binomial}(n, p) \quad \text{or} \quad X \sim \text{BIN}(n, p)\]

ํ™•์ธํ•  ์ ์€ <Binomial Distribution>์˜ pmf $f(x)$๊ฐ€ ์ •๋ง๋กœ pmf์ธ์ง€์ด๋‹ค. ์ด๊ฒƒ์„ ํ™•์ธํ•˜๋ ค๋ฉด pmf $f(x)$์˜ ํ•ฉ์ด 1์ด ๋จ์„ ๋ณด์ด๋ฉด ๋œ๋‹ค. ์ด๊ฒƒ์€ <์ดํ•ญ ์ •๋ฆฌ Binomial Theorem>์„ ํ†ตํ•ด ์‰ฝ๊ฒŒ ๋ณด์ผ ์ˆ˜ ์žˆ๋‹ค. ์ด ๋ถ„ํฌ๊ฐ€ <Binomial>๋ผ๋Š” ์ด๋ฆ„์ธ ์ด์œ ๊ฐ€ ์ด๊ฒƒ ๋•Œ๋ฌธ์ด๋‹ค.

\[\sum_x f(x) = \sum^n_{k=0} \binom{n}{k} p^k (1-p)^{n-k} = \left(p + (1-p)\right)^n = 1^n = 1\]
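A small Python sketch of the binomial pmf $b(x; n, p)$, assuming Python 3.8+ for `math.comb`. With exact fractions the probabilities sum to exactly 1, as the Binomial Theorem predicts; `binomial_pmf` and the values $n = 10$, $p = 1/3$ are illustrative assumptions.

```python
from fractions import Fraction
from math import comb

def binomial_pmf(x, n, p):
    """b(x; n, p) = C(n, x) p^x q^(n-x), with q = 1 - p."""
    q = 1 - p
    return comb(n, x) * p ** x * q ** (n - x)

n, p = 10, Fraction(1, 3)
total = sum(binomial_pmf(x, n, p) for x in range(n + 1))  # sum over the support 0..n
print(total)  # 1
```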

์ด๋ฒˆ์—๋Š” <Binomial Distribution>์—์„œ์˜ ํ‰๊ท ๊ณผ ๋ถ„์‚ฐ์„ ์‚ดํŽด๋ณด์ž.

  • $\displaystyle E[X] = np$
  • $\displaystyle \text{Var}(X) = npq$

๋จผ์ € ํ‰๊ท  $E[x]$๊ฐ€ $np$๊ฐ€ ๋˜๋Š” ์ด์œ ๋ฅผ ์ˆ˜ํ•™์  ์ฆ๋ช… ์—†์ด ์„ค๋ช…ํ•ด๋ณด์ž. RV $X$๋Š” ์ „์ฒด ์„ฑ๊ณต์˜ ํšŸ์ˆ˜๋ฅผ ์˜๋ฏธํ•œ๋‹ค. ์ด๊ฒƒ์€ ๊ณง ๊ฐœ๋ณ„ ์‹œํ–‰ $X_i$์— ๋Œ€ํ•ด ์•„๋ž˜๊ฐ€ ์„ฑ๋ฆฝํ•จ์„ ๋งํ•œ๋‹ค.

\[X = X_1 + X_2 + \cdots + X_n\]

Each $X_i$ follows $\text{Bernoulli}(p)$, so by the linearity of <expectation> (which does not even require the $X_i$ to be independent)

\[\begin{aligned} E[X] &= E[X_1 + \cdots + X_n] \\ &= E[X_1] + \cdots + E[X_n] \\ &= p + \cdots + p \\ &= n \cdot p \end{aligned}\]
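The idea that $X$ is a sum of $n$ Bernoulli indicators can also be simulated. This is a Monte Carlo sketch, not a proof: the helper name, the seed, $n = 20$, $p = 0.3$ and the number of repetitions are all arbitrary illustrative choices.

```python
import random

def sample_binomial_via_bernoulli(n, p, rng):
    """Draw X = X_1 + ... + X_n, where each X_i is an independent Bernoulli(p) indicator."""
    return sum(1 if rng.random() < p else 0 for _ in range(n))

rng = random.Random(0)          # fixed seed so the run is reproducible
n, p, reps = 20, 0.3, 50_000
samples = [sample_binomial_via_bernoulli(n, p, rng) for _ in range(reps)]
empirical_mean = sum(samples) / reps
print(empirical_mean, n * p)    # the empirical mean should be close to np
```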

์ข€๋” ์—„๋ฐ€ํ•˜๊ฒŒ ์ฆ๋ช…ํ•˜๋ฉด ์•„๋ž˜์™€ ๊ฐ™๋‹ค.

\[\begin{aligned} E[X] &= \sum k f(k) = \sum k \binom{n}{k} p^k q^{n-k} \\ &= \sum^{n}_{k=0} k \frac{n!}{k! (n-k)!} p^k q^{n-k} \\ &= \sum^{n}_{k=1} k \frac{n!}{k! (n-k)!} p^k q^{n-k} \\ &= \sum^{n}_{k=1} \frac{n!}{(k-1)! (n-k)!} p^k q^{n-k} \\ &= n \cdot \sum^{n}_{k=1} \frac{(n-1)!}{(k-1)! (n-k)!} p^k q^{n-k} \\ &= np \cdot \sum^{n}_{k=1} \frac{(n-1)!}{(k-1)! (n-k)!} p^{k-1} q^{n-k} \\ &= np \cdot \sum^{n-1}_{k=0} \frac{(n-1)!}{k! ((n-1)-k)!} p^{k} q^{(n-1)-k} \\ &= np \cdot (p + (1-p))^{n-1} = np \end{aligned}\]

$\blacksquare$

๋ถ„์‚ฐ $\text{Var}(X)$์„ ์ฆ๋ช…ํ•˜๋Š” ๊ฑด ์กฐ๊ธˆ ์‰ฝ์ง€ ์•Š๋‹ค. ์ฆ๋ช…์€ Exercise๋กœ ๋‚จ๊ธฐ์ง€๋งŒ, ๋ฐ˜๋“œ์‹œ ์ง์ ‘ ์ฆ๋ช…ํ•ด๋ด์•ผ ํ•˜๋Š” ๋ช…์ œ๋‹ค ๐ŸŽˆ


Multinomial Distribution

So far, every distribution we have seen has been a variation on coin tossing. In reality, though, an experiment does not always have just two results like heads and tails, so we can also consider distributions with several possible <Outcome>s! Rolling a six-sided die is one such case. The resulting distribution is called the <Multinomial Distribution>.

Definition.

The <multinomial experiment> consists of $n$ independent repeated trials, each of which results in one of $k$ possible outcomes $E_1, \dots, E_k$.

  • $P(E_i) = p_i$ and $\displaystyle \sum^k_{i=1} p_i = 1$

Let $X_i$ be the number of times $E_i$ occurs in the $n$ trials. Then

\[P(X_1=x_1, \cdots, X_k = x_k) = \frac{n!}{x_1! x_2! \cdots x_k!} \cdot p_1^{x_1} p_2^{x_2} \cdots p_k^{x_k} \quad \text{where} \quad x_1 + \cdots + x_k = n\]
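The multinomial pmf translates almost line for line into Python. In this sketch (hypothetical helper `multinomial_pmf`, Python 3.8+ for `math.prod`), $n$ is taken to be $x_1 + \cdots + x_k$, so the constraint in the definition holds by construction. For a fair die rolled 6 times, the probability that every face appears exactly once is $6!/6^6 = 5/324$.

```python
from fractions import Fraction
from math import factorial, prod

def multinomial_pmf(xs, ps):
    """n!/(x_1! ... x_k!) * p_1^x_1 ... p_k^x_k, with n = x_1 + ... + x_k."""
    n = sum(xs)
    coef = factorial(n)
    for x in xs:
        coef //= factorial(x)   # multinomial coefficient; every partial quotient is an integer
    return coef * prod(p ** x for p, x in zip(ps, xs))

# Fair die rolled 6 times, each face appearing exactly once
print(multinomial_pmf([1] * 6, [Fraction(1, 6)] * 6))  # 5/324
```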

<Multinomail distribution>์˜ pmf $f(x_1, \dots, x_k)$๋Š” ์ผ์ข…์˜ joint pmf๋กœ ํ•ด์„ํ•  ์ˆ˜ ์žˆ๋‹ค. ๊ทธ๋ž˜์„œ <Multinomail distribution>์— ๋Œ€ํ•ด ์•„๋ž˜์˜ margnial distribution๋“ค์„ ์ƒ๊ฐํ•ด๋ณผ ์ˆ˜ ์žˆ๋‹ค.

  • $X_k \sim \text{BIN}(n, p_k)$
  • $X_i + X_j \sim \text{BIN}(n, p_i + p_j)$
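Both marginal claims can be checked exactly for a small case by summing the joint pmf over the coordinates we are not interested in. The sketch assumes $k = 3$ outcomes with $p = (1/2, 1/3, 1/6)$ and $n = 5$, all arbitrary choices; the helper functions are hypothetical and redefined here so the block runs on its own.

```python
from fractions import Fraction
from itertools import product
from math import comb, factorial, prod

def multinomial_pmf(xs, ps):
    n = sum(xs)
    coef = factorial(n)
    for x in xs:
        coef //= factorial(x)
    return coef * prod(p ** x for p, x in zip(ps, xs))

def binomial_pmf(x, n, p):
    return comb(n, x) * p ** x * (1 - p) ** (n - x)

n = 5
ps = [Fraction(1, 2), Fraction(1, 3), Fraction(1, 6)]  # p_1 + p_2 + p_3 = 1

# Every (x_1, x_2, x_3) with x_1 + x_2 + x_3 = n
outcomes = [xs for xs in product(range(n + 1), repeat=len(ps)) if sum(xs) == n]

for m in range(n + 1):
    # Marginal of X_1: sum the joint pmf over all outcomes with x_1 = m
    marginal_x1 = sum(multinomial_pmf(xs, ps) for xs in outcomes if xs[0] == m)
    assert marginal_x1 == binomial_pmf(m, n, ps[0])
    # Distribution of X_1 + X_2: sum over all outcomes with x_1 + x_2 = m
    marginal_sum = sum(multinomial_pmf(xs, ps) for xs in outcomes if xs[0] + xs[1] == m)
    assert marginal_sum == binomial_pmf(m, n, ps[0] + ps[1])
print("marginals match BIN(n, p_1) and BIN(n, p_1 + p_2)")
```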

์ด์–ด์ง€๋Š” ํฌ์ŠคํŠธ์—์„  ์ข€๋” ๋ณต์žกํ•œ ํ˜•ํƒœ์˜ ์ดํ•ญ ๋ถ„ํฌ๋ฅผ ๋‹ค๋ฃฌ๋‹ค. ๐Ÿคฉ

  • Hypergeometric Distribution
  • Geometric Distribution
  • Negative Binomial Distribution
  • Poisson Random Variable

👉 Discrete Probability Distribution - 2