β€œν™•λ₯ κ³Ό 톡계(MATH230)” μˆ˜μ—…μ—μ„œ 배운 것과 κ³΅λΆ€ν•œ 것을 μ •λ¦¬ν•œ ν¬μŠ€νŠΈμž…λ‹ˆλ‹€. 전체 ν¬μŠ€νŠΈλŠ” Probability and Statisticsμ—μ„œ ν™•μΈν•˜μ‹€ 수 μžˆμŠ΅λ‹ˆλ‹€ 🎲

5 minute read

β€œν™•λ₯ κ³Ό 톡계(MATH230)” μˆ˜μ—…μ—μ„œ 배운 것과 κ³΅λΆ€ν•œ 것을 μ •λ¦¬ν•œ ν¬μŠ€νŠΈμž…λ‹ˆλ‹€. 전체 ν¬μŠ€νŠΈλŠ” Probability and Statisticsμ—μ„œ ν™•μΈν•˜μ‹€ 수 μžˆμŠ΅λ‹ˆλ‹€ 🎲

이전 ν¬μŠ€νŠΈμ—μ„œ 이산 λΆ„ν¬μ˜ 기본이 λ˜λŠ” <Bernoulli Distribution>, <Binomial Distribution> 등등을 μ‚΄νŽ΄λ΄€λ‹€. 이번 ν¬μŠ€νŠΈμ—μ„œλŠ” 쒀더 μž¬λ―ΈμžˆλŠ” 뢄포듀이 λ“±μž₯ν•œλ‹€!

HyperGeometric Distribution

<HyperGeometric Distribution>은 μ•žμ—μ„œ μ‚΄νŽ΄λ³Έ <Binomial Distribution>κ³Ό 상황이 정말 λΉ„μŠ·ν•˜λ‹€. ν•˜μ§€λ§Œ, Sampling λ°©μ‹μ—μ„œ 차이가 μžˆλ‹€.

  • <Binomial Distribution>은 각 trial이 독립적이고, with replacement μ˜€λ‹€.
  • λ°˜λ©΄μ— <HyperGeometric Distribution>은 각 trial이 dependentν•˜κ³  w/o replacement둜 μ§„ν–‰λœλ‹€!

w/o replacement λ°©μ‹μœΌλ‘œ μƒ˜ν”Œλ§ν•˜λŠ” κ²ƒμ˜ μ˜ˆμ—λŠ” <Acceptance Sampling>이 μžˆλ‹€. λ¬Όν’ˆμ„ ν’ˆμ§ˆμ„ κ²€μˆ˜ν•˜λŠ” 이 μž‘μ—…μ„  ν…ŒμŠ€νŒ… 후에 λ¬Όν’ˆμ΄ νŒŒκ΄΄λ˜κ±°λ‚˜ 더이상 쓰지 λͺ»ν•˜κ²Œ 되기 λ•Œλ¬Έμ— ꡐ체λ₯Ό ν•  μˆ˜κ°€ μ—†λ‹€. κ·Έλ ‡κΈ° λ•Œλ¬Έμ— w/o replacementλ₯Ό λ°”νƒ•μœΌλ‘œ ν•˜λŠ” μƒ˜ν”Œλ§μ— λŒ€ν•œ λ…Όμ˜λŠ” κΌ­ ν•„μš”ν•˜λ‹€.

Definition. HyperGeometric Distribution

μ„±κ³΅μœΌλ‘œ ν‘œμ‹œλœ $K$개의 μƒ˜ν”Œκ³Ό μ‹€νŒ¨λ‘œ ν‘œμ‹œλœ $N-K$개의 μƒ˜ν”Œμ΄ μžˆλŠ” $N$개의 μƒ˜ν”Œμ—μ„œ, λ¬΄μž‘μœ„λ‘œ $n$개의 μƒ˜ν”Œμ„ w/o replacement둜 λ½‘λŠ”λ‹€κ³  ν•˜μž. 이것을 <HyperGeometric Experiment>라고 ν•œλ‹€. μ΄λ•Œ, RV $X$λŠ” <HyperGeometric Experiment>μ—μ„œ 성곡을 뽑은 νšŸμˆ˜μ΄λ‹€. 이 RV $X$λ₯Ό <HyperGeometric RV>라고 ν•œλ‹€.

<HyperGeometric RV> $X$의 pmfλŠ” μ•„λž˜μ™€ 같이 μ •μ˜λœλ‹€.

\[h(x; N, K, n) = \frac{\binom{K}{x} \binom{N-K}{n-x}}{\binom{N}{n}} \quad \text{where} \quad 0 \le x \le K \quad \text{and} \quad 0 \le n-x \le N-K\]

μœ„μ™€ 같은 pmfλ₯Ό <HyperGeometric Distribution>라고 ν•˜λ©°, $X \sim \text{HyperGeo}(N, K, n)$둜 ν‘œκΈ°ν•œλ‹€.

μ΄λ•Œ, <HyperGeometric Distribution>에 λŒ€ν•œ 쑰건식을 λ‹€λ“¬μœΌλ©΄ μ•„λž˜μ™€ κ°™λ‹€.

\[\begin{aligned} \quad 0 \le x \le K \quad &\text{and} \quad 0 \le n-x \le N-K \\ \quad 0 \le x \le K \quad &\text{and} \quad -n \le -x \le N-K-n \\ \quad 0 \le x \le K \quad &\text{and} \quad K+n - N \le x \le n \\ \end{aligned}\] \[\therefore \max \{ 0, n-(N-K) \} \le x \le \min \{ K, n \}\]

Theorem.

Let $X \sim \text{HyperGeo}(N, K, n)$, then

  • $\displaystyle E[X] = n \frac{K}{N}$
  • $\displaystyle \text{Var}(X) = n \frac{K}{N}\left( 1 - \frac{K}{N} \right) \cdot \frac{N-n}{N-1}$

μ§€κΈˆ λ‹Ήμž₯ <HyperGeometric Distribution>에 λŒ€ν•œ 평균과 뢄산에 λŒ€ν•œ 정리λ₯Ό 증λͺ…ν•˜μ§€λŠ” μ•Šμ„ 것이닀. κ·ΈλŸ¬λ‚˜ μœ„μ˜ 식을 쒀더 μ§κ΄€μ μœΌλ‘œ 이해해보면, <Binomial Distribution>의 κ²½μš°μ™€ 정말 μœ μ‚¬ν•¨μ„ λ°œκ²¬ν•  수 μžˆλ‹€.

HyperGeo의 $\dfrac{K}{N}$λ₯Ό Binomial의 $p$둜 ν•΄μ„ν•œλ‹€λ©΄, Binomial의 평균인 $np$와 HpyerGeom의 $n\dfrac{K}{N}$λŠ” κ·Έ ν˜•νƒœκ°€ κ½€ λΉ„μŠ·ν•˜λ‹€. λΆ„μ‚°μ˜ κ²½μš°μ—λ„ HyperGeo의 경우 $n \dfrac{K}{N}\left( 1 - \dfrac{K}{N} \right) \cdot \dfrac{N-n}{N-1}$둜 Binomial의 경우처럼 $npq$의 ν˜•νƒœκ°€ λ³΄μ΄μ§€λ§Œ, λ§ˆμ§€λ§‰ 뢀뢄에 $\dfrac{N-n}{N-1}$에 λŒ€ν•œ 텀이 λΆ™λŠ”λ‹€.

Theorem.

νŠΉμ • κ²½μš°μ—μ„œλŠ” HyperGeoλ₯Ό Binomial둜 μ·¨κΈ‰ν•  μˆ˜λ„ μžˆλ‹€.

If $N \gg n$ and $K \gg n$, then

\[h(x; N, K, n) \approx \text{BIN}(x; n, \frac{K}{N})\]

μœ„μ˜ 정리와 λ§ˆμ°¬κ°€μ§€λ‘œ 증λͺ…은 λ’€μ—μ„œ λ”°λ‘œ μ œμ‹œν•˜κ² λ‹€.

Multivariate HyperGeometric Distribution

β€œλ‹€λ³€λŸ‰ μ΄ˆκΈ°ν•˜ 뢄포(Multivariate HyperGeometric Distribution)β€œλŠ” μ΄ˆκΈ°ν•˜ λΆ„ν¬μ—μ„œ κ°€λŠ₯ν•œ Outcome이 2κ°œμ—μ„œ μ—¬λŸ¬ 개둜 λŠ˜μ–΄λ‚œ 상황이닀. pmfλŠ” μ•„λž˜μ™€ κ°™λ‹€.

Definition. Mutlivariate HyperGeometric Distribution

If $N$ items can be partitioned into the $k$ cells $A_1, A_2, \dots, A_k$ with $a_1, a_2, \dots, a_k$ elements, respectively, then the probability distribution of the RVs $X_1, X_2, \dots, X_k$, representing the number of elements selected from $A_1, A_2, \dots, A_k$ in a random sample of size $n$, is

\[f(x_1, \dots, x_k\; ; \; a_1, \dots, a_k, N, n) = \frac{\binom{a_1}{x_1} \cdots \binom{a_k}{x_k}}{\binom{N}{n}}\]

with $\displaystyle \sum^k_{i=1} x_i = n$ and $\displaystyle \sum^k_{i=1} a_i = N$.

pmf ν•¨μˆ˜κ°€ 많이 λ³΅μž‘ν•˜κΈ°λŠ” ν•œλ°, μ΄ˆκΈ°ν•˜ 뢄포λ₯Ό 잘 μ΄ν•΄ν•˜κ³  μžˆλ‹€λ©΄, λ‹€λ³€λŸ‰μœΌλ‘œ ν™•μž₯ν•˜λŠ” 것도 어렡지 μ•Šκ²Œ ν•  수 μžˆλ‹€.

맺음말

μ΄μ–΄μ§€λŠ” ν¬μŠ€νŠΈμ—μ„œλŠ” <Poisson Distribution>λΌλŠ” 이산 ν™•λ₯  λΆ„ν¬μ˜ λ³΄μŠ€κ°€ λ“±μž₯ν•œλ‹€!! Poisson은 μƒλ‹Ήνžˆ μ€‘μš”ν•˜λ‹ˆ λˆˆμ—¬κ²¨ μ‚΄νŽ΄λ³΄λ„λ‘ ν•˜μž!

πŸ‘‰ Poisson Distribution