HyperGeometric Distribution
This post summarizes what I learned and studied in the "Probability and Statistics (MATH230)" course. The full list of posts is available under Probability and Statistics 🎲
The previous post covered the building blocks of discrete distributions, such as the <Bernoulli Distribution> and the <Binomial Distribution>. This post introduces some more interesting distributions!
HyperGeometric Distribution
The <HyperGeometric Distribution> describes a setting very similar to that of the <Binomial Distribution> covered earlier. The difference lies in how the sampling is done.
- In the <Binomial Distribution>, the trials are independent and sampling is done with replacement.
- In the <HyperGeometric Distribution>, by contrast, the trials are dependent and sampling is done w/o replacement!
An example of sampling w/o replacement is <Acceptance Sampling>. In this kind of quality inspection, a tested item is destroyed or can no longer be used, so it cannot be put back. This is why sampling w/o replacement deserves its own treatment.
Definition. HyperGeometric Distribution
Suppose a population of $N$ items contains $K$ items labeled success and $N-K$ items labeled failure, and we draw $n$ items at random w/o replacement. This is called a <HyperGeometric Experiment>. The RV $X$ counts the number of successes drawn in the <HyperGeometric Experiment>, and it is called a <HyperGeometric RV>.
The pmf of a <HyperGeometric RV> $X$ is defined as follows.
\[h(x; N, K, n) = \frac{\binom{K}{x} \binom{N-K}{n-x}}{\binom{N}{n}} \quad \text{where} \quad 0 \le x \le K \quad \text{and} \quad 0 \le n-x \le N-K\]
This pmf is called the <HyperGeometric Distribution>, and we write $X \sim \text{HyperGeo}(N, K, n)$.
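To make the definition concrete, here is a minimal Python sketch of the pmf using only the standard library. The function name `hypergeom_pmf` and the numbers in the example (20 items, 5 successes, 4 draws) are my own illustrative choices, not something from the course.

```python
from math import comb


def hypergeom_pmf(x: int, N: int, K: int, n: int) -> float:
    """h(x; N, K, n): probability of x successes in n draws w/o replacement."""
    # Outside the stated conditions the pmf is zero.
    if not (0 <= x <= K and 0 <= n - x <= N - K):
        return 0.0
    # Ways to pick x successes times ways to pick n - x failures,
    # divided by all ways to pick n items out of N.
    return comb(K, x) * comb(N - K, n - x) / comb(N, n)


# Hypothetical example: N = 20 items, K = 5 successes, n = 4 draws.
probs = [hypergeom_pmf(x, 20, 5, 4) for x in range(5)]
for x, p in enumerate(probs):
    print(f"P(X = {x}) = {p:.4f}")
```

The probabilities over all feasible $x$ should sum to 1, which is a quick sanity check on any implementation.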
The conditions on $x$ in the <HyperGeometric Distribution> can be simplified as follows.
\[\begin{aligned} \quad 0 \le x \le K \quad &\text{and} \quad 0 \le n-x \le N-K \\ \quad 0 \le x \le K \quad &\text{and} \quad -n \le -x \le N-K-n \\ \quad 0 \le x \le K \quad &\text{and} \quad K+n - N \le x \le n \\ \end{aligned}\] \[\therefore \max \{ 0, n-(N-K) \} \le x \le \min \{ K, n \}\]
Theorem.
Let $X \sim \text{HyperGeo}(N, K, n)$, then
- $\displaystyle E[X] = n \frac{K}{N}$
- $\displaystyle \text{Var}(X) = n \frac{K}{N}\left( 1 - \frac{K}{N} \right) \cdot \frac{N-n}{N-1}$
We will not prove the mean and variance theorem for the <HyperGeometric Distribution> right now. Still, read intuitively, the formulas above closely resemble those of the <Binomial Distribution>.
If we read $\dfrac{K}{N}$ in HyperGeo as $p$ in Binomial, the HyperGeo mean $n\dfrac{K}{N}$ has the same form as the Binomial mean $np$. The HyperGeo variance $n \dfrac{K}{N}\left( 1 - \dfrac{K}{N} \right) \cdot \dfrac{N-n}{N-1}$ likewise contains the Binomial form $npq$, but with an extra factor $\dfrac{N-n}{N-1}$ at the end, which is less than 1 whenever $n > 1$.
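Since the proof is deferred, a numerical check can at least make the formulas believable. The sketch below computes the mean and variance directly from the pmf with exact fractions and compares them with the closed forms. The parameter values and the helper name `pmf` are illustrative assumptions.

```python
from fractions import Fraction
from math import comb


def pmf(x: int, N: int, K: int, n: int) -> Fraction:
    """Exact hypergeometric pmf as a Fraction (zero outside the support)."""
    if not (0 <= x <= K and 0 <= n - x <= N - K):
        return Fraction(0)
    return Fraction(comb(K, x) * comb(N - K, n - x), comb(N, n))


# Hypothetical parameters: N = 20 items, K = 5 successes, n = 4 draws.
N, K, n = 20, 5, 4

# Mean and variance computed straight from the definition.
mean = sum(x * pmf(x, N, K, n) for x in range(n + 1))
var = sum((x - mean) ** 2 * pmf(x, N, K, n) for x in range(n + 1))

# Closed forms from the theorem, with p = K/N.
p = Fraction(K, N)
mean_formula = n * p
var_formula = n * p * (1 - p) * Fraction(N - n, N - 1)

# Binomial variance npq, for comparison with the extra factor.
var_binomial = n * p * (1 - p)

print(mean, mean_formula)      # both sides of E[X]
print(var, var_formula)        # both sides of Var(X)
print(var / var_binomial)      # equals (N - n) / (N - 1)
```

The last line isolates the extra factor: the ratio of the HyperGeo variance to the Binomial variance $npq$ is exactly $\dfrac{N-n}{N-1}$.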
Theorem.
In certain cases, HyperGeo can be treated as Binomial.
If $N \gg n$ and $K \gg n$, then
\[h(x; N, K, n) \approx \text{BIN}(x; n, \frac{K}{N})\]
As with the theorem above, the proof will be written up separately later.
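Until then, the approximation can be seen numerically. The following sketch compares the two pmfs for a small population and for a much larger one with the same ratio $K/N$. All function names and parameter values here are my own assumptions for illustration.

```python
from math import comb


def hypergeom_pmf(x: int, N: int, K: int, n: int) -> float:
    """Hypergeometric pmf h(x; N, K, n), zero outside the support."""
    if not (0 <= x <= K and 0 <= n - x <= N - K):
        return 0.0
    return comb(K, x) * comb(N - K, n - x) / comb(N, n)


def binom_pmf(x: int, n: int, p: float) -> float:
    """Binomial pmf BIN(x; n, p)."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)


def max_abs_gap(N: int, K: int, n: int) -> float:
    """Largest pointwise difference between the two pmfs over x = 0..n."""
    p = K / N
    return max(
        abs(hypergeom_pmf(x, N, K, n) - binom_pmf(x, n, p))
        for x in range(n + 1)
    )


# Same ratio K/N = 0.4 and same n = 10, but very different population sizes.
small = max_abs_gap(50, 20, 10)          # n is not small relative to N
large = max_abs_gap(50_000, 20_000, 10)  # N >> n and K >> n

print(f"gap with N = 50:     {small:.6f}")
print(f"gap with N = 50000:  {large:.6f}")
```

With the larger population, drawing w/o replacement barely changes the success ratio between draws, so the gap between the two pmfs shrinks.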
Multivariate HyperGeometric Distribution
The "Multivariate HyperGeometric Distribution" generalizes the hypergeometric distribution from two possible outcomes to several. Its pmf is given below.
Definition. Multivariate HyperGeometric Distribution
If $N$ items can be partitioned into the $k$ cells $A_1, A_2, \dots, A_k$ with $a_1, a_2, \dots, a_k$ elements, respectively, then the probability distribution of the RVs $X_1, X_2, \dots, X_k$, representing the number of elements selected from $A_1, A_2, \dots, A_k$ in a random sample of size $n$, is
\[f(x_1, \dots, x_k\; ; \; a_1, \dots, a_k, N, n) = \frac{\binom{a_1}{x_1} \cdots \binom{a_k}{x_k}}{\binom{N}{n}}\]
with $\displaystyle \sum^k_{i=1} x_i = n$ and $\displaystyle \sum^k_{i=1} a_i = N$.
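As a rough sketch of how this formula translates into code, the function below takes the cell sizes $a_i$ and the counts $x_i$ and infers $N$ and $n$ as their sums. The function name, the argument layout, and the example numbers are hypothetical choices of mine.

```python
from math import comb, prod


def multi_hypergeom_pmf(xs: tuple, a_sizes: tuple) -> float:
    """Multivariate hypergeometric pmf.

    xs      -- counts x_1..x_k drawn from each cell
    a_sizes -- cell sizes a_1..a_k
    N and n are taken to be sum(a_sizes) and sum(xs).
    """
    if len(xs) != len(a_sizes):
        raise ValueError("xs and a_sizes must have the same length")
    # A count cannot be negative or exceed its cell size.
    if any(x < 0 or x > a for x, a in zip(xs, a_sizes)):
        return 0.0
    N, n = sum(a_sizes), sum(xs)
    return prod(comb(a, x) for a, x in zip(a_sizes, xs)) / comb(N, n)


# Hypothetical example: cells of sizes 3, 4, 5 (N = 12); draw n = 4
# and ask for exactly 1, 1, 2 items from the three cells.
print(multi_hypergeom_pmf((1, 1, 2), (3, 4, 5)))
```

With $k = 2$ cells this reduces to the ordinary hypergeometric pmf, taking $a_1 = K$ and $a_2 = N - K$.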
The pmf looks fairly complicated, but if you understand the hypergeometric distribution well, extending it to the multivariate case is not difficult.
Closing Remarks
The next post introduces the <Poisson Distribution>, the boss of the discrete probability distributions!! Poisson is quite important, so let's study it carefully!
👉 Poisson Distribution