What Is the Hypergeometric Distribution?
Imagine you have a box filled with colored balls: some are red, and some are blue. You want to know the probability of picking a specific number of red balls if you draw a few balls without putting them back in the box. This scenario, where you sample without replacement, is where the hypergeometric distribution comes into play!
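The hypergeometric distribution describes the probability of obtaining k successes in n draws, without replacement, from a finite population of size N that contains exactly K successes.
The probability mass function (PMF) of the hypergeometric distribution is given by

$$P(X = k) = \frac{\binom{K}{k}\binom{N-K}{n-k}}{\binom{N}{n}}$$

where $\binom{a}{b}$ is the number of ways to choose b items from a, and k can range from max(0, n - (N - K)) to min(n, K).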
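The PMF can be evaluated directly from binomial coefficients. The following is a minimal sketch using only the Python standard library; the function name `hypergeom_pmf` and the ball counts in the example are illustrative choices, not part of the original scenario.

```python
from math import comb


def hypergeom_pmf(k: int, N: int, K: int, n: int) -> float:
    """P(X = k): k successes in n draws without replacement from a
    population of N items, K of which are successes."""
    # Outside the support the probability is zero.
    if k < max(0, n - (N - K)) or k > min(n, K):
        return 0.0
    # Ways to pick k successes times ways to pick n - k failures,
    # divided by the number of ways to pick any n items.
    return comb(K, k) * comb(N - K, n - k) / comb(N, n)


# Illustrative values (assumed): a box of 12 balls, 5 of them red, 4 draws.
pmf_values = [hypergeom_pmf(k, N=12, K=5, n=4) for k in range(5)]
print(pmf_values)
print(sum(pmf_values))  # the probabilities over the support add up to 1 (up to rounding)
```

Working with exact integer binomial coefficients and dividing only at the end avoids accumulating floating-point error for moderately large populations.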
Examples of the Hypergeometric Distribution
Quality Control in Manufacturing:
- Scenario: A factory produces a batch of 1000 widgets, out of which 50 are defective. An inspector randomly selects 20 widgets for inspection.
- Application: The hypergeometric distribution can be used to determine the probability that a certain number of defective widgets are found in the sample.
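As a rough sketch of how the inspector's question might be answered, the snippet below plugs the numbers from the scenario (1000 widgets, 50 defective, 20 inspected) into the PMF. The helper name `pmf` and the particular events computed are illustrative assumptions.

```python
from math import comb

# Values taken from the scenario above.
N, K, n = 1000, 50, 20  # batch size, defective widgets, sample size


def pmf(k: int) -> float:
    """Probability of finding exactly k defective widgets in the sample."""
    return comb(K, k) * comb(N - K, n - k) / comb(N, n)


p_none = pmf(0)                                # no defective widget in the sample
p_at_least_one = 1 - p_none                    # complement of the above
p_at_most_two = sum(pmf(k) for k in range(3))  # 0, 1, or 2 defective widgets

print(f"P(no defects)      = {p_none:.4f}")
print(f"P(at least one)    = {p_at_least_one:.4f}")
print(f"P(at most two)     = {p_at_most_two:.4f}")
```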
Lotteries and Gambling:
- Scenario: In a lottery, there are 50 balls numbered 1 through 50, and 6 balls are drawn without replacement.
- Application: The hypergeometric distribution is used to calculate the probability that a given number of a player's chosen numbers match the drawn numbers.
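The lottery case can be sketched the same way. The scenario does not say how many numbers a player picks, so the snippet assumes a ticket with 6 numbers; that value, and the variable names, are assumptions made for illustration.

```python
from math import comb

N = 50  # balls in the drum (from the scenario)
n = 6   # balls drawn without replacement (from the scenario)
K = 6   # numbers on the player's ticket (assumed; not stated in the text)

# Probability of matching exactly m of the player's numbers, for m = 0..6.
match_probs = {
    m: comb(K, m) * comb(N - K, n - m) / comb(N, n)
    for m in range(n + 1)
}

for m, p in match_probs.items():
    print(f"match {m}: {p:.10f}")
```

Under this assumption, matching all six numbers corresponds to exactly one of the `comb(50, 6)` possible draws.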
Visualization
Scenario
A quality control inspector randomly selects 10 items from a shipment of 100 items. Among these 100 items, 20 are known to be defective. What is the probability that exactly 3 out of the 10 selected items are defective?
Solution
In terms of the probability mass function (PMF) of the hypergeometric distribution, the parameters are:
- N is the population size (100 items).
- K is the number of successes in the population (20 defective items).
- n is the number of draws (10 items).
- k is the number of observed successes (3 defective items).
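Given these values, we need to find P(X=3):

$$P(X = 3) = \frac{\binom{20}{3}\binom{80}{7}}{\binom{100}{10}} = \frac{1140 \times 3{,}176{,}716{,}400}{17{,}310{,}309{,}456{,}440} \approx 0.2092$$

So, the probability that exactly 3 out of the 10 selected items are defective is approximately 0.2092, or 20.92%.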
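To put the single value P(X=3) in context, the whole distribution for this shipment can be tabulated. The sketch below uses only the standard library and prints a crude text bar chart; the bar scaling (one `#` per percentage point) is an arbitrary illustrative choice.

```python
from math import comb

# Values from the scenario: 100 items, 20 defective, 10 drawn.
N, K, n = 100, 20, 10

# PMF for every possible number of defective items in the sample.
pmf = [comb(K, k) * comb(N - K, n - k) / comb(N, n) for k in range(n + 1)]

for k, p in enumerate(pmf):
    # One '#' per percentage point, rounded.
    print(f"k={k:2d}  P={p:.4f}  {'#' * round(p * 100)}")

# Expected number of defective items; this should agree with n * K / N.
mean = sum(k * p for k, p in enumerate(pmf))
print(f"mean = {mean:.4f}")
```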
Verification Using Python
import scipy.stats as stats

# Parameters
N = 100 # population size
K = 20 # number of defective items in the population
n = 10 # number of items drawn
k = 3 # number of defective items we want to find the probability for
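# Calculate the probability.
# SciPy's positional order is pmf(k, M, n, N), where M is the population size,
# n is the number of successes in the population, and N is the number of draws.
probability = stats.hypergeom.pmf(k, N, K, n)
print(probability)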
Executing this code will give us the probability that exactly 3 out of the 10 selected items are defective.
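An independent sanity check is to simulate the sampling itself. The sketch below, which assumes nothing beyond the standard library, repeatedly draws 10 items without replacement and counts how often exactly 3 are defective; the seed and the number of trials are arbitrary choices.

```python
import random

random.seed(0)  # fixed seed so the run is repeatable

# Values from the scenario: 100 items, 20 defective, 10 drawn, 3 defective sought.
N, K, n, k = 100, 20, 10, 3
population = [1] * K + [0] * (N - K)  # 1 = defective, 0 = good

trials = 50_000
# random.sample draws without replacement, matching the hypergeometric setting.
hits = sum(1 for _ in range(trials) if sum(random.sample(population, n)) == k)

estimate = hits / trials
print(f"simulated P(X = {k}) = {estimate:.4f}")
```

The estimate fluctuates from run to run (and with the seed), but it should land close to the value returned by the exact calculation.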
Why Does This Matter?
Understanding the hypergeometric distribution is useful in many fields, including quality control, ecological studies, and even card games, where the outcome is based on sampling without replacement. It lets us calculate probabilities in scenarios where each draw changes the composition of what remains, and therefore the probabilities of the subsequent draws.