Discover the Magic of Statistical Sampling: From Candy Store to Atmospheric Physics
This interactive laboratory was inspired by one of the most brilliant explanations of statistical sampling ever written - a Quora answer that perfectly captures why "the size of the population from which you are drawing a sample is basically irrelevant" to the accuracy of your estimates.
Senia Sheydvasser, a freelance mathematician, posted on Quora the counter-intuitive insight that forms the conceptual foundation of our laboratory:
"Here is a counter-intuitive result, which I really wish every single person would learn by heart: when you are trying to gauge how good an estimate a random sample gives you, the size of the population from which you are drawing that sample is basically irrelevant."
Sheydvasser's genius was in creating a physical analogy that makes abstract statistical concepts tangible:
Population: A gigantic mountain of thoroughly mixed Skittles
Key Insight: Size doesn't matter for sampling accuracy
Critical Assumption: Thorough mixing (randomness)
Scoop (Sample): Your sampling tool for estimation
Key Insight: Larger scoop = better estimates
Critical Point: Scoop doesn't "know" mountain size
"Here is a question. If I want my sample to be more accurate, which of the following will help: doubling the size of my scoop, or halving the size of the mountain? Doubling the size of the scoop will definitely help. But how could halving the size of the mountain possibly make a difference? My scoop doesn't 'know' how large the mountain is."
– Senia Sheydvasser
Our laboratory takes Sheydvasser's intuitive explanation and builds upon it with:
We implement the exact finite population correction formulas, showing when and why population size does matter (when sampling ratios exceed ~5%)
You can manipulate the mountain size and scoop size to see the mathematical relationships in real-time
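Sheydvasser's scoop-versus-mountain question can be checked numerically. Here is a minimal Python sketch (the function name and the numbers are illustrative, not taken from the laboratory's code):

```python
import math

def margin_of_error(n, N, p=0.5, z=1.96):
    """95% margin of error for a proportion, with finite population correction."""
    se = math.sqrt(p * (1 - p) / n)        # depends only on the scoop (n)
    fpc = math.sqrt((N - n) / (N - 1))     # barely moves while n << N
    return z * se * fpc

# Halving the mountain changes almost nothing...
print(margin_of_error(n=100, N=1_000_000))   # ≈ 0.0980
print(margin_of_error(n=100, N=500_000))     # ≈ 0.0980
# ...but doubling the scoop shrinks the error by roughly 1/sqrt(2).
print(margin_of_error(n=200, N=1_000_000))   # ≈ 0.0693
```

The scoop really doesn't "know" how large the mountain is: the first two results agree to four decimal places.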
Following Kahneman & Tversky's insights about "belief in the law of small numbers"
Extends beyond basic sampling to effect detection and study design
Sheydvasser concluded her answer with a powerful call to action:
"I would really, really like for all of this to become common knowledge. I think absolutely every single person living in an industrialized society should have to take at least one class on statistics, and discussions about how random sampling works should be a mandatory part of the curriculum."
This laboratory is our contribution to that mission. By making abstract statistical concepts visual, interactive, and grounded in memorable analogies, we hope to build the statistical literacy that our data-driven world desperately needs.
Start Your Journey: Click on the 🧪 Laboratory tab to begin exploring the mathematical reality behind the Skittles mountain analogy, or explore the other tabs to dive deep into the foundational concepts of statistical inference.
Four fundamental approaches to sampling and power analysis:
[Chart: True Population Distribution vs. Observed Sample Distribution]
You're standing in a magical place where mathematics meets the physical world! This laboratory demonstrates the critical relationship between population size, sample size, confidence, precision, and statistical power - embodying the revolutionary insights of Kahneman and Tversky chronicled in Michael Lewis's The Undoing Project.
"Belief in the Law of Small Numbers" - Even expert researchers trusted studies with samples of only n=40, giving them just a 50% chance of finding real effects! They showed that nโ130 was needed for 90% power, revolutionizing how we think about sample size.
This tool helps you avoid their cognitive traps by making the mathematics of power analysis tangible and visual.
"What precision do I achieve?"
Given population size, sample size, and confidence level → Calculate margin of error
"How many samples do I need?"
Given population size, desired precision, and confidence level → Calculate required sample size
"What confidence can I achieve?"
Given population size, sample size, and desired precision → Calculate achievable confidence level
"How many samples do I need to find a real effect?"
Given effect size, desired power, and significance → Calculate required sample size
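The four modes above can be sketched as four small solvers. A hedged sketch in Python (the function names are hypothetical, and Mode 4 uses the standard two-sample normal approximation; the laboratory's internal implementation may differ):

```python
import math
from statistics import NormalDist

def achieved_precision(N, n, conf=0.95, p=0.5):
    """Mode 1: margin of error achieved by a sample of size n."""
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)
    return z * math.sqrt(p * (1 - p) / n) * math.sqrt((N - n) / (N - 1))

def required_sample(N, moe, conf=0.95, p=0.5):
    """Mode 2: sample size needed for a target margin of error."""
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)
    n0 = (z ** 2 * p * (1 - p)) / moe ** 2       # infinite-population size
    return math.ceil(n0 / (1 + (n0 - 1) / N))    # finite-population adjustment

def achievable_confidence(N, n, moe, p=0.5):
    """Mode 3: confidence level achievable at a given precision."""
    z = moe / (math.sqrt(p * (1 - p) / n) * math.sqrt((N - n) / (N - 1)))
    return 2 * NormalDist().cdf(z) - 1

def required_sample_for_power(d, power=0.90, alpha=0.05):
    """Mode 4: per-group n to detect effect size d (two-sample comparison)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    return math.ceil(2 * (z_alpha + z_beta) ** 2 / d ** 2)

print(required_sample(N=10_000, moe=0.05))   # 370
print(required_sample_for_power(d=0.4))      # 132, i.e. the "n ≈ 130" figure
```

Note how Modes 1-3 are the same equation solved for a different unknown: fixing any two of {confidence, precision, sample size} determines the third.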
Revolutionary Feature: This laboratory implements proper statistical constraint solving! When you change confidence level, the system automatically maintains your margin of error and adjusts sample size accordingly - exactly like professional statistical software.
No more overconstrained parameters or mathematically impossible scenarios!
When sampling from finite populations, the standard error formula becomes:

$$SE = \sqrt{\frac{p(1-p)}{n}} \sqrt{\frac{N-n}{N-1}}$$

Where:
- $p$ = the population proportion
- $n$ = the sample size
- $N$ = the population size
- $\sqrt{\frac{N-n}{N-1}}$ = the finite population correction (FPC)
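The size of the finite population correction $\sqrt{(N-n)/(N-1)}$ is what drives the ~5% rule of thumb. A quick Python check (the population size is illustrative):

```python
import math

def fpc(N, n):
    """Finite population correction factor sqrt((N - n) / (N - 1))."""
    return math.sqrt((N - n) / (N - 1))

N = 10_000
for frac in (0.01, 0.05, 0.10, 0.50):
    n = int(N * frac)
    print(f"sampling {frac:.0%} of the population: FPC = {fpc(N, n):.4f}")
```

Below a 5% sampling fraction the correction shaves under about 2.5% off the standard error; at a 50% fraction it cuts the error by nearly 30%.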
The power calculation for detecting effects incorporates Cohen's effect size framework. For a two-sample comparison, the required sample size per group is approximately:

$$n = \frac{2\left(z_{\alpha/2} + z_{\beta}\right)^2}{d^2}$$

Where:
- $d$ = Cohen's standardized effect size
- $z_{\alpha/2}$ = the critical value for the significance level (1.960 for $\alpha = 0.05$)
- $z_{\beta}$ = the critical value for the desired power (1.282 for 90% power)
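Running the same approximation in reverse recovers Kahneman and Tversky's observation. A Python sketch (two-sample normal approximation; $d = 0.4$ is an assumed, illustrative effect size):

```python
import math
from statistics import NormalDist

def power(d, n, alpha=0.05):
    """Approximate power of a two-sample test with n subjects per group."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    return NormalDist().cdf(d * math.sqrt(n / 2) - z_alpha)

print(f"n = 40:  power = {power(0.4, 40):.0%}")   # ≈ 43%: little better than a coin flip
print(f"n = 130: power = {power(0.4, 130):.0%}")  # ≈ 90%
```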
The normal distribution is the beating heart of our sampling adventure. It's why we can make confident statements about our populations based on small samples!
The probability density function of the normal distribution is:

$$f(x) = \frac{1}{\sigma\sqrt{2\pi}} e^{-\frac{(x-\mu)^2}{2\sigma^2}}$$
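The density formula translates directly into code, and the standard library offers a cross-check. A minimal Python sketch:

```python
import math
from statistics import NormalDist

def normal_pdf(x, mu=0.0, sigma=1.0):
    """Density of N(mu, sigma^2), straight from the formula."""
    return math.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))

print(normal_pdf(0.0))            # peak of the standard normal, ≈ 0.3989
print(NormalDist(0, 1).pdf(0.0))  # same value from the standard library
```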
Standard error tells us how much our sample estimates typically vary from the true population value. For finite populations, we must include the finite population correction!
For finite populations, the standard error is:

$$SE = \sqrt{\frac{\hat{p}(1-\hat{p})}{n}} \times \sqrt{\frac{N-n}{N-1}}$$

Two Components:
- The basic standard error, $\sqrt{\hat{p}(1-\hat{p})/n}$, which shrinks as the sample grows
- The finite population correction, $\sqrt{(N-n)/(N-1)}$, which shrinks the error further as the sample becomes a large fraction of the population
A confidence interval creates a "zone of trust" around your sample estimate. It's the answer to: "Given my sample, where is the true population parameter likely to be?"
Confidence Interval Formula for Finite Populations:
$$CI = \hat{p} \pm z_{\alpha/2} \sqrt{\frac{\hat{p}(1-\hat{p})}{n}} \sqrt{\frac{N-n}{N-1}}$$

Components Explained:
- $\hat{p}$ = the sample proportion (your point estimate)
- $z_{\alpha/2}$ = the critical value for the chosen confidence level
- $\sqrt{\hat{p}(1-\hat{p})/n}$ = the standard error of the proportion
- $\sqrt{(N-n)/(N-1)}$ = the finite population correction
| Confidence Level | $\alpha$ | $z_{\alpha/2}$ | Tail Area |
| --- | --- | --- | --- |
| 90% | 0.10 | 1.645 | 5% in each tail |
| 95% | 0.05 | 1.960 | 2.5% in each tail |
| 99% | 0.01 | 2.576 | 0.5% in each tail |
| 99.9% | 0.001 | 3.291 | 0.05% in each tail |
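These critical values need not be memorized; each one falls out of the inverse normal CDF. A quick Python check:

```python
from statistics import NormalDist

def z_critical(confidence):
    """Two-sided critical value z_{alpha/2}: area alpha/2 in each tail."""
    alpha = 1 - confidence
    return NormalDist().inv_cdf(1 - alpha / 2)

for c in (0.90, 0.95, 0.99, 0.999):
    print(f"{c:.1%} confidence -> z = {z_critical(c):.3f}")
```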
"There's a 95% chance the true proportion is in this specific interval [0.23, 0.27]."
Why wrong: The true proportion is a fixed value - it either is or isn't in this interval. There's no probability about it.
"If we repeated this sampling process many times, about 95% of the resulting intervals would contain the true proportion."
Why correct: The confidence refers to the long-run performance of the method, not any individual interval.
Step 1: Start with standardized statistic
$$Z = \frac{\hat{p} - p}{\sqrt{\frac{p(1-p)}{n}} \sqrt{\frac{N-n}{N-1}}} \sim N(0,1)$$

Step 2: Create probability statement

$$P\left(-z_{\alpha/2} \leq \frac{\hat{p} - p}{SE} \leq z_{\alpha/2}\right) = 1-\alpha$$

Step 3: Solve for p (algebraic manipulation)

$$P\left(\hat{p} - z_{\alpha/2} \cdot SE \leq p \leq \hat{p} + z_{\alpha/2} \cdot SE\right) = 1-\alpha$$

Step 4: Final confidence interval

$$CI: \hat{p} \pm z_{\alpha/2} \sqrt{\frac{\hat{p}(1-\hat{p})}{n}} \sqrt{\frac{N-n}{N-1}}$$

Problem: "What percentage of products are defective?"
Sample: 500 products, 23 defective
Result: $\hat{p} = 0.046$ (4.6%)
95% CI: [3.2%, 6.0%]
Conclusion: We're 95% confident the true defect rate is between 3.2% and 6.0%
Problem: "What percentage support the candidate?"
Sample: 1,200 voters, 612 support
Result: $\hat{p} = 0.51$ (51%)
95% CI: [48.1%, 53.9%]
Conclusion: The race is too close to call (includes 50%)
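The polling example can be reproduced in a few lines. This sketch uses the large-sample (Wald) interval without the finite population correction (the voter population is large, so the FPC is essentially 1); tiny rounding differences from the figures above are expected:

```python
import math

def proportion_ci(successes, n, z=1.96):
    """Large-sample (Wald) confidence interval for a proportion."""
    p_hat = successes / n
    moe = z * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - moe, p_hat + moe

low, high = proportion_ci(612, 1200)
print(f"95% CI: [{low:.1%}, {high:.1%}]")
print("Too close to call" if low < 0.50 < high else "Clear leader")
```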
Experiment 1: Confidence vs Width
Set sample size to 100 and slowly increase confidence from 90% to 99%. Notice how intervals get wider with higher confidence - this is the precision trade-off!
Experiment 2: Sample Size vs Precision
Set confidence to 95% and increase sample size from 50 to 500. Watch intervals get narrower - larger samples give more precision.
Experiment 3: Finite Population Effect
With sample size 100, change population from 4,096 to 256. See how finite population correction dramatically narrows intervals when sampling a large fraction of the population.
Experiment 4: Coverage Rate
Set to 95% confidence and generate many intervals. Count how many contain the true value (red line). It should be close to 95%!
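Experiment 4 can also be run in code. A Monte Carlo sketch (parameters are illustrative; it uses the Wald interval, whose coverage runs slightly below the nominal 95% for moderate sample sizes):

```python
import math
import random

def coverage_simulation(p=0.5, n=100, trials=2000, z=1.96, seed=1):
    """Fraction of 95% confidence intervals that capture the true proportion p."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        k = sum(rng.random() < p for _ in range(n))       # one random sample
        p_hat = k / n
        moe = z * math.sqrt(p_hat * (1 - p_hat) / n)
        hits += (p_hat - moe) <= p <= (p_hat + moe)
    return hits / trials

print(coverage_simulation())   # close to 0.95, as the theory promises
```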
The Central Limit Theorem works for both infinite and finite populations, but the finite case has some beautiful properties!
For finite populations: The sampling distribution of sample means approaches:
$$\bar{X}_n \sim N\left(\mu, \frac{\sigma^2}{n} \cdot \frac{N-n}{N-1}\right)$$

Key insight: The variance decreases faster than in the infinite-population case, thanks to the finite population correction!
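The corrected variance can be verified empirically by sampling without replacement. A Python sketch (the population and sizes are illustrative):

```python
import random
import statistics

N, n = 1000, 200
population = list(range(N))
sigma2 = statistics.pvariance(population)    # population variance

# Theoretical variance of the sample mean, with finite population correction:
theory = (sigma2 / n) * (N - n) / (N - 1)

# Empirical check: many samples drawn WITHOUT replacement.
rng = random.Random(42)
means = [statistics.fmean(rng.sample(population, n)) for _ in range(4000)]
empirical = statistics.variance(means)
print(theory, empirical)   # the two agree closely
```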
The z-score calculation must account for finite population effects when the sample represents a significant portion of the population:

$$Z = \frac{\bar{X}_n - \mu}{\frac{\sigma}{\sqrt{n}} \sqrt{\frac{N-n}{N-1}}}$$
Symbol Definitions: