An Interactive Tool for Understanding Experimental Design, Statistical Power, and Classification Metrics
Total Subjects: 200
Total Cost: $20,000
Confidence Level: 95%
Expected Error: ±13.9%
A well-designed experiment requires more than just testing whether sunblock prevents sunburn. We need to establish baselines and control for confounding factors. The simplest rigorous design uses four groups:
Sun + Sunblock
This is the primary test: Does the sunblock actually protect against sunburn when exposed to the sun?
Sun + No Sunblock
Confirms that the sun exposure is sufficient to cause burns. This establishes a baseline for expected damage and validates our experimental conditions.
No Sun + No Sunblock
Confirms that subjects don't spontaneously develop burns without sun exposure. This establishes the natural baseline state.
No Sun + Sunblock
Tests whether the sunblock itself causes any adverse skin reactions (redness, irritation) in the absence of sun. Critical for product safety and quality control!
Each control group serves a specific quality-control purpose.
By introducing a "Half Sun" exposure level, we add two more groups, bringing the total to six. This expansion serves important quality control functions.
The six groups now include all combinations of {No Sun, Half Sun, Full Sun} × {No Sunblock, Full Sunblock}.
Adding "Half Dose Sunblock" creates our complete nine-group design, providing comprehensive quality control coverage.
Factorial Design:
A k×m factorial design has k levels of factor A and m levels of factor B, creating k × m total groups.
For our 3×3 design: 3 sun levels × 3 sunblock levels = 9 groups
Total subjects needed = n × (k × m), where n is the sample size per group
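The bookkeeping above can be sketched in a few lines. The $100 per-subject cost is an assumption inferred from the headline figures (200 subjects, $20,000); everything else follows the 3×3 design directly.

```python
# Subject and cost totals for the 3x3 factorial design. The $100
# per-subject cost is an assumption inferred from the document's
# headline figures (200 subjects at $20,000 total).
SUN_LEVELS = ("No Sun", "Half Sun", "Full Sun")              # k = 3
SUNBLOCK_LEVELS = ("No Sunblock", "Half Dose", "Full Dose")  # m = 3

def factorial_totals(n_per_group, cost_per_subject=100):
    groups = len(SUN_LEVELS) * len(SUNBLOCK_LEVELS)   # k * m = 9
    subjects = n_per_group * groups                   # n * (k * m)
    return groups, subjects, subjects * cost_per_subject

groups, subjects, cost = factorial_totals(n_per_group=50)
print(groups, subjects, cost)  # 9 450 45000
```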
Main Effects:
$$\text{Effect of Sunblock} = \bar{Y}_{\text{with sunblock}} - \bar{Y}_{\text{without sunblock}}$$
$$\text{Effect of Sun} = \bar{Y}_{\text{with sun}} - \bar{Y}_{\text{without sun}}$$
where \(\bar{Y}\) represents the mean burn rate across all relevant groups.
Interaction Effects:
An interaction exists when the effect of one factor depends on the level of another factor.
$$\text{Interaction} = (\bar{Y}_{11} - \bar{Y}_{10}) - (\bar{Y}_{01} - \bar{Y}_{00})$$
If this value is significantly different from zero, the factors interact. For example, sunblock might be highly effective under full sun but provide minimal benefit under half sun.
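The interaction formula can be checked with a toy calculation. The burn rates below are invented for illustration, not measured data.

```python
# Hypothetical mean burn rates for a 2x2 corner of the design
# (first subscript: sun, second: sunblock). Illustrative numbers only.
y11 = 0.10  # full sun, with sunblock
y10 = 0.90  # full sun, no sunblock
y01 = 0.02  # no sun, with sunblock
y00 = 0.01  # no sun, no sunblock

effect_in_sun = y11 - y10   # sunblock effect under full sun: large
effect_no_sun = y01 - y00   # sunblock effect with no sun: negligible
interaction = effect_in_sun - effect_no_sun
print(round(interaction, 2))  # -0.81: the sunblock's effect depends on sun level
```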
If the difference in burn rates between groups is substantially larger than the uncertainty of our estimates, we can draw a firm conclusion. With n=50 per group and 95% confidence, our margin of error is ±13.9%, so any difference well beyond that range lets us confidently conclude the sunblock is effective.
In diagnostic testing and experimental validation, we classify outcomes into four categories based on what we expect versus what actually happens.
Definition: We expected a positive outcome, and we observed a positive outcome.
In sunblock testing: We expected a burn (sufficient sun exposure without adequate protection), and a burn occurred.
Interpretation: This confirms our prediction was correct—either no sunblock was applied, or the protection was insufficient to prevent the burn.
Example: A subject exposed to full sun with no sunblock gets burned. We predicted this would happen, and it did.
Formula Context: TP appears in the numerator of sensitivity, showing how many expected burns actually occurred.
Definition: We expected a negative outcome (no event), and we observed no event.
In sunblock testing: We expected no burn (either no sun exposure or adequate protection), and no burn occurred.
Interpretation: This confirms our prediction was correct. Either the sunblock worked, or there was no threatening exposure.
Example: A subject with no sun exposure and no sunblock has healthy skin. We predicted no burn, and there was none.
Formula Context: TN appears in the numerator of specificity, showing correct identification of non-burn cases.
Definition: We expected a negative outcome, but we observed a positive outcome.
In sunblock testing: We expected no burn, but a burn occurred anyway.
Interpretation: Something unexpected happened. This might indicate an adverse reaction to the sunblock itself, or contamination in the experimental setup.
Example: A subject with no sun exposure but full sunblock develops redness. This is alarming—the sunblock itself might be causing irritation!
Formula Context: FP appears in the denominator of specificity. High FP reduces specificity, indicating poor safety.
Definition: We expected a positive outcome, but we observed a negative outcome.
In sunblock testing: We expected a burn, but no burn occurred.
Interpretation: The intervention worked! The sunblock successfully prevented a burn that we expected would happen.
Example: A subject exposed to full sun with full sunblock remains burn-free. This is the desired outcome—the sunblock protected them.
Note: In medical testing, "false negative" usually means a bad outcome (missing a disease). Here, it's actually good—we "missed" predicting a burn because the sunblock worked!
Formula Context: FN appears in the denominator of sensitivity. High FN means low sensitivity (good for sunblock—it prevented many expected burns).
We organize these four outcomes into a 2×2 confusion matrix, with the expected outcome (burn / no burn) on one axis and the observed outcome on the other: TP and TN sit on the diagonal (correct predictions), while FP and FN are the off-diagonal surprises.
Sensitivity (Recall / True Positive Rate):
$$\text{Sensitivity} = \frac{TP}{TP + FN}$$
What it measures: Of all the cases where we expected a positive outcome (burn), what fraction actually had a positive outcome?
In sunblock testing: Of all exposures expected to cause burns, what percentage actually resulted in burns? Low sensitivity is good—it means the sunblock is preventing most expected burns (high FN count).
Range: 0 to 1 (or 0% to 100%)
Example: If sensitivity = 0.05 (5%), then only 5% of expected burns actually occur. The sunblock prevents 95% of expected burns.
Why it matters: Sensitivity tells us how "leaky" our protection is. In medical diagnostics, high sensitivity means we catch most cases. In sunblock, low sensitivity paradoxically means good protection—we're preventing most burns that would have occurred.
Specificity (True Negative Rate):
$$\text{Specificity} = \frac{TN}{TN + FP}$$
What it measures: Of all the cases where we expected a negative outcome (no burn), what fraction actually had no burn?
In sunblock testing: Of all cases where we didn't expect burns (no sun or adequate protection), what percentage correctly had no burns? High specificity means the sunblock doesn't cause problems—it doesn't create burns on its own.
Range: 0 to 1 (or 0% to 100%)
Example: If specificity = 0.99 (99%), then 99% of no-burn-expected cases correctly had no burns. Only 1% had unexpected burns (likely adverse reactions).
Why it matters: Specificity measures safety and precision. High specificity means few false alarms—the sunblock doesn't cause unexpected problems. In medical testing, high specificity means few healthy people are incorrectly diagnosed as sick.
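A quick sketch ties both metrics back to the four counts. The counts are invented, chosen to reproduce the 5% sensitivity and 99% specificity examples above.

```python
# Illustrative counts; "positive" means a burn was expected.
TP, FN = 3, 57    # of 60 expected burns, only 3 occurred (57 prevented)
TN, FP = 99, 1    # of 100 no-burn cases, 99 stayed clear (1 surprise burn)

sensitivity = TP / (TP + FN)   # low is good here: protection is not leaky
specificity = TN / (TN + FP)   # high is good: no problems caused by product
print(sensitivity, specificity)  # 0.05 0.99
```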
There's often a tradeoff between sensitivity and specificity. In sunblock testing:
A Receiver Operating Characteristic (ROC) curve plots sensitivity (True Positive Rate) on the y-axis versus 1-specificity (False Positive Rate) on the x-axis as we vary a decision threshold.
What is a threshold? For sunblock, the threshold might be "minimum SPF effectiveness"—the cutoff above which we predict a burn. If we set a high threshold (predict burns only for the most severe exposures), fewer cases are classified as positive: sensitivity falls while specificity rises.
If we set a low threshold (predict burns readily), more cases are classified as positive: sensitivity rises while specificity falls.
AUC Interpretation:
$$0.5 \leq \text{AUC} \leq 1.0$$
An AUC of 0.5 corresponds to random guessing, while 1.0 indicates perfect discrimination. (Values below 0.5 are possible and mean the classifier performs worse than chance—usually a sign its predictions are inverted.)
In sunblock testing: An AUC above 0.85 typically indicates the sunblock provides meaningful protection across various exposure levels and dosages.
Mathematical meaning: AUC represents the probability that a randomly chosen positive case (expected burn) ranks higher than a randomly chosen negative case (no burn expected) according to our scoring function.
Key Insight: The ROC curve and AUC summarize performance across all possible decision thresholds. A single sensitivity/specificity pair tells you performance at one threshold. The AUC tells you overall quality—how well the sunblock performs regardless of how strictly you define "adequate protection."
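The probabilistic reading of AUC—a randomly chosen positive outranks a randomly chosen negative—can be computed directly by comparing every positive/negative pair. The scores below are invented for illustration.

```python
# AUC as the fraction of positive/negative pairs ranked correctly,
# with ties counted as half a win. Scores are illustrative, not real data.
def auc(pos_scores, neg_scores):
    wins = sum((p > n) + 0.5 * (p == n)
               for p in pos_scores for n in neg_scores)
    return wins / (len(pos_scores) * len(neg_scores))

burn_scores = [0.9, 0.8, 0.75, 0.6]    # cases that actually burned
clear_scores = [0.7, 0.4, 0.3, 0.2]    # cases that stayed burn-free
print(auc(burn_scores, clear_scores))  # 0.9375
```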
Relationship to Precision and Recall:
Sensitivity is the same as Recall in machine learning contexts:
$$\text{Recall} = \text{Sensitivity} = \frac{TP}{TP + FN}$$
Precision (Positive Predictive Value) is different:
$$\text{Precision} = \frac{TP}{TP + FP}$$
Precision asks: "Of all the cases we predicted as positive, how many were actually positive?"
F1 Score (Harmonic Mean):
$$F_1 = 2 \cdot \frac{\text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}}$$
The F1 score balances precision and recall, useful when you need a single metric that captures both.
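With the same style of counts, precision, recall, and F1 fall out in a few lines. The counts below are illustrative.

```python
# Illustrative counts for precision, recall, and F1.
TP, FP, FN = 40, 10, 20

precision = TP / (TP + FP)   # of predicted positives, fraction real: 0.8
recall = TP / (TP + FN)      # of real positives, fraction caught: ~0.667
f1 = 2 * precision * recall / (precision + recall)
print(round(f1, 3))  # 0.727: the harmonic mean sits nearer the weaker metric
```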
Small samples can be highly misleading. Imagine testing sunblock on just 5 people. Even if it works perfectly, you might see 1-2 burns by pure chance (perhaps they had pre-existing skin sensitivity). Conversely, a useless product might appear effective in a small trial just by luck.
Statistical power is the probability of detecting a true effect when one exists. Larger samples increase power, but with diminishing returns. The formulas below account for finite population correction when your sample is a substantial fraction of the total population.
Standard Error (Infinite Population):
$$SE = \sqrt{\frac{p(1-p)}{n}}$$
where \(p\) is the true population proportion (here, the burn rate) and \(n\) is the sample size.
Interpretation: The standard error measures how much sample proportions vary from the true population proportion. Larger samples have smaller standard errors—they give more precise estimates.
Why p=0.5 maximizes variance: The variance of a proportion is p(1-p), which is maximized when p=0.5. This gives us the most conservative (largest) standard error estimate.
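A short scan over p confirms the worst-case claim numerically: for fixed n, the standard error peaks at p = 0.5.

```python
import math

# SE of a sample proportion (infinite-population form); p = 0.5 is the
# conservative worst case because p*(1-p) is largest there.
def standard_error(p, n):
    return math.sqrt(p * (1 - p) / n)

n = 50
for p in (0.1, 0.3, 0.5, 0.7, 0.9):
    print(p, round(standard_error(p, n), 4))  # peaks at p = 0.5 (~0.0707)
```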
Finite Population Correction (FPC):
$$\text{FPC} = \sqrt{\frac{N-n}{N-1}}$$
$$SE_{\text{corrected}} = SE \times \text{FPC}$$
where \(N\) is the total population size and \(n\) is the sample size.
When to use: When the sampling fraction \(n/N\) exceeds 5%, this correction becomes important. It reduces the standard error because we're sampling a significant fraction of the entire population.
Why it matters: If you're testing 100 people from a population of 200, your estimates are much more precise than if you're testing 100 from a population of 100,000. The FPC accounts for this.
Limiting case: If n=N (you sample the entire population), FPC=0, and SE=0. You have perfect information with no sampling error.
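The correction and its limiting case are easy to verify numerically; the scenarios below (n = 100 from N = 200 versus N = 100,000, and the n = N extreme) mirror the examples above.

```python
import math

# Finite population correction applied to the proportion SE.
def corrected_se(p, n, N):
    se = math.sqrt(p * (1 - p) / n)
    fpc = math.sqrt((N - n) / (N - 1))
    return se * fpc

print(round(corrected_se(0.5, 100, 200), 4))      # sampling 50%: SE shrinks a lot
print(round(corrected_se(0.5, 100, 100_000), 4))  # sampling 0.1%: barely changes
print(corrected_se(0.5, 200, 200))                # n = N: exactly 0.0
```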
Margin of Error (MOE):
$$\text{MOE} = z_{\alpha/2} \times SE_{\text{corrected}}$$
where \(z_{\alpha/2}\) is the critical value from the standard normal distribution: 1.645 for 90% confidence, 1.96 for 95%, and 2.576 for 99%.
Interpretation: We're 95% confident (if using 95% confidence level) that the true population value falls within our sample estimate ± MOE.
Example: If we measure a 30% burn rate with MOE = ±7%, we're 95% confident the true burn rate is between 23% and 37%.
Why α/2: We split α (e.g., 0.05 for 95% confidence) between the two tails of the distribution, so each tail has α/2 = 0.025.
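Plugging in the document's own numbers (p = 0.5, n = 50, z = 1.96) reproduces the ±13.9% quoted throughout:

```python
import math

Z_95 = 1.96  # critical value for 95% confidence

def margin_of_error(p, n, z=Z_95):
    return z * math.sqrt(p * (1 - p) / n)

moe = margin_of_error(0.5, 50)
print(round(100 * moe, 1))  # 13.9 (percent), matching the header figure
```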
Statistical Power (Simplified):
$$\text{Power} = 1 - \beta$$
where \(\beta\) is the probability of Type II error (failing to detect a real effect).
For proportion tests, power can be approximated as:
$$\text{Power} \approx \Phi\left(\frac{|\text{Effect Size}| - z_{\alpha/2} \cdot SE}{SE}\right)$$
where \(\Phi\) is the standard normal cumulative distribution function, the effect size is the true difference we hope to detect, and \(SE\) is the standard error of the estimated effect.
Interpretation: Power is the probability that we'll detect an effect if it truly exists. Higher power means we're less likely to miss real effects.
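The approximation above can be evaluated with the standard normal CDF built from math.erf. The effect size and SE below are illustrative choices, roughly matching the n = 50, p = 0.5 setup used elsewhere in the document.

```python
import math

def phi(x):
    # Standard normal CDF via the error function.
    return 0.5 * (1 + math.erf(x / math.sqrt(2)))

def approx_power(effect, se, z_alpha=1.96):
    # Power ~ Phi((|effect| - z_{alpha/2} * SE) / SE), as given above.
    return phi((abs(effect) - z_alpha * se) / se)

# Illustrative: a 20-point effect measured with SE ~ 0.0707 (n = 50, p = 0.5).
print(round(approx_power(0.20, 0.0707), 2))  # roughly 0.8
```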
Factors that increase power: a larger sample size, a larger true effect size, lower outcome variance, and a less stringent significance level (larger \(\alpha\)).
Typical design targets—80% power to detect a 20% effect, 90% power to detect a 15% effect, or detecting small effects of 5-10%—demand progressively larger samples per group.
Doubling to n=100 would cost $90,000 but reduce MOE to ±10% and increase power to ~95%.
Sample Size Formula (Solving for n):
To achieve desired power for detecting a specific effect size:
$$n = \frac{(z_{\alpha/2} + z_\beta)^2 \cdot [p_1(1-p_1) + p_2(1-p_2)]}{(p_1 - p_2)^2}$$
where \(p_1\) and \(p_2\) are the expected proportions in the two groups, \(z_{\alpha/2}\) is the critical value for the chosen confidence level, and \(z_\beta\) is the critical value for the desired power (0.84 for 80% power, 1.28 for 90%).
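The formula is a one-liner in code. The defaults below assume 95% confidence (z = 1.96) and 80% power (z_beta = 0.84); the 50%-to-30% burn-rate scenario is an illustrative choice.

```python
import math

# Per-group sample size from the two-proportion formula above; defaults
# assume 95% confidence (z = 1.96) and 80% power (z_beta = 0.84).
def n_per_group(p1, p2, z_alpha=1.96, z_beta=0.84):
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return math.ceil((z_alpha + z_beta) ** 2 * variance / (p1 - p2) ** 2)

# Detecting a drop in burn rate from 50% to 30%:
print(n_per_group(0.5, 0.3))  # 91 subjects per group
```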
Here's an elegant mathematical analogy: Finding a real effect in an experiment is like finding where a polynomial function crosses the x-axis (its real roots). The function represents your measurement as you vary experimental conditions, and a "crossing" represents a detectable effect.
Consider a polynomial function of degree 4:
$$f(x) = ax^4 + bx^3 + cx^2 + dx + e$$
where \(a, b, c, d, e\) are coefficients that determine the shape of the curve.
Domain: \(x \in \mathbb{R}\) (all real numbers)
Range: \(f(x) \in \mathbb{R}\)
Roots: Values of \(x\) where \(f(x) = 0\)
Mathematical meaning: A real root is a point \(x_0 \in \mathbb{R}\) where \(f(x_0) = 0\). The graph crosses the x-axis at this point.
Experimental meaning: A real root represents a condition where your intervention makes a measurable, observable difference. The outcome crosses a threshold from "no effect" to "clear effect."
Example: In sunblock testing, a real root might represent the minimum SPF value where burn prevention becomes statistically significant. Below this threshold, burns occur; above it, they don't.
Visualization: On a graph of "burn rate vs. sunblock SPF," a real root is where the curve crosses the "acceptable burn rate" threshold.
Mathematical meaning: A complex root exists in \(\mathbb{C}\) (the complex number plane) but not in \(\mathbb{R}\). It has the form \(z = a + bi\) with \(b \neq 0\), where \(i = \sqrt{-1}\). When all of a polynomial's roots are complex, its graph never crosses the x-axis—it stays entirely above or below it.
Experimental meaning: A complex root represents "no observable effect." Your intervention might have some theoretical influence, but it never manifests as a detectable, measurable change in the real world.
Example: A sunblock with complex roots would never show statistically significant protection, no matter how you adjust the dosage or exposure—it simply doesn't work in observable reality.
Mathematical property: Complex roots of polynomials with real coefficients always come in conjugate pairs: if \(a + bi\) is a root, then \(a - bi\) is also a root.
Mathematical meaning: The function comes very close to zero (\(|f(x)| < \epsilon\) for small \(\epsilon\)) but doesn't quite cross. Mathematically, there might be real roots very close by with slightly different coefficients.
Experimental meaning: You almost detected an effect. With a larger sample size (more precision in measuring the function), you might have caught it. This is the signature of an underpowered experiment.
Example: Your sunblock reduces burns from 50% to 35%, but with n=20, the confidence interval is ±18%, so the result is "not significant." With n=100, you'd detect it clearly—the function would cross zero.
Numerical analysis parallel: Just as numerical root-finding requires sufficient precision (small step size), detecting effects requires sufficient statistical power (large sample size).
Sensitivity to Parameters: Just as changing polynomial coefficients can create or destroy roots, changing experimental parameters (sample size, measurement precision, control of confounds) can determine whether effects become detectable.
Example: Small coefficient changes: \(f(x) = x^4 - 4x^2 + 4 = (x^2 - 2)^2\) touches the x-axis at \(x = \pm\sqrt{2}\), but \(f(x) = x^4 - 4x^2 + 4.1\) has no real roots—just by adding 0.1!
Experimental parallel: Increasing n from 40 to 60 might be the difference between detecting and missing a 15% effect—a small change with big consequences.
Adjust the coefficients below and watch how the number of real roots changes. Notice how small changes in parameters can make effects appear or disappear—just like how small changes in experimental design or sample size can determine whether you detect a real effect.
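For the biquadratic family used in these examples (b = d = 0), the substitution u = x² reduces root-counting to the quadratic formula. A minimal sketch, pure standard library:

```python
import math

# Real roots (counted with multiplicity) of a*x^4 + c*x^2 + e, a != 0,
# via the substitution u = x^2 and the quadratic formula.
def real_root_count(a, c, e):
    disc = c * c - 4 * a * e
    if disc < 0:
        return 0  # both u-roots complex, so no real x at all
    u_roots = ((-c + math.sqrt(disc)) / (2 * a),
               (-c - math.sqrt(disc)) / (2 * a))
    return sum(2 for u in u_roots if u >= 0)  # each u >= 0 gives x = +/-sqrt(u)

print(real_root_count(1, -4, 4.0))  # 4: (x^2 - 2)^2 touches the axis
print(real_root_count(1, -4, 4.1))  # 0: adding 0.1 lifts it clear of the axis
```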
Fundamental Theorem of Algebra:
A polynomial of degree \(n\) has exactly \(n\) roots (counting multiplicity) in the complex number system \(\mathbb{C}\).
$$\deg(f) = n \implies f \text{ has } n \text{ roots in } \mathbb{C}$$
For our degree-4 polynomial: Total roots = 4 (some may be real, some may be complex)
$$\text{Real Roots} + \text{Complex Roots} = 4$$
Important property: Complex roots of polynomials with real coefficients always come in conjugate pairs. So for a degree-4 polynomial with real coefficients, you can have 4 real roots and 0 complex, 2 real and 2 complex (one conjugate pair), or 0 real and 4 complex (two conjugate pairs)—never an odd number of complex roots.
Why conjugate pairs? If \(z = a + bi\) is a root, then \(f(z) = 0\). Taking the complex conjugate: \(\overline{f(z)} = \overline{0} = 0\). Since coefficients are real, \(\overline{f(z)} = f(\overline{z})\), so \(f(a - bi) = 0\). Thus \(\overline{z} = a - bi\) is also a root.
Finding Roots Numerically:
For most polynomials, roots cannot be found by simple algebra—no general formula in radicals exists beyond degree 4, and even the cubic and quartic formulas are unwieldy. In practice, we use numerical methods:
Newton-Raphson Method:
$$x_{n+1} = x_n - \frac{f(x_n)}{f'(x_n)}$$
Starting from an initial guess \(x_0\), this iteratively refines the estimate until \(|f(x_n)| < \epsilon\) for some tolerance \(\epsilon\).
Derivative for our polynomial:
$$f'(x) = 4ax^3 + 3bx^2 + 2cx + d$$
Experimental parallel: Just as Newton's method requires good initial estimates and sufficient iterations, experiments need good pilot studies to estimate effect sizes and adequate sample sizes to converge on the truth.
Convergence rate: Newton's method has quadratic convergence near simple roots—the number of correct digits roughly doubles each iteration. Sampling improves precision more slowly: the standard error scales as \(1/\sqrt{n}\), so you must quadruple the sample size to halve it.
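A bare-bones Newton iteration for the degree-4 polynomial, using the derivative formula above. This is a sketch, not a robust root-finder: it has no safeguards against f'(x) = 0 or divergence, and the test polynomial with roots 1, 2, 3, 4 is just a convenient example.

```python
# Newton-Raphson for f(x) = a*x^4 + b*x^3 + c*x^2 + d*x + e.
def newton(a, b, c, d, e, x0, tol=1e-10, max_iter=50):
    f  = lambda x: a*x**4 + b*x**3 + c*x**2 + d*x + e
    fp = lambda x: 4*a*x**3 + 3*b*x**2 + 2*c*x + d   # f'(x), as given above
    x = x0
    for _ in range(max_iter):
        fx = f(x)
        if abs(fx) < tol:
            break
        x -= fx / fp(x)   # no safeguard: assumes f'(x) stays away from 0
    return x

# x^4 - 10x^3 + 35x^2 - 50x + 24 = (x-1)(x-2)(x-3)(x-4); start near 4.
root = newton(1, -10, 35, -50, 24, x0=4.5)
print(round(root, 6))  # 4.0
```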
Multiple Effects: A polynomial can have multiple real roots, just as an intervention can have multiple detectable effects at different parameter values. Our 3×3 sunblock design reveals effects at different dose/exposure combinations—multiple "crossings" of the effectiveness threshold.
Example: \(f(x) = (x-1)(x-2)(x-3)(x-4) = x^4 - 10x^3 + 35x^2 - 50x + 24\) has four real roots at x=1,2,3,4. In experimental terms, this represents four distinct conditions where an effect is detectable.
Continuous vs. Discrete: The polynomial is continuous (smoothly varying), while our 9-group design samples at discrete points. This is why factorial designs are powerful—they sample the "function" at strategic points to characterize its overall behavior.
Interpolation insight: With measurements at 9 strategically chosen points, we can fit a polynomial model and interpolate between them, predicting effects at untested combinations.
The Reality of Effects: A real root exists whether or not we can compute it accurately. Similarly, a true effect exists whether or not our experiment detects it. Increasing sample size is like increasing the resolution of our root-finding algorithm—we get closer to the truth.
Philosophical point: Just as \(\pi\) has infinite decimal digits but we only compute finitely many, true effects exist with infinite precision, but our experiments only measure them approximately.
You test a sunblock with n=30. You find a 12% reduction in burns, but it's "not significant" (p=0.08). You conclude "no effect."
A competitor tests the same sunblock with n=120. They find a 13% reduction and p=0.002, "highly significant."
What happened? The effect (the "real root") was always there. Your n=30 study had insufficient precision—like trying to find a root with a low-resolution graph. The function came close to zero, but you couldn't definitively say it crossed.
With n=120, the precision increased, and the crossing became clear. The effect didn't change—your ability to detect it did.
Polynomial analogy: It's like having \(f(x) = x^4 - 4x^2 + 3.9\) and trying to tell whether there are real roots near \(x = \pm\sqrt{2}\). The function dips only to \(-0.1\) there, so with coarse numerical precision you might miss the crossings. With fine precision, the four roots at \(x = \pm\sqrt{2 \pm \sqrt{0.1}}\) become visible.
Mathematical lesson: The roots of \(x^4 - 4x^2 = 0\) are easy to find exactly: \(x = 0\) (a double root) and \(x = \pm 2\). But for \(x^4 - 4x^2 + 3.9 = 0\), numerical methods are required, and low precision might incorrectly conclude "no real roots" when four actually exist.
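A sign-change scan makes the precision point concrete for \(f(x) = x^4 - 4x^2 + 3.9\), which dips only 0.1 below the axis near \(x = \pm\sqrt{2}\): a coarse grid sees no crossings at all, while a fine grid finds all four. The step sizes below are arbitrary choices for illustration.

```python
# Count sign changes of f on an evenly spaced grid over [lo, hi].
def sign_changes(f, lo, hi, steps):
    xs = [lo + (hi - lo) * i / steps for i in range(steps + 1)]
    ys = [f(x) for x in xs]
    return sum(1 for y0, y1 in zip(ys, ys[1:]) if y0 * y1 < 0)

# This quartic dips just 0.1 below the axis near x = +/-sqrt(2).
f = lambda x: x**4 - 4*x**2 + 3.9

print(sign_changes(f, -3, 3, 6))    # 0: step 1.0 sees only positive values
print(sign_changes(f, -3, 3, 600))  # 4: step 0.01 catches all four crossings
```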
This mathematical framework reminds us that experimental design is fundamentally about creating conditions where true effects become visible—where the "function" of our measurements crosses the threshold of detectability with sufficient clarity and confidence.