// FOUNDER A/B TEST REALITY CHECK

Do you have enough traffic to run this experiment?

Most founder A/B tests never reach significance. Plug in your numbers — find out whether this test will finish in days, weeks, or never.

Free A/B Test Sample Size Calculator

Enter your baseline conversion rate, the smallest lift worth caring about, and your monthly traffic. The calculator returns the sample size per variant, days to significance, and a plain-English verdict on whether this test will actually finish.

Your numbers
Baseline conversion rate (%)

What percentage of visitors converts on the page today.

Minimum detectable effect (%)

The smallest relative improvement you'd actually act on. +20% means going from 5% to 6%.

Monthly traffic

Total monthly traffic to the page you're testing. Split evenly between the two versions.
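
To make the inputs concrete, here is a minimal Python sketch of how they could be turned into the two quantities the rest of the math depends on. The function names `target_rate` and `visitors_per_variant` are illustrative assumptions, not taken from the calculator itself.

```python
def target_rate(baseline: float, relative_lift: float) -> float:
    """Conversion rate the variant must reach: baseline * (1 + relative lift)."""
    return baseline * (1 + relative_lift)


def visitors_per_variant(monthly_traffic: float, versions: int = 2) -> float:
    """Monthly visitors each version sees when traffic is split evenly."""
    return monthly_traffic / versions


# A 5% baseline with a +20% relative lift targets a 6% conversion rate.
print(round(target_rate(0.05, 0.20), 4))   # 0.06
# 10,000 monthly visitors split across two versions.
print(visitors_per_variant(10_000))        # 5000.0
```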

Feasibility map

Where your test lands on the cost curve. The dot marks the cell closest to your inputs; click any cell to inspect a different scenario.

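BASELINE CVR ↓ / RELATIVE LIFT (MDE) →
Each cell: samples per variant / time to finish.

BASELINE CVR   +5%             +10%            +20%             +30%            +50%
1%             636k / 10.5yr   163k / 2.7yr    43k / 8.5mo      20k / 4.0mo     7.7k / 1.5mo
2%             315k / 5.2yr    81k / 1.3yr     21k / 4.2mo      9.8k / 2.0mo    3.8k / 23d
5%             122k / 2.0yr    31k / 6.2mo     8.1k / 1.6mo *   3.8k / 23d      1.5k / 9d
10%            58k / 11.5mo    15k / 2.9mo     3.8k / 23d       1.8k / 11d      683 / 4d
20%            26k / 5.1mo     6.5k / 1.3mo    1.7k / 10d       768 / 5d        291 / 2d

* Closest cell to your inputs
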
Selected cell: Baseline 5%, detect +20% relative lift
Samples / variant: 8,146
At 10k/mo: 1.6 months
Legend: < 14d · < 60d · 60-180d · > 180d

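
The feasibility map can be approximated with a short Python sketch. It assumes the sample-size formula from the math section, the rounded constant 7.84, rounding up to whole visitors, 10,000 monthly visitors, and a 30-day month. All of these are assumptions chosen because they appear consistent with the figures shown, not confirmed details of the calculator. The names `sample_size`, `days_needed`, and `bucket` are hypothetical.

```python
import math

BASELINES = [0.01, 0.02, 0.05, 0.10, 0.20]   # rows: baseline CVR
LIFTS = [0.05, 0.10, 0.20, 0.30, 0.50]       # columns: relative lift (MDE)


def sample_size(baseline, lift, z_squared=7.84):
    """Samples per variant at 95% confidence / 80% power (assumed constant)."""
    p2 = baseline * (1 + lift)
    variance_sum = baseline * (1 - baseline) + p2 * (1 - p2)
    return math.ceil(z_squared * variance_sum / (p2 - baseline) ** 2)


def days_needed(n_per_variant, monthly_traffic, days_per_month=30.0):
    """Days to collect n samples in each of two evenly split versions."""
    return 2 * n_per_variant * days_per_month / monthly_traffic


def bucket(days):
    """Map a duration onto the legend's four bands."""
    if days < 14:
        return "< 14d"
    if days < 60:
        return "< 60d"
    if days <= 180:
        return "60-180d"
    return "> 180d"


# Print one row per baseline, one cell per lift.
for b in BASELINES:
    cells = []
    for lift in LIFTS:
        n = sample_size(b, lift)
        cells.append(f"{n:>7,} {bucket(days_needed(n, 10_000)):>8}")
    print(f"{b:>4.0%} | " + " | ".join(cells))
```

Under these assumptions the 5% baseline, +20% cell comes out at 8,146 samples per variant, matching the selected cell.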
How the math works

Four numbers decide every A/B test.

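  • Baseline conversion rate (p₁). What the page does today. Lower baselines need far more samples to detect the same relative lift: the required sample grows roughly in inverse proportion to the baseline, so halving the baseline about doubles it.
  • Minimum detectable effect (MDE). The smallest relative improvement you want to be able to call. Sample size grows with roughly the inverse square of the MDE, so halving the MDE about quadruples the samples you need.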
  • Significance level (α = 0.05). Tolerance for false positives — the industry-standard 95% confidence is locked in here.
  • Statistical power (1−β = 0.80). The probability you actually catch a real effect. 80% is conventional and locked in here.
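n = (Zα/2 + Zβ)² × (p₁(1−p₁) + p₂(1−p₂)) / (p₂ − p₁)²

Here n is the sample size per variant, and p₂ = p₁ × (1 + MDE) is the conversion rate the variant would need to reach.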
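
The formula translates almost line for line into Python. The sketch below is a minimal version under two stated assumptions: it uses the rounded constant (1.96 + 0.84)² = 7.84 for 95% confidence and 80% power, and it rounds the result up to a whole visitor. The function name `sample_size_per_variant` is illustrative rather than part of the calculator.

```python
import math

# Assumed rounded constant for alpha = 0.05 (two-sided) and power = 0.80.
Z_SQUARED = 7.84


def sample_size_per_variant(baseline: float, relative_lift: float,
                            z_squared: float = Z_SQUARED) -> int:
    """Samples needed in each variant to detect the given relative lift."""
    p1 = baseline
    p2 = baseline * (1 + relative_lift)          # target conversion rate
    variance_sum = p1 * (1 - p1) + p2 * (1 - p2)
    n = z_squared * variance_sum / (p2 - p1) ** 2
    return math.ceil(n)                          # assumption: round up


# Baseline 5%, detect a +20% relative lift (5% -> 6%).
print(sample_size_per_variant(0.05, 0.20))       # 8146
```

The printed value agrees with the 8,146 samples per variant shown for the selected cell, which suggests the assumptions are at least close to what the calculator does.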

At 95% confidence and 80% power, Zα/2 ≈ 1.96 and Zβ ≈ 0.84, so (Zα/2 + Zβ)² ≈ 7.84. Days to significance assumes you split traffic evenly between control and one variant.
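
As a rough check on both statements, the sketch below derives the constant from the standard normal distribution and converts a per-variant sample size into a run time. It relies on Python's standard-library `statistics.NormalDist`. The 30-day month and the function names `z_constant` and `days_to_finish` are assumptions made for illustration.

```python
from statistics import NormalDist


def z_constant(alpha: float = 0.05, power: float = 0.80) -> float:
    """(Z_alpha/2 + Z_beta)^2 for a two-sided test, without rounding the z-scores."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)   # about 1.96
    z_beta = NormalDist().inv_cdf(power)            # about 0.84
    return (z_alpha + z_beta) ** 2


def days_to_finish(n_per_variant: int, monthly_traffic: float,
                   days_per_month: float = 30.0) -> float:
    """Days until both versions have n samples, with an even two-way split."""
    total_needed = 2 * n_per_variant
    daily_traffic = monthly_traffic / days_per_month
    return total_needed / daily_traffic


print(round(z_constant(), 2))                          # 7.85
print(round(days_to_finish(8146, 10_000) / 30, 1))     # 1.6
```

With unrounded z-scores the constant is closer to 7.85. Rounding each z-score to two decimals first gives the 7.84 quoted above, and the difference changes the required sample by roughly a tenth of a percent.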

What this calculator does not model: Bayesian priors, sequential tests, more than two variants, or unequal traffic splits. For all of those, the math shifts — start with the field guide before you commit to a non-standard design.

Run this experiment in Xi.

A calculator tells you whether the math works. Xi runs the experiment: lock the hypothesis and kill threshold up front, track the metric automatically, and let agents call the verdict so you don’t drift into endless “just one more week.”