How to use this calculator
Before you launch an A/B test, the single most important question is: do I have enough traffic to detect a meaningful difference? Running a test with too few visitors is a waste of time — you'll either never reach significance, or (worse) you'll declare a false winner and ship a change that doesn't actually work.
This calculator answers that question. You plug in your current conversion rate and the minimum improvement you care about detecting, and you get the number of visitors per variant you'll need before trusting the result.
Choosing your baseline conversion rate
Look at your analytics for the page, button, or funnel step you want to optimize. Use the conversion rate from the last 30-90 days as your baseline. Avoid seasonal spikes (Black Friday, launch weeks) — they're not representative of normal traffic.
Choosing your minimum detectable effect (MDE)
MDE is expressed as a relative change. If your baseline is 2% and you set MDE at 10%, you're saying "I want to detect a lift from 2% to 2.2% or bigger."
- 5-10% — large sites (100k+ visitors/month). Detects small, subtle wins.
- 10-20% — most businesses. Standard industry setting.
- 20-40% — low-traffic sites. You can only reliably detect big swings.
Smaller MDE needs much more traffic. Cutting MDE in half requires roughly 4x the visitors. If your math says you need 50,000 visitors per variant and you don't have that traffic in a reasonable timeframe, test bigger changes instead — aim for 20%+ improvements that don't need a microscope to see.
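To make the relative-MDE arithmetic concrete, here is a minimal sketch (the helper name is illustrative, not part of Testio's codebase):

```typescript
// Convert a baseline rate and a relative MDE into the target variant rate.
// Illustrative helper only; not from Testio's codebase.
function targetRate(baseline: number, relativeMde: number): number {
  return baseline * (1 + relativeMde);
}

// A 2% baseline with a 10% relative MDE means detecting 2% -> 2.2%.
console.log(targetRate(0.02, 0.10)); // ≈ 0.022
// The same baseline with a 20% relative MDE targets 2.4%.
console.log(targetRate(0.02, 0.20)); // ≈ 0.024
```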
Significance level and power explained
The significance level (confidence) controls your false-positive rate: at 95% confidence, a test comparing two identical variants will still declare a "winner" about 5% of the time. 95% is the academic standard and what most tools default to. Testio's engine defaults to 90% because it pairs that threshold with a Bayesian stopping rule that cross-validates results, giving you equivalent rigor, faster.
Statistical power is the opposite question: if there really is a difference, how likely is the test to detect it? 80% is the industry default. Higher power means lower false-negative rate, but needs more traffic.
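Confidence and power settings translate into standard-normal critical values. A minimal lookup (values rounded from the standard normal table) might look like:

```typescript
// Two-tailed critical values for common confidence levels
// (standard normal table, rounded to three decimals).
const zAlphaTwoTailed: Record<number, number> = {
  0.90: 1.645, // 90% confidence
  0.95: 1.960, // 95% confidence
  0.99: 2.576, // 99% confidence
};

// One-tailed critical values for common power levels.
const zBeta: Record<number, number> = {
  0.80: 0.842, // 80% power
  0.90: 1.282, // 90% power
};
```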
The formula under the hood
n per variant = ( zα·√(2·p̄·(1 − p̄)) + zβ·√(p₁·(1 − p₁) + p₂·(1 − p₂)) )² / (p₁ − p₂)²

Where p₁ is the baseline rate, p₂ is the expected variant rate (baseline + MDE), and p̄ = (p₁ + p₂)/2 is the pooled average. zα and zβ come from the standard normal distribution based on your chosen significance level and power.
This is the exact formula implemented in Testio's backend at apps/api/src/lib/statistics.ts — no approximations, no rounding tricks.
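The standard two-proportion calculation can be sketched in TypeScript like this (function and parameter names are illustrative, not the actual statistics.ts implementation):

```typescript
// Sample size per variant for a two-proportion z-test.
// Illustrative sketch; not the actual apps/api/src/lib/statistics.ts code.
function sampleSizePerVariant(
  baseline: number,       // p1, e.g. 0.02 for a 2% conversion rate
  relativeMde: number,    // e.g. 0.10 for a 10% relative lift
  zAlpha: number = 1.96,  // 95% confidence, two-tailed
  zBeta: number = 0.842   // 80% power
): number {
  const p1 = baseline;
  const p2 = baseline * (1 + relativeMde);
  const pBar = (p1 + p2) / 2;
  const numerator =
    zAlpha * Math.sqrt(2 * pBar * (1 - pBar)) +
    zBeta * Math.sqrt(p1 * (1 - p1) + p2 * (1 - p2));
  return Math.ceil(numerator ** 2 / (p2 - p1) ** 2);
}

// 2% baseline, 10% MDE, 95%/80% defaults: roughly 80,000 per variant.
const n10 = sampleSizePerVariant(0.02, 0.10);
// Halving the MDE to 5% roughly quadruples the requirement.
const n5 = sampleSizePerVariant(0.02, 0.05);
console.log(n10, n5);
```

This also demonstrates the "cut MDE in half, need ~4x the traffic" rule: the denominator (p₂ − p₁)² shrinks by a factor of four while the numerator barely moves.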
Common mistakes to avoid
- Peeking at results early. Checking before reaching sample size and stopping when you see a "significant" result inflates your false positive rate dramatically.
- Running too many simultaneous tests on the same page. Variants interact, making results unreliable.
- Ignoring novelty and day-of-week effects. Run tests for at least a full business cycle (7+ days) even if you reach sample size earlier.
- Optimizing for the wrong metric. A higher click-through rate on a CTA means nothing if downstream revenue stays flat.
Frequently asked questions
Do I really need this much traffic? Other calculators give smaller numbers.
Most "smaller number" calculators sacrifice rigor for marketing appeal. They use generous assumptions (one-tailed tests, no correction for multiple variants, lower power). This calculator uses the same formula and defaults as Testio's real winner-detection engine, so the number matches what you'll actually need in production.
What if I have multiple variants (A/B/C or A/B/C/D)?
Each variant needs the full sample size. If you need 10,000 per variant and you're running 4 variants, you'll need 40,000 total visitors. More variants also slightly increase false-positive risk unless you apply a correction — Testio handles this automatically in production.
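The arithmetic above, plus one common multiple-comparison correction (Bonferroni — shown here as an illustration; Testio's exact correction method is not documented in this article), looks like:

```typescript
// Total visitors needed when every variant requires the full sample size.
function totalVisitors(perVariant: number, variantCount: number): number {
  return perVariant * variantCount;
}

// Bonferroni correction: divide the overall alpha across the
// treatment-vs-control comparisons. One common approach; Testio's
// production correction may differ.
function bonferroniAlpha(alpha: number, comparisons: number): number {
  return alpha / comparisons;
}

console.log(totalVisitors(10_000, 4));  // 40000 visitors for an A/B/C/D test
console.log(bonferroniAlpha(0.05, 3)); // ~0.0167 per comparison, 3 treatments
```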
What's the difference between frequentist and Bayesian A/B testing?
Frequentist (the method this calculator uses) requires a pre-committed sample size. Bayesian methods allow "early stopping" once the posterior probability crosses a threshold. Testio uses Bayesian decisions in production (probability-to-be-best ≥ 95% with expected loss < 0.001), which often declares winners faster than this calculator predicts — but this calculator gives you the upper bound safe number.
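The Bayesian "probability to be best" idea can be illustrated with a Monte Carlo sketch over Beta posteriors (this is a toy illustration of the concept, not Testio's production engine):

```typescript
// Monte Carlo estimate of P(variant beats control) under Beta posteriors.
// Toy illustration of the Bayesian approach; not Testio's production code.

// Standard normal sample via Box-Muller.
function randNormal(): number {
  let u = 0, v = 0;
  while (u === 0) u = Math.random();
  while (v === 0) v = Math.random();
  return Math.sqrt(-2 * Math.log(u)) * Math.cos(2 * Math.PI * v);
}

// Gamma(shape, 1) sample via Marsaglia-Tsang (valid for shape >= 1,
// which holds here because Beta parameters are counts + 1).
function randGamma(shape: number): number {
  const d = shape - 1 / 3;
  const c = 1 / Math.sqrt(9 * d);
  for (;;) {
    const x = randNormal();
    const v = (1 + c * x) ** 3;
    if (v <= 0) continue;
    const u = Math.random();
    if (Math.log(u) < 0.5 * x * x + d - d * v + d * Math.log(v)) return d * v;
  }
}

// Beta(a, b) as a ratio of Gamma samples.
function randBeta(a: number, b: number): number {
  const ga = randGamma(a);
  const gb = randGamma(b);
  return ga / (ga + gb);
}

// P(variant's true rate > control's) with uniform Beta(1, 1) priors.
function probabilityToBeBest(
  controlConv: number, controlTotal: number,
  variantConv: number, variantTotal: number,
  draws = 20_000
): number {
  let wins = 0;
  for (let i = 0; i < draws; i++) {
    const pControl = randBeta(1 + controlConv, 1 + controlTotal - controlConv);
    const pVariant = randBeta(1 + variantConv, 1 + variantTotal - variantConv);
    if (pVariant > pControl) wins++;
  }
  return wins / draws;
}

// A clearly winning variant (150 vs 100 conversions out of 5,000 each)
// should show a probability-to-be-best near 1.
console.log(probabilityToBeBest(100, 5000, 150, 5000));
```

A Bayesian engine stops once this probability crosses its threshold (e.g. 95%), which is why it can call winners before the frequentist sample size is reached.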
Can I reduce the sample size I need?
Three ways: (1) test bigger changes that produce larger lifts, (2) optimize a page with a higher baseline conversion rate, (3) accept lower statistical power (risking more false negatives). Never just "wait until it looks significant" — that's not a valid reduction.
Ready to run your A/B test?
Visual editor, automatic winner detection, and real-time results. 3-day free trial, from $9/mo.
Start free trial →