A/B Test Sample Size Calculator

Find out how many visitors you need per variant to detect a statistically significant result. Free, no signup, no tracking cookies — just math.

Your test parameters

Baseline conversion rate (control): 2.0%
Minimum detectable effect: 10% (the smallest improvement you want to detect; e.g. +10% means going from 2% to 2.2%)
Daily traffic: total visitors across all variants per day, used to estimate test duration

Advanced settings

Significance level: Testio's engine uses 90% (Bayesian P≥95% with early-stopping rule)
Statistical power: probability of detecting an effect when one truly exists

You need

9,434 visitors per variant
Total visitors: 18,868
Estimated test duration: ~19 days

✓ Accurate — this calculator uses the exact same two-proportion z-test formula as Testio's internal engine, so results match what you'll see in production.
⚠ Low traffic warning — appears when, at your current daily volume, a test would take over 30 days; consider a larger MDE or more aggressive changes.
Run this test in Testio →

How to use this calculator

Before you launch an A/B test, the single most important question is: do I have enough traffic to detect a meaningful difference? Running a test with too few visitors is a waste of time — you'll either never reach significance, or (worse) you'll declare a false winner and ship a change that doesn't actually work.

This calculator answers that question. You plug in your current conversion rate and the minimum improvement you care about detecting, and you get the number of visitors per variant you'll need before trusting the result.

Choosing your baseline conversion rate

Look at your analytics for the page, button, or funnel step you want to optimize. Use the conversion rate from the last 30-90 days as your baseline. Avoid seasonal spikes (Black Friday, launch weeks) — they're not representative of normal traffic.

Choosing your minimum detectable effect (MDE)

MDE is expressed as a relative change. If your baseline is 2% and you set MDE at 10%, you're saying "I want to detect a lift from 2% to 2.2% or bigger."

A smaller MDE needs much more traffic: cutting the MDE in half requires roughly 4× the visitors. If your math says you need 50,000 visitors per variant and you don't have that traffic in a reasonable timeframe, test bigger changes instead — aim for 20%+ improvements that don't need a microscope to see.
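That quadratic relationship is easy to check numerically. The sketch below uses the standard two-proportion formula with assumed z-values (1.96 for 95% two-tailed confidence, 0.8416 for 80% power); the function name is illustrative, not Testio's code:

```typescript
// Illustrative only: halving the relative MDE roughly quadruples the
// required per-variant sample size.
function requiredN(baseline: number, relativeMde: number): number {
  const p1 = baseline;
  const p2 = baseline * (1 + relativeMde);
  const pBar = (p1 + p2) / 2;
  const z = 1.96 + 0.8416; // z_alpha (95%, two-tailed) + z_beta (80% power)
  return Math.ceil((2 * pBar * (1 - pBar) * z * z) / ((p2 - p1) ** 2));
}

const n10 = requiredN(0.02, 0.10); // +10% relative MDE
const n05 = requiredN(0.02, 0.05); // +5% relative MDE
console.log(n05 / n10); // close to 4
```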

Significance level and power explained

Statistical significance (confidence) is the probability that your observed difference is not just random noise. 95% is the academic standard and what most tools default to. Testio's engine defaults to 90% because it pairs that threshold with a Bayesian stopping rule that cross-validates results — you get equivalent rigor, faster.

Statistical power is the opposite question: if there really is a difference, how likely is the test to detect it? 80% is the industry default. Higher power means a lower false-negative rate, but requires more traffic.

The formula under the hood

n = 2 × p̄(1-p̄) × (zα + zβ)² / (p₂ - p₁)²

Where p₁ is the baseline rate, p₂ is the expected variant rate (the baseline increased by the relative MDE), and p̄ = (p₁ + p₂)/2 is the pooled average. zα and zβ come from the standard normal distribution based on your chosen significance and power.

This is the exact formula implemented in Testio's backend at apps/api/src/lib/statistics.ts — no approximations, no rounding tricks.
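As a sketch of how that formula translates to code (the z-value tables, names, and defaults below are illustrative assumptions, not the contents of statistics.ts):

```typescript
// Two-proportion sample-size formula, as described above. Z-value tables
// cover common confidence/power choices; a real implementation might compute
// them from the inverse normal CDF instead.
const Z_ALPHA: Record<string, number> = { "0.90": 1.6449, "0.95": 1.96, "0.99": 2.5758 }; // two-tailed
const Z_BETA: Record<string, number> = { "0.80": 0.8416, "0.90": 1.2816 };

function sampleSizePerVariant(
  baseline: number,    // p1: control conversion rate, e.g. 0.02
  relativeMde: number, // e.g. 0.10 for a +10% relative lift
  confidence = "0.95",
  power = "0.80",
): number {
  const p1 = baseline;
  const p2 = baseline * (1 + relativeMde);
  const pBar = (p1 + p2) / 2; // pooled average rate
  const z = Z_ALPHA[confidence] + Z_BETA[power];
  const n = (2 * pBar * (1 - pBar) * z * z) / ((p2 - p1) ** 2);
  return Math.ceil(n);
}

// 2% baseline, +10% relative MDE, 95% confidence, 80% power
console.log(sampleSizePerVariant(0.02, 0.10));
```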

Frequently asked questions

Do I really need this much traffic? Other calculators give smaller numbers.

Most "smaller number" calculators sacrifice rigor for marketing appeal. They use generous assumptions (one-tailed tests, no correction for multiple variants, lower power). This calculator uses the same formula and defaults as Testio's real winner-detection engine, so the number matches what you'll actually need in production.

What if I have multiple variants (A/B/C or A/B/C/D)?

Each variant needs the full sample size. If you need 10,000 per variant and you're running 4 variants, you'll need 40,000 total visitors. More variants also slightly increase false-positive risk unless you apply a correction — Testio handles this automatically in production.
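The traffic arithmetic, plus one common (Bonferroni-style) correction, can be sketched like this. Both helper names are hypothetical, and the Bonferroni choice is shown purely as an illustration — the source doesn't specify which correction Testio applies:

```typescript
// Hypothetical helpers; every variant needs the full per-variant sample.
function totalVisitors(nPerVariant: number, variantCount: number): number {
  return nPerVariant * variantCount;
}

// One common correction for k comparisons against control: divide alpha by k.
function bonferroniAlpha(alpha: number, comparisons: number): number {
  return alpha / comparisons;
}

console.log(totalVisitors(10_000, 4));   // 40000 total visitors
console.log(bonferroniAlpha(0.05, 3));   // stricter per-comparison alpha
```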

What's the difference between frequentist and Bayesian A/B testing?

Frequentist (the method this calculator uses) requires a pre-committed sample size. Bayesian methods allow "early stopping" once the posterior probability crosses a threshold. Testio uses Bayesian decisions in production (probability-to-be-best ≥ 95% with expected loss < 0.001), which often declares winners faster than this calculator predicts — but this calculator gives you the upper bound safe number.
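To make "probability-to-be-best" concrete, here is a Monte Carlo sketch under a Beta-Bernoulli model. This is not Testio's implementation — the flat Beta(1, 1) prior, the samplers, and all names are assumptions for illustration:

```typescript
// Monte Carlo estimate of P(variant beats control) under a Beta-Bernoulli model.

function randNormal(): number {
  // Box-Muller transform
  const u1 = 1 - Math.random();
  const u2 = Math.random();
  return Math.sqrt(-2 * Math.log(u1)) * Math.cos(2 * Math.PI * u2);
}

function randGamma(shape: number): number {
  // Marsaglia-Tsang method; valid for shape >= 1 (always true here, since
  // shape = 1 + a non-negative count under a Beta(1, 1) prior).
  const d = shape - 1 / 3;
  const c = 1 / Math.sqrt(9 * d);
  for (;;) {
    const x = randNormal();
    const v = Math.pow(1 + c * x, 3);
    if (v <= 0) continue;
    const u = Math.random();
    if (u < 1 - 0.0331 * x ** 4) return d * v;
    if (Math.log(u) < 0.5 * x * x + d * (1 - v + Math.log(v))) return d * v;
  }
}

function randBeta(a: number, b: number): number {
  const x = randGamma(a);
  return x / (x + randGamma(b));
}

// Posterior P(variant > control) given conversions/visitors, Beta(1, 1) prior.
function probabilityToBeBest(
  convControl: number, visControl: number,
  convVariant: number, visVariant: number,
  samples = 10_000,
): number {
  let wins = 0;
  for (let i = 0; i < samples; i++) {
    const pC = randBeta(1 + convControl, 1 + visControl - convControl);
    const pV = randBeta(1 + convVariant, 1 + visVariant - convVariant);
    if (pV > pC) wins++;
  }
  return wins / samples;
}

// A clearly winning variant crosses the 95% threshold with modest traffic:
console.log(probabilityToBeBest(100, 1000, 150, 1000));
```

The early-stopping idea is that once this posterior probability crosses the threshold (and expected loss is tiny), the test can end — often well before the frequentist sample size above is reached.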

Can I reduce the sample size I need?

Three ways: (1) test bigger changes that produce larger lifts, (2) optimize a page with a higher baseline conversion rate, (3) accept lower statistical power (risking more false negatives). Never just "wait until it looks significant" — that's not a valid reduction.

Ready to run your A/B test?

Visual editor, automatic winner detection, and real-time results. 3-day free trial, from $9/mo.

Start free trial →