Experiment Velocity

Experiment velocity is the number of tests your team ships per month. It compounds with win rate and average lift to determine almost all of your CRO program's annual return.
Definition: the number of controlled experiments (A/B, multivariate, or split-URL tests) an organization ships per period, usually counted per month.
Experiment velocity measures throughput, not effort: a test only counts once it has reached its pre-registered sample size and produced a decision (win, loss, or inconclusive). Pages launched, hypotheses written, and tests still warming up don't count.
It matters because CRO returns compound across three multipliers — velocity, win rate, and average lift per winner. Doubling velocity from 2 to 4 monthly tests typically does more for annual revenue than chasing a higher win rate, because more shots on goal produce more winners and more learning per quarter. It's the single metric most predictive of program ROI.
Most online stores in the €1M–€15M revenue band run between 1 and 3 tests per month. The teams pulling away from that pack, usually at 6 to 10 tests per month, aren't smarter; they've removed the bottlenecks that sit between a hypothesis and a shipped variant.
Those bottlenecks are almost always the same four: the dev queue for variant code, ambiguous traffic allocation between concurrent tests, slow time-to-significance on low-traffic pages, and a hypothesis backlog written from gut feel rather than observed funnel drop-off. Fix those and velocity can roughly double within a quarter.
Annual Revenue Lift ≈ Baseline Revenue × (1 + Velocity × Win Rate × Avg Lift)^12 − Baseline Revenue
Because Baseline Revenue is monthly, the result is the lift in monthly revenue after 12 months of compounding wins (the run rate a year out), not the cumulative extra revenue earned during the year.
Where:
Velocity: experiment velocity, the number of tests concluded per month
Win Rate: the share of concluded tests that produce a statistically significant winner
Avg Lift: the average lift per winner, i.e. the mean conversion-rate improvement of winning variants, expressed as a decimal
Baseline Revenue: monthly baseline revenue from the surface being tested, before any wins are deployed
A Shopify apparel store doing €500,000/month in product-page revenue ships 4 tests per month, with a 25% win rate and 5% average lift per winner.
Velocity: 4 tests/month
Win Rate: 25%
Avg Lift: 5%
Baseline monthly revenue: €500,000
→ Monthly compounding rate: 4 × 0.25 × 0.05 = 5%. After 12 months, 1.05^12 ≈ 1.80, so product-page revenue runs roughly €398,000 per month above baseline.
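Halving velocity to 2 tests/month with the same win rate and lift drops the monthly rate to 2.5% (1.025^12 ≈ 1.34) and cuts the uplift to ~€172,000, less than half the 4-test figure, because compounding widens the gap. Since the three inputs multiply, halving average lift would cost exactly as much; velocity stands out because it is the multiplier a team controls most directly through tooling and process.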
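The arithmetic in this example is easy to check with a short script. This is a minimal sketch of the simplified model above, assuming each month's expected wins compound multiplicatively and winners deploy immediately; the function name `revenue_lift_after_months` is illustrative, not part of any tool.

```python
def revenue_lift_after_months(
    baseline_monthly_revenue: float,
    velocity: float,
    win_rate: float,
    avg_lift: float,
    months: int = 12,
) -> float:
    """Return how far monthly revenue sits above baseline after `months` of compounding.

    Implements: Baseline × (1 + Velocity × Win Rate × Avg Lift)^months − Baseline.
    """
    # Expected compounding gain per month from deployed winners.
    monthly_rate = velocity * win_rate * avg_lift
    return baseline_monthly_revenue * (1 + monthly_rate) ** months - baseline_monthly_revenue


if __name__ == "__main__":
    # Worked example: €500,000/month baseline, 25% win rate, 5% average lift.
    for tests_per_month in (4, 2):
        uplift = revenue_lift_after_months(500_000, tests_per_month, 0.25, 0.05)
        print(f"{tests_per_month} tests/month: ~€{uplift:,.0f} per month above baseline")
```

Running it prints roughly €397,928 for 4 tests per month and €172,444 for 2.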
Use the benchmarks below to locate your program. The figures reflect tests that actually concluded with a decision — not launched, not paused, not still gathering data. Counting launches inflates velocity and disguises a backlog of inconclusive tests.
Monthly experiment velocity benchmarks for online retail by revenue tier
| Annual revenue | Lagging | Median | Top quartile |
|---|---|---|---|
| €1M – €3M | 0–1 tests/mo | 1–2 tests/mo | 3–4 tests/mo |
| €3M – €7M | 1–2 tests/mo | 2–4 tests/mo | 5–7 tests/mo |
| €7M – €15M | 2–3 tests/mo | 4–6 tests/mo | 8–12 tests/mo |
| €15M+ | 3–5 tests/mo | 6–10 tests/mo | 15+ tests/mo |
If you're below the median for your tier, the fix is rarely "more hypotheses." It's removing the dev dependency on variant code, prioritising pages with enough traffic to reach significance in under two weeks, and grounding the backlog in observed funnel drop-off rather than opinion. Velocity is an operational metric — treat it like one and review it weekly alongside your broader experimentation strategy.
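Locating a program in the benchmark table can be automated with a small lookup. This sketch encodes the table's median and top-quartile columns; it assumes a tier's lower revenue bound belongs to that tier (so exactly €3M falls in €3M–€7M) and treats anything between the median floor and the top-quartile floor as the median band, since the published ranges leave gaps. `locate_program` and `TIERS` are hypothetical names.

```python
from bisect import bisect_right

# (lower bound of annual revenue in EUR, median range in tests/month, top-quartile floor),
# transcribed from the benchmark table.
TIERS = [
    (1_000_000, (1, 2), 3),
    (3_000_000, (2, 4), 5),
    (7_000_000, (4, 6), 8),
    (15_000_000, (6, 10), 15),
]


def locate_program(annual_revenue_eur: float, tests_per_month: float) -> str:
    """Classify a program's concluded tests per month against its revenue tier."""
    floors = [floor for floor, _, _ in TIERS]
    idx = bisect_right(floors, annual_revenue_eur) - 1  # last tier whose floor <= revenue
    if idx < 0:
        raise ValueError("The benchmark table starts at €1M annual revenue")
    _, (median_low, _median_high), top_floor = TIERS[idx]
    if tests_per_month < median_low:
        return "below median"
    if tests_per_month >= top_floor:
        return "top quartile"
    return "median band"
```

For example, `locate_program(5_000_000, 3)` returns `"median band"` for a €5M store concluding 3 tests per month.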
Frequently asked questions
To calculate experiment velocity, count the tests that reached their predetermined sample size and produced a decision (winner, loser, or inconclusive) within the period, usually a month. Tests still running or stopped early don't count. Most teams average the trailing three months to smooth out launch lulls.
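That counting rule can be sketched in a few lines. The `ExperimentRecord` shape below is hypothetical (adapt it to whatever your testing tool exports); the logic counts only tests that hit their sample size and had a decision recorded, then averages the trailing three calendar months.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional, Sequence

# Decisions that make a test "concluded" for velocity purposes.
DECISIONS = {"win", "loss", "inconclusive"}


@dataclass
class ExperimentRecord:
    """Hypothetical record shape for one test."""
    name: str
    reached_sample_size: bool
    decision: Optional[str] = None     # None while the test is still running
    decided_on: Optional[date] = None  # date the decision was recorded


def concluded_in_month(records: Sequence[ExperimentRecord], year: int, month: int) -> int:
    """Count tests that hit their sample size and had a decision recorded in the given month."""
    return sum(
        1
        for r in records
        if r.reached_sample_size
        and r.decision in DECISIONS
        and r.decided_on is not None
        and (r.decided_on.year, r.decided_on.month) == (year, month)
    )


def trailing_velocity(
    records: Sequence[ExperimentRecord], year: int, month: int, window: int = 3
) -> float:
    """Average concluded tests per month over `window` months ending at (year, month)."""
    total = 0
    for _ in range(window):
        total += concluded_in_month(records, year, month)
        # Step back one calendar month, wrapping from January to the previous December.
        year, month = (year - 1, 12) if month == 1 else (year, month - 1)
    return total / window
```

A test stopped early is recorded with `reached_sample_size=False` and is excluded even if someone logged a decision against it.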
For a store in the €3M–€7M tier, the median is 2–4 concluded tests per month and top-quartile programs ship 5–7. Below 2 per month, you're leaving compounding returns on the table; above 7 per month, you need real traffic discipline to avoid interference between concurrent tests.
At typical online-retail win rates (15–30%), velocity usually matters more than win rate. Doubling velocity from 2 to 4 monthly tests doubles the expected number of winners, while pushing win rate from 25% to 35% raises it by only 40%; more shots on goal also generate more learnings that improve future hypotheses.
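The comparison can be sketched under the same simplified compounding model as the revenue formula. The €500,000 monthly baseline and 5% average lift are borrowed from the worked example purely for illustration, and the scenario labels are hypothetical.

```python
def expected_winners_per_year(velocity: float, win_rate: float) -> float:
    """Expected number of winning tests over 12 months."""
    return velocity * win_rate * 12


def monthly_uplift_after_year(
    baseline: float, velocity: float, win_rate: float, avg_lift: float
) -> float:
    """Monthly revenue above baseline after 12 months of compounding wins."""
    return baseline * ((1 + velocity * win_rate * avg_lift) ** 12 - 1)


# (tests per month, win rate) for each illustrative scenario.
SCENARIOS = {
    "2 tests/month, 25% win rate": (2, 0.25),
    "4 tests/month, 25% win rate": (4, 0.25),
    "2 tests/month, 35% win rate": (2, 0.35),
}

if __name__ == "__main__":
    for label, (velocity, win_rate) in SCENARIOS.items():
        winners = expected_winners_per_year(velocity, win_rate)
        uplift = monthly_uplift_after_year(500_000, velocity, win_rate, 0.05)
        print(f"{label}: {winners:.1f} winners/year, ~€{uplift:,.0f}/month above baseline by month 12")
```

Doubling velocity yields 12 expected winners per year against 8.4 for the higher win rate, and the revenue gap widens further once compounding is applied.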
The main levers for raising velocity are tooling and prioritisation, not headcount. Use a visual editor for variant code (it removes the dev queue), focus on high-traffic surfaces that conclude in under two weeks, and cut the bottom quartile of your hypothesis backlog so the team only builds tests with a credible mechanism.
Concurrent testing has a practical ceiling. Running concurrent tests on overlapping surfaces creates interaction effects that bias results. On a single conversion funnel, the ceiling is around 3–4 simultaneous tests, each on non-overlapping audiences or pages; beyond that, segment by traffic source or route.
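One common way to keep concurrent tests on non-overlapping audiences is deterministic hash bucketing: each visitor ID is hashed into one of 100 buckets, and each test owns a disjoint range, so no visitor enters two tests in the same layer. This sketch assumes a stable visitor ID is available; the test names, salt, and 25-bucket slices are hypothetical, and some testing tools offer an equivalent mutual-exclusion feature.

```python
import hashlib
from typing import Optional, Tuple

# Hypothetical layer: four concurrent tests on one funnel, each owning a disjoint
# slice of 100 hash buckets, so no visitor is ever in two of these tests at once.
LAYER = {
    "pdp_gallery": range(0, 25),
    "pdp_price_anchor": range(25, 50),
    "cart_upsell": range(50, 75),
    "checkout_trust": range(75, 100),
}


def bucket(visitor_id: str, salt: str, buckets: int = 100) -> int:
    """Deterministically map a visitor to a bucket; the same inputs always give the same bucket."""
    digest = hashlib.sha256(f"{salt}:{visitor_id}".encode("utf-8")).hexdigest()
    return int(digest, 16) % buckets


def assign(visitor_id: str) -> Optional[Tuple[str, str]]:
    """Return (test name, arm) for a visitor, or None if no test owns their bucket."""
    b = bucket(visitor_id, salt="funnel-layer-1")
    for test_name, slots in LAYER.items():
        if b in slots:
            # A second, test-specific salt splits the slice into control and variant
            # independently of which slice the visitor landed in.
            arm = "variant" if bucket(visitor_id, salt=test_name, buckets=2) == 1 else "control"
            return test_name, arm
    return None
```

The trade-off is that each test sees only a quarter of the funnel's traffic, which lengthens its time to significance.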
Low-traffic stores can still run a velocity program, but they should measure it against a lower ceiling. Stores under ~50,000 monthly sessions usually can't conclude more than 1–2 tests per month at the standard 95% confidence level. The options are to test bigger changes (10%+ expected lift), test higher in the funnel where traffic is larger, or accept a slower cadence.
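The traffic ceiling follows from the sample size a two-proportion test needs. This sketch uses the standard normal-approximation formula for comparing two conversion rates, assuming a two-sided test at 95% confidence, 80% power (the power level is an assumption, not from the text), one variant against control, and every session eligible for the test; the 2.5% baseline conversion rate is illustrative.

```python
from math import ceil, sqrt
from statistics import NormalDist


def sessions_per_arm(
    baseline_cr: float, relative_mde: float, alpha: float = 0.05, power: float = 0.80
) -> int:
    """Sessions needed in each arm to detect a relative lift of `relative_mde`."""
    p1 = baseline_cr
    p2 = baseline_cr * (1 + relative_mde)
    p_bar = (p1 + p2) / 2
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided critical value
    z_beta = NormalDist().inv_cdf(power)
    numerator = (
        z_alpha * sqrt(2 * p_bar * (1 - p_bar))
        + z_beta * sqrt(p1 * (1 - p1) + p2 * (1 - p2))
    ) ** 2
    return ceil(numerator / (p2 - p1) ** 2)


def max_tests_per_month(monthly_sessions: float, baseline_cr: float, relative_mde: float) -> float:
    """How many two-arm tests the available traffic can conclude per month."""
    per_test = 2 * sessions_per_arm(baseline_cr, relative_mde)  # control + one variant
    return monthly_sessions / per_test
```

With those assumptions, a 20% relative lift needs roughly 16,800 sessions per arm, so 50,000 monthly sessions support about 1.5 tests per month; a 10% relative lift needs nearly four times the traffic per test. Higher-funnel steps usually have higher baseline conversion rates, which lowers the required sample size.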
Velocity is the operational output; experimentation strategy decides what to test and why. A high-velocity program with a weak strategy ships lots of inconclusive tests on low-impact surfaces. A great strategy with low velocity is a slideshow. You need both, measured separately.
A launched test is live; a concluded test has a decision recorded against it. Only the latter counts toward velocity. Teams that report on launches tend to accumulate a long tail of underpowered, never-decided tests that look like activity but produce no learnings or wins.
Winners are typically deployed 1–2 weeks after a test concludes, and their gains then compound monthly. Expect a visible revenue signal 60–90 days after raising velocity, and a clear year-over-year delta after 6 months of sustained higher cadence.
Report velocity as a KPI, but alongside win rate and average lift, never alone. Velocity reported in isolation incentivises shipping low-quality tests to hit a number. The honest scorecard is velocity × win rate × avg lift, which maps directly to revenue impact and is harder to game.