Impact Estimation

Metricuno
May 17, 2026
Learn how to estimate the revenue lift of an A/B test before you run it: formula, worked example, and realistic ranges by test type.
Quick answer

Impact estimation forecasts the revenue lift you'd capture if a test wins — the number that decides whether a hypothesis is worth queuing at all.

Definition

Forecasting the expected revenue lift from a winning A/B test, used to decide which experiments are worth running.

Impact estimation is the act of predicting how much extra revenue a test would generate if its winning variant rolled out to 100% of traffic. The standard back-of-envelope calculation is effect size × audience share × annual sessions × conversion rate × average order value, sometimes adjusted for the probability that the test actually wins.

It's the quantitative half of experiment prioritization. Without it, a backlog is just a list of opinions; with it, each hypothesis carries a euro figure you can sort, defend, and compare against the cost of building the variant.

Also known as
Test impact forecast
Projected lift
Revenue impact modeling

Most CRO backlogs die because every test feels equally important. Impact estimation breaks the tie by forcing each hypothesis to declare a number — the projected annualised revenue if it wins — before it earns a slot on the roadmap.

The number doesn't need to be precise. It needs to be defensible: a 15% checkout lift on the 8% of sessions that reach checkout is a fundamentally different bet than a 15% lift on a homepage banner that 100% of visitors scroll past. Impact estimation makes that asymmetry visible.

Formula

Estimated Annual Lift (€) = Effect Size × Audience Share × Annual Sessions × Conversion Rate × AOV

Variables

Effect Size (expected relative lift): Conservative estimate of how much the variant improves the target metric, e.g. a 5% conversion uplift.

Audience Share (% of sessions exposed): Fraction of total sessions that actually see the change. Checkout tests only touch sessions that reach checkout.

Annual Sessions (yearly site traffic): Total sessions the site receives in a 12-month window.

Conversion Rate (baseline conversion): Current conversion rate of the affected step or funnel.

AOV (average order value): Mean revenue per completed order.

Worked example

A Shopify apparel store is testing a new product-page size guide. The team expects a 4% relative lift in add-to-cart-to-purchase rate.

Effect Size: 4%

Audience Share (sessions reaching PDP): 55%

Annual Sessions: 2,400,000

Baseline Conversion Rate: 2.1%

AOV: €78

≈ €86,500 estimated annual lift

A high-five-figure lift estimate is more than enough to justify a two-week test cycle. If the same hypothesis touched only 8% of sessions (e.g. a returns-page tweak), the same effect size would forecast under €13k, likely below the team's cutoff.
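The arithmetic in the worked example can be reproduced in a few lines of Python. This is a minimal sketch: the function name `estimated_annual_lift` and its parameter names are illustrative and not part of any tool, and the inputs are simply the figures from the example above, with rates written as fractions.

```python
def estimated_annual_lift(effect_size, audience_share, annual_sessions,
                          conversion_rate, aov):
    """Estimated annual lift in euros, following the formula above.

    All rates are fractions (4% -> 0.04). Names are illustrative.
    """
    return effect_size * audience_share * annual_sessions * conversion_rate * aov


# Size-guide test on the product page: 55% of sessions reach the PDP.
pdp_lift = estimated_annual_lift(0.04, 0.55, 2_400_000, 0.021, 78)

# Same effect size on a surface that only 8% of sessions reach.
narrow_lift = estimated_annual_lift(0.04, 0.08, 2_400_000, 0.021, 78)

print(f"PDP test:   €{pdp_lift:,.0f}")     # €86,486
print(f"8% surface: €{narrow_lift:,.0f}")  # €12,580
```

Only the audience share differs between the two calls, which is why the second forecast is roughly a seventh of the first.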

Pick effect sizes from prior tests on similar surfaces, not from gut feel. Teams that anchor on "this could be a 20% lift" consistently overestimate impact by 3-5×, which corrupts prioritization and makes the backlog itself untrustworthy.

Benchmark

Realistic effect-size ranges by test type (relative lift on the affected funnel step)

Test type                          Conservative   Typical   Optimistic
Checkout friction removal          2%             5-8%      12%
Product page redesign              1%             3-5%      9%
Add-to-cart button / CTA copy      0.5%           1-3%      5%
Homepage hero / value prop         0.5%           1-2%      4%
Pricing or discount display        2%             4-7%      11%
Cross-sell / upsell module         1%             2-4%      7%
Navigation / category structure    0.5%           1-3%      6%

Use the conservative column when you build the prioritization score and the optimistic column only as an upper-bound sanity check. The gap between them is the risk you're carrying: the wider the gap, the more your roadmap depends on this one test landing well.
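One way to put the benchmark columns to work is to compute the conservative and optimistic forecasts side by side and look at the gap. The sketch below is an assumption-laden illustration: the dictionary keys, the `lift_range` helper, and the choice to reuse the worked example's traffic figures are all made up for this purpose. Only the percentages come from the benchmark table.

```python
# Effect-size ranges from the benchmark table, as fractions of relative lift.
# Each entry: (conservative, typical_low, typical_high, optimistic).
# The keys are illustrative labels, not an official taxonomy.
BENCHMARKS = {
    "checkout_friction_removal": (0.02, 0.05, 0.08, 0.12),
    "product_page_redesign": (0.01, 0.03, 0.05, 0.09),
    "cta_copy": (0.005, 0.01, 0.03, 0.05),
    "homepage_hero": (0.005, 0.01, 0.02, 0.04),
    "pricing_display": (0.02, 0.04, 0.07, 0.11),
    "cross_sell_module": (0.01, 0.02, 0.04, 0.07),
    "navigation_structure": (0.005, 0.01, 0.03, 0.06),
}


def lift_range(test_type, audience_share, annual_sessions, conversion_rate, aov):
    """Return (conservative, optimistic, gap) annual lift in euros."""
    conservative, _, _, optimistic = BENCHMARKS[test_type]
    # Baseline annual revenue from sessions that see the change.
    exposed_revenue = audience_share * annual_sessions * conversion_rate * aov
    low = conservative * exposed_revenue
    high = optimistic * exposed_revenue
    return low, high, high - low


# Traffic figures reused from the worked example (55% share, 2.4M sessions).
low, high, gap = lift_range("product_page_redesign", 0.55, 2_400_000, 0.021, 78)
print(f"Score with €{low:,.0f}; sanity-check against €{high:,.0f} (gap €{gap:,.0f})")
```

The first value feeds the prioritization score; the gap is the figure to watch when a roadmap leans heavily on a single test.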

Frequently asked questions

Test slots are scarce. Every hypothesis you run costs 2-4 weeks of traffic on a finite set of high-traffic pages. Impact estimation tells you which of the 30 ideas in your backlog deserves that slot before you commit, instead of after.

Experiment prioritization is the full ranking framework — it weighs impact, confidence, and effort together. Impact estimation is the one quantitative input inside that framework that answers "if this works, how much money?"

To choose an effect size, use the median lift from your last 10-20 tests on the same surface. If you don't have that history, start with the "typical" column above and discount by 30-50%, since most teams overestimate. Once you've shipped 15+ tests, switch to your own data.
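The rule of thumb for picking an effect size could be sketched as follows. The helper name `pick_effect_size`, the 10-test threshold, the 40% default discount (the middle of the 30-50% band), and the use of the midpoint of the "typical" range are all assumptions made for illustration. The past lifts in the usage lines are hypothetical values, not real results.

```python
from statistics import median


def pick_effect_size(past_lifts, typical_range, discount=0.40, min_history=10):
    """Choose an effect size (as a fraction) for the impact formula.

    past_lifts:    relative lifts from earlier tests on the same surface
    typical_range: (low, high) from the benchmark's "typical" column
    discount:      haircut applied when falling back to the benchmark
    """
    if len(past_lifts) >= min_history:
        # Enough history on this surface: trust your own data.
        return median(past_lifts)
    low, high = typical_range
    # Assumption: take the midpoint of the typical range, then discount it.
    return (low + high) / 2 * (1 - discount)


# Hypothetical history: only three prior tests, so the benchmark is used.
short_history = [0.01, 0.04, 0.02]
fallback = pick_effect_size(short_history, typical_range=(0.03, 0.05))

# Hypothetical history of ten tests: the median is used instead.
long_history = [0.01, 0.02, 0.03, 0.04, 0.05, 0.02, 0.03, 0.01, 0.06, 0.03]
from_history = pick_effect_size(long_history, typical_range=(0.03, 0.05))

print(f"fallback: {fallback:.3f}, from history: {from_history:.3f}")
```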

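Weight the estimate by the probability of a win when you build the prioritization score. A bet with 70% confidence and €100k impact (€70k expected) ranks above one with 20% confidence and €200k impact (€40k expected). Some teams call this risk-adjusted impact or expected value, and it's what separates ICE from PIE-style frameworks.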
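The ranking described here is a plain expected-value calculation. A minimal sketch, using the two bets from the paragraph above; the names `risk_adjusted_impact`, `bet_a`, and `bet_b` are illustrative.

```python
def risk_adjusted_impact(win_probability, impact_eur):
    """Expected value of a test: probability of a win times impact if it wins."""
    return win_probability * impact_eur


# (win probability, impact in euros if the test wins)
bets = {
    "bet_a": (0.70, 100_000),
    "bet_b": (0.20, 200_000),
}

# Sort bet names by expected value, highest first.
ranked = sorted(bets, key=lambda name: risk_adjusted_impact(*bets[name]),
                reverse=True)

for name in ranked:
    print(f"{name}: €{risk_adjusted_impact(*bets[name]):,.0f}")
```

The bet with the smaller headline figure ranks first because its win is far more likely.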

To find audience share, pull the funnel report for the affected step. If 8% of sessions reach the shipping screen, your audience share is 8%, not 100% of sessions. Forgetting this is the single most common impact estimation error.
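In code, audience share is a single division over two numbers from the funnel report. The session counts below are hypothetical, chosen so the result matches the 8% in the text; `audience_share` is an illustrative helper name.

```python
def audience_share(sessions_reaching_step, total_sessions):
    """Fraction of all sessions that reach the step the test changes."""
    if total_sessions <= 0:
        raise ValueError("total_sessions must be positive")
    return sessions_reaching_step / total_sessions


# Hypothetical funnel report: 192,000 of 2,400,000 sessions reach shipping.
share = audience_share(192_000, 2_400_000)
print(f"Audience share: {share:.0%}")  # Audience share: 8%
```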

Annualise the estimate rather than using the test duration. Comparing tests on a 14-day basis penalises slower-traffic pages unfairly. The decision you're making is "do we ship this permanently?", so the time horizon should match; 12 months is the standard.

The mistake to watch for is anchoring on the optimistic case. A 15% lift is a great headline, but if your last five tests averaged 2.8%, your estimate should too. Inflated forecasts erode trust with finance and lead to the wrong tests getting prioritised.

If the test plausibly changes AOV (cross-sell, bundle, free-shipping threshold), model that effect separately as Sessions × Conversion × ΔAOV. For pure conversion-rate tests, hold AOV constant and only flex the conversion side.
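The separate AOV term can be sketched like this. It assumes "Sessions" means the sessions exposed to the change over 12 months. The €3 change in AOV is a hypothetical input, and the traffic figures are borrowed from the worked example purely for illustration. `aov_lift` is an illustrative name.

```python
def aov_lift(exposed_sessions, conversion_rate, delta_aov):
    """Annual revenue from a change in average order value alone.

    exposed_sessions: yearly sessions that see the change (assumption)
    conversion_rate:  baseline conversion, held constant
    delta_aov:        expected change in AOV, in euros
    """
    return exposed_sessions * conversion_rate * delta_aov


# 2.4M annual sessions, 55% exposed, 2.1% baseline conversion,
# and a hypothetical €3 increase in AOV.
exposed = 2_400_000 * 0.55
lift = aov_lift(exposed, 0.021, 3.0)
print(f"AOV-driven lift: €{lift:,.0f}")  # AOV-driven lift: €83,160
```

Keeping this term separate from the conversion-side estimate makes it clear which assumption drives the forecast.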

The estimate only needs to be precise enough to sort the backlog correctly. Order of magnitude is what matters: €5k vs €50k vs €500k. Spending two hours debating €78k vs €84k is wasted effort, because the underlying effect-size assumption has more uncertainty than that anyway.

The same method works for non-revenue goals. Swap AOV for the value of the target action: a newsletter signup might be worth €4 in expected downstream revenue, a returning visit €0.80. As long as every test uses the same valuation logic, the rank order stays honest.
