Building A Sensitivity Range Around A Forecasted RPV Lift

Metricuno
June 12, 2026
7 min read

How to wrap an annualized RPV lift projection in a defensible best/base/worst band using confidence-interval bounds, traffic seasonality, and AOV variance — so finance stops treating your point estimate as a promise.

Quick answer

Build the band from three largely independent sources of variance: the test's confidence-interval bounds on RPV lift, expected traffic variance over the forecast horizon, and AOV variance. Multiply the low ends of all three together for the worst case, the high ends for the best case, and the central estimates (with the holdout correction applied) for the base case, to get worst/base/best annualized revenue impact. Use an 80% CI for the band a CFO will plan against, and a 95% CI if you only need to show downside risk.

Definition
Forecasting & Statistical Analysis

RPV Lift Forecast Sensitivity Range

A best/base/worst band wrapped around an annualized RPV lift forecast, built from test CI bounds plus traffic and AOV variance.

A sensitivity range converts a single-point revenue projection — say, "this checkout test will add €420k next year" — into a defensible interval like €210k / €420k / €640k. The base case uses the point estimate of RPV lift. The worst and best cases combine the lower and upper bounds of the test's confidence interval on RPV with realistic ranges for forward traffic and AOV. The output is the version finance actually plans against, because it makes the underlying uncertainty visible instead of hiding it inside a single number.

Also known as
best/base/worst RPV forecast
RPV lift forecast band
scenario-based revenue projection

A point estimate from an A/B test is a number with error bars you chose not to draw. Hand a CFO €420k of forecasted lift and they'll either treat it as a commitment or discount it to zero. Neither is what you want.

Why a single-point RPV forecast fails the CFO test

Your test produced an observed RPV lift with a 95% confidence interval — say +6.2% with bounds at +2.1% and +10.3%. A single-point annualized forecast collapses that interval into one number, which triggers overconfidence bias in stakeholders downstream. Finance teams that anchor on the point estimate then treat any underperformance as a miss.

The other failure mode is the inverse: a sophisticated CFO sees a single number and silently halves it, because they've been burned before. Either way, you lose the room. A three-scenario band — worst, base, best — forces an explicit conversation about which assumptions move the answer the most.

The 30% holdout gap

Test-period RPV lifts typically shrink ~30% once you validate against a holdout cohort. If your base case uses the raw test lift without that haircut, your "worst case" is probably still optimistic. Apply the holdout correction to the base before building the band — not after.
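A minimal Python sketch of that correction, assuming the 30% shrinkage above as a default and relative lifts written as decimals (0.062 for +6.2%). The function names are hypothetical. How to treat the CI bounds is a judgment call: this version reads "apply the correction before building the band" as keeping the test's CI half-widths and moving the centre onto the corrected lift. Scaling the bounds by the same factor, or haircutting only the base as the benchmark table below does, are other readings.

```python
def holdout_corrected_lift(observed_lift: float, shrinkage: float = 0.30) -> float:
    """Shrink an observed relative RPV lift (0.062 = +6.2%) by a holdout haircut.

    The 0.30 default is this article's rule of thumb, not a constant;
    swap in your own holdout-validated ratio if you have one.
    """
    if not 0.0 <= shrinkage < 1.0:
        raise ValueError("shrinkage must be in [0, 1)")
    return observed_lift * (1.0 - shrinkage)


def recentered_lift_band(observed: float, lower: float, upper: float,
                         shrinkage: float = 0.30) -> tuple[float, float, float]:
    """Return (worst, base, best) lift with the CI re-centred on the corrected base.

    Assumption: the test's CI half-widths are kept as-is and only the centre
    moves, so the haircut is applied before the band is built, not after.
    """
    base = holdout_corrected_lift(observed, shrinkage)
    return base - (observed - lower), base, base + (upper - observed)


worst, base, best = recentered_lift_band(0.062, 0.021, 0.103)
print(f"{worst:+.2%} / {base:+.2%} / {best:+.2%}")  # +0.24% / +4.34% / +8.44%
```

Note how close the worst case now sits to zero: that is the "your worst case is probably still optimistic" point made concrete.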

The three variance sources that build the band

A defensible range layers three independent sources of uncertainty. First, statistical variance from the test itself — the CI bounds on RPV lift. Second, traffic variance over the forecast horizon, which depends on seasonality, paid-media plans, and the channel mix. Third, AOV variance, which moves with product mix, discount cadence, and currency for stores selling across Shopify Markets.

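These three sources are mostly independent, and because revenue is sessions × RPV (with RPV scaling with AOV when conversion holds steady), their effects multiply rather than add. Combine the bounds as scenarios: the worst case takes the low end of all three (lower CI bound on lift, low-traffic scenario, low AOV), the best case takes the high end of all three, and the base case takes the central point estimate of each, with the holdout correction applied. Stacking every low end or every high end is deliberately conservative: if the sources really are independent, the chance of all three landing at their extremes in the same year is smaller than the CI level on any single input suggests.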

Benchmark

Typical variance ranges to plug into an RPV forecast band (apparel & beauty, €1M-€15M Shopify stores)

Input | Worst | Base | Best
RPV lift (95% CI on +6.2% observed) | +2.1% | +6.2% | +10.3%
Holdout correction applied to base | n/a | −30% | n/a
Annual sessions vs plan | −15% | 0% | +10%
AOV vs trailing-12-month average | −8% | 0% | +5%
Resulting annualized revenue impact | €155k | €295k | €640k

How to actually compute the band

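The mechanical version is simple. For each scenario, compute: annual sessions × baseline RPV × AOV adjustment factor × (1 + RPV lift). Then subtract the same expression with the lift set to zero (annual sessions × baseline RPV × AOV adjustment factor) to isolate the incremental impact attributable to the test. Subtracting the plain plan baseline instead would credit traffic and AOV swings, which would have happened anyway, to the test. Do it three times, for worst, base, and best, with the inputs from the table above.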
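The same calculation as a small Python sketch. Only the scenario factors come from the table; the plan traffic (1.5M sessions) and baseline RPV (€4.50) are hypothetical placeholders, so the euro outputs will not reproduce the table's amounts. Each scenario's counterfactual keeps that scenario's traffic and AOV and sets the lift to zero, so only the lift is credited to the test.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Scenario:
    name: str
    rpv_lift: float         # relative RPV lift, e.g. 0.062 for +6.2%
    sessions_factor: float  # annual sessions vs plan, e.g. 0.85 for -15%
    aov_factor: float       # AOV vs trailing-12-month average, e.g. 0.92 for -8%


def incremental_revenue(plan_sessions: float, baseline_rpv: float, s: Scenario) -> float:
    """Annual revenue attributable to the test under one scenario."""
    sessions = plan_sessions * s.sessions_factor
    # Counterfactual: same traffic and AOV as the scenario, but no test effect.
    without_test = sessions * baseline_rpv * s.aov_factor
    with_test = without_test * (1 + s.rpv_lift)
    return with_test - without_test


HOLDOUT_SHRINKAGE = 0.30  # applied to the base case only, as in the table
scenarios = [
    Scenario("worst", 0.021, 0.85, 0.92),
    Scenario("base", 0.062 * (1 - HOLDOUT_SHRINKAGE), 1.00, 1.00),
    Scenario("best", 0.103, 1.10, 1.05),
]

PLAN_SESSIONS = 1_500_000  # hypothetical annual sessions
BASELINE_RPV = 4.50        # hypothetical, EUR per session

for s in scenarios:
    print(f"{s.name:>5}: EUR {incremental_revenue(PLAN_SESSIONS, BASELINE_RPV, s):,.0f}")
```

Keeping the three scenarios as data rather than hard-coded arithmetic makes it easy to add a fourth ("flat traffic, low AOV") when finance wants to argue with one specific assumption.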

You have two ways to build the lift bounds. Analytical CI bounds come straight from the test's standard error and a z-score: fast, and defensible when the sampling distribution of the RPV difference is close to normal. Monte Carlo simulation draws the lift together with the traffic and AOV inputs thousands of times and reads the band off the resulting distribution: better when RPV is heavy-tailed or when you want to combine non-independent variance sources. For most stores the analytical approach is enough.
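A standard-library-only Monte Carlo sketch. Every distributional choice here is an assumption: the lift is drawn from a normal approximation built from the test's standard error, the traffic and AOV factors are normal around 1 with hypothetical spreads, and an optional correlation links them (negative if, say, extra paid traffic arrives at lower AOV). For genuinely heavy-tailed RPV you would replace the lift draw with a bootstrap over visitor-level revenue. The 10th, 50th, and 90th percentiles of the draws give an 80% band.

```python
import math
import random
import statistics


def simulate_band(plan_sessions, baseline_rpv, lift_mean, lift_se,
                  traffic_sd=0.10, aov_sd=0.05, traffic_aov_corr=0.0,
                  n_draws=20_000, seed=42):
    """Return the 10th, 50th and 90th percentiles of simulated incremental revenue."""
    if not -1.0 <= traffic_aov_corr <= 1.0:
        raise ValueError("traffic_aov_corr must be in [-1, 1]")
    rng = random.Random(seed)  # seeded so the band is reproducible
    draws = []
    for _ in range(n_draws):
        lift = rng.gauss(lift_mean, lift_se)  # normal approximation to the lift
        z_traffic = rng.gauss(0.0, 1.0)
        z_other = rng.gauss(0.0, 1.0)
        # Correlated standard normal for AOV (two-variable Cholesky step).
        z_aov = traffic_aov_corr * z_traffic + math.sqrt(1 - traffic_aov_corr ** 2) * z_other
        sessions_factor = 1 + traffic_sd * z_traffic
        aov_factor = 1 + aov_sd * z_aov
        draws.append(plan_sessions * sessions_factor * baseline_rpv * aov_factor * lift)
    cuts = statistics.quantiles(draws, n=10)  # 9 cut points: P10 ... P90
    return cuts[0], cuts[4], cuts[8]


lift_se = (0.103 - 0.021) / (2 * 1.96)  # back out the SE from the 95% CI
low, mid, high = simulate_band(
    plan_sessions=1_500_000,       # hypothetical
    baseline_rpv=4.50,             # hypothetical
    lift_mean=0.062 * (1 - 0.30),  # holdout-corrected base lift
    lift_se=lift_se,
    traffic_aov_corr=-0.3,         # hypothetical: more traffic, lower AOV
)
print(f"80% band: EUR {low:,.0f} / {mid:,.0f} / {high:,.0f}")
```

Unlike the corner-case scenarios, these percentiles describe joint outcomes, so the simulated band is usually narrower than stacking every low or every high end.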

80% or 95% — which CI width?

Use 95% CI when the question is "what's the realistic downside?" — regulators, board presentations, go/no-go decisions on big capex. Use 80% CI when the question is "what should we plan against?" — annual budgeting, marketing-spend allocation. The 80% band is tighter and more useful for operational planning; 95% is wider and more useful for risk framing.
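If the test only reports a 95% interval, you can back out the standard error and re-derive the 80% band from the same data. A minimal sketch, assuming the reported interval is a symmetric normal-approximation CI; `ci_bounds` is a hypothetical helper, while `statistics.NormalDist` is part of Python's standard library.

```python
from statistics import NormalDist


def ci_bounds(point: float, se: float, level: float) -> tuple[float, float]:
    """Two-sided normal-approximation CI for a relative lift at the given level."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return point - z * se, point + z * se


# Back out the standard error from the reported 95% interval (+2.1% to +10.3%).
se = (0.103 - 0.021) / (2 * NormalDist().inv_cdf(0.975))

for level in (0.80, 0.95):
    lo, hi = ci_bounds(0.062, se, level)
    print(f"{level:.0%} CI: {lo:+.1%} to {hi:+.1%}")
# 80% CI: +3.5% to +8.9%
# 95% CI: +2.1% to +10.3%
```

The 80% bounds are the ones to carry into the worst and best lift inputs when the band is meant for budgeting.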

Presenting the band so the CFO uses it

Lead with the base case as a single number, then immediately show the band underneath. Label the scenarios in business language — "if traffic and AOV come in flat" for base, "if Q4 paid efficiency drops 15%" for worst — not statistical language. The CFO needs to know which assumption to argue with, not which percentile you chose.

If you're presenting to a finance team that plans by quarter, break the annual band into quarterly bands using your traffic seasonality curve. A flat annual forecast that ignores the Q4 spike will be wrong every single quarter, even if it's right in total. Worked examples on a €5M Shopify store typically show Q4 carrying 35-40% of the annual lift, which is the number that actually drives hiring and inventory decisions.
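A sketch of that split, assuming the test's relative lift and the baseline RPV are roughly constant through the year, so incremental revenue tracks each quarter's share of sessions. The annual band is the benchmark table's; the seasonality shares are hypothetical. If you expect the lift or RPV itself to differ by season, model each quarter as its own scenario instead.

```python
def quarterly_band(annual_band, seasonality):
    """Split an annual (worst, base, best) band using quarterly session shares.

    annual_band: incremental revenue per scenario, e.g. (155_000, 295_000, 640_000)
    seasonality: {quarter: share of annual sessions}; normalised to sum to 1.
    """
    total = sum(seasonality.values())
    return {
        quarter: tuple(value * share / total for value in annual_band)
        for quarter, share in seasonality.items()
    }


ANNUAL_BAND = (155_000, 295_000, 640_000)  # worst / base / best from the table
SEASONALITY = {"Q1": 0.20, "Q2": 0.20, "Q3": 0.22, "Q4": 0.38}  # hypothetical shares

for quarter, (worst, base, best) in quarterly_band(ANNUAL_BAND, SEASONALITY).items():
    print(f"{quarter}: EUR {worst:,.0f} / {base:,.0f} / {best:,.0f}")
```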

Frequently asked questions

Use 80% CI for the band finance plans against and 95% CI when you need to communicate downside risk to a board or audit committee. The 80% band is narrower and more decision-useful; 95% is wider and more defensive. Show both if there's room — they answer different questions.

Take the lower and upper CI bounds on the RPV lift percentage and apply each to your annualized session × baseline-RPV calculation. Combine each lift bound with a corresponding traffic and AOV scenario — worst-case lift with low traffic and low AOV; best-case lift with high traffic and high AOV — to get the outer scenarios of the band.

Analytical CI bounds are fine for most retail tests where RPV is roughly normal and the variance sources are independent. Use Monte Carlo when RPV is heavy-tailed (high-ticket SKUs, long tail of basket sizes) or when you need to model correlation between traffic and AOV — for example, paid traffic typically converts at lower AOV than organic.

Apply the holdout correction before you build the band. Test-period RPV lifts shrink roughly 30% once validated against a holdout, so correct the base case first, then build the band around the corrected base. Otherwise your worst case is anchored on an inflated number and the whole range slides high.

For most Shopify stores in the €1M-€15M band, ±10-15% annual traffic variance is realistic — wider if you're heavily dependent on paid social or one or two organic keywords. Pull the last three years of GA4 sessions and use the actual coefficient of variation rather than guessing.
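One way to compute that coefficient of variation, sketched under stated assumptions: you have monthly GA4 session totals in calendar order for whole years, and you measure the spread of each calendar month across years so the seasonal Q4 spike is not counted as forecast uncertainty. The function names and the sample data are hypothetical. If the store has grown steadily, detrend or compare against plan first, otherwise growth inflates the result.

```python
import statistics


def coefficient_of_variation(values):
    """Sample standard deviation divided by the mean."""
    return statistics.stdev(values) / statistics.fmean(values)


def average_monthly_cv(monthly_sessions):
    """Average, over the 12 calendar months, of each month's CV across years.

    monthly_sessions: monthly totals in calendar order, whole years only
    (36 values for three years).
    """
    if len(monthly_sessions) % 12 or len(monthly_sessions) < 24:
        raise ValueError("expected at least two whole years of monthly data")
    years = len(monthly_sessions) // 12
    per_month = [
        coefficient_of_variation([monthly_sessions[y * 12 + m] for y in range(years)])
        for m in range(12)
    ]
    return statistics.fmean(per_month)


# Hypothetical data: one seasonal shape (in thousands) scaled differently in each year.
shape = [80, 75, 85, 90, 95, 90, 85, 88, 92, 110, 150, 170]
monthly_sessions = [round(v * 1_000 * scale) for scale in (1.00, 0.93, 1.06) for v in shape]
print(f"Average month-level CV: {average_monthly_cv(monthly_sessions):.1%}")
```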

AOV is usually more stable than traffic — ±5-8% is typical for apparel and beauty stores with a steady product mix. It widens during promotional periods, currency swings, or product launches. If you're forecasting across a discount-heavy Q4, model AOV separately for peak vs non-peak weeks.
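If peak and non-peak weeks get separate AOV assumptions, one simple way to feed them back into an annual scenario is to weight each period's AOV factor by its share of baseline revenue. Weighting by revenue rather than sessions keeps the blend consistent with the revenue formula when RPV differs between periods, assuming the lift itself is the same in both. The helper and the shares below are hypothetical.

```python
def blended_aov_factor(periods):
    """periods: list of (baseline_revenue_share, aov_factor) pairs.

    Returns the revenue-weighted AOV factor; shares are normalised, so they
    do not have to sum exactly to 1.
    """
    total_share = sum(share for share, _ in periods)
    if total_share <= 0:
        raise ValueError("shares must sum to a positive number")
    return sum(share * factor for share, factor in periods) / total_share


# Hypothetical: discount-heavy peak weeks are 35% of baseline revenue at -10% AOV,
# the remaining weeks are 65% at +2% AOV.
print(f"{blended_aov_factor([(0.35, 0.90), (0.65, 1.02)]):.3f}")  # 0.978
```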

A single-point forecast triggers overconfidence bias: finance anchors on the number and treats it as a commitment, so any underperformance reads as a miss. Worse, sophisticated CFOs silently discount single-point forecasts because they've been burned before. A three-scenario band makes the uncertainty explicit and shifts the conversation to which assumption matters most.

Lead with the base case as one number, then show worst and best underneath with business-language labels ("if Q4 paid efficiency drops 15%") rather than statistical labels ("5th percentile"). The goal is to make the assumptions debatable, not to teach confidence intervals.

Present the band alongside the point estimate, not instead of it. Most finance teams still want a single number to plug into the model, and that's your base case. The band sits next to it as the planning range. Without the point estimate, budgeting stalls; without the band, the point estimate gets treated as a promise.

Apply your traffic seasonality curve to each scenario. For most retail brands, Q4 carries 30-40% of annual sessions, so the band widens in absolute euros during Q4 even if the percentage variance is the same. Quarterly bands are what drive inventory and hiring decisions — annual bands rarely do.
