Base Rate Neglect

Metricuno
May 18, 2026
Base rate neglect is why teams over-react to a single winning variant. Learn how to weight priors so your experiment readouts stay calibrated.
Quick answer

Base rate neglect is the tendency to ignore prior probabilities when judging new evidence — the cognitive trap behind most over-confident A/B test readouts.

Definition
Behavioral Economics

Base rate neglect is the tendency to ignore prior probabilities when judging new evidence, leading to over-confident conclusions from weak signals.

Base rate neglect (also called the base rate fallacy) is the cognitive bias of weighting fresh, vivid evidence too heavily and the underlying prior probability too lightly. In experimentation, it shows up when an operator sees one variant beat control by 8% in week one and concludes the variant 'won' — without anchoring that result in the prior that most A/B tests produce no real lift.

The correct read of any test result is Bayesian: combine what you knew before the test with what the test told you. When the prior win rate for tests in your program is, say, 15%, a marginal p-value should move your belief much less than the word 'significant' suggests. Base rate neglect is what makes teams ship false positives, abandon real winners, and tell themselves stories the data doesn't support.

Also known as
base rate fallacy
prior neglect

The classic illustration: a medical test is 95% accurate for a disease that affects 1 in 1,000 people. A positive result feels alarming, but the math says the patient is still far more likely to be healthy than sick — because the base rate is so low. Most people get this wrong, including trained clinicians.
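The arithmetic behind this illustration can be sketched with natural frequencies. The sketch assumes that '95% accurate' means a 95% true positive rate and a 5% false positive rate, an interpretation the example does not spell out; the population size and variable names are illustrative only.

```python
# Assumption: "95% accurate" = 95% sensitivity and a 5% false positive rate.
population = 100_000
prevalence = 1 / 1_000
sensitivity = 0.95
false_positive_rate = 0.05

sick = population * prevalence                    # about 100 people
healthy = population - sick                       # about 99,900 people
true_positives = sick * sensitivity               # about 95 correct alarms
false_positives = healthy * false_positive_rate   # about 4,995 false alarms

# Share of positive results that belong to people who are actually sick.
p_sick_given_positive = true_positives / (true_positives + false_positives)
print(f"{p_sick_given_positive:.1%}")  # prints 1.9%
```

Under these assumptions fewer than 2 in 100 positive results are true cases, because false alarms from the large healthy group swamp the true positives from the small sick group.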

The CRO analogue is direct. The 'disease' is a real lift, the 'test' is your A/B platform's significance calculation, and the base rate is your program's historical win rate, usually 10-25% for mature programs. A winner at 95% significance from a single underpowered test is noise far more often than the 5% error rate implies, for the same reason a positive result on a rare-disease screen is usually a false alarm rather than a diagnosis.

Formula

P(real_win | significant) = (P(significant | real_win) × P(real_win)) / P(significant)

Variables

P(real_win)

Prior win rate

Historical share of your tests that produced a real, replicable lift. Typically 10-25%.

P(significant | real_win)

Statistical power

Probability the test reaches significance when a real win exists. Usually 0.8 if powered correctly.

P(significant)

Overall significance rate

Probability of seeing a significant result, whether from a true win or a false positive: P(significant | real_win) × P(real_win) + α × (1 − P(real_win)), with α ≈ 0.05.

Worked example

An apparel store sees a 'winner' at p=0.05 after running a homepage hero swap. Their program's prior win rate is 15%, and the test was 80% powered.

Prior P(real_win): 0.15

Power P(significant | real_win): 0.80

False positive rate α: 0.05

P(real_win | significant) = (0.80 × 0.15) / (0.80 × 0.15 + 0.05 × 0.85) = 0.12 / 0.1625 ≈ 0.738

Even with a 'significant' result, there's still a ~26% chance the apparent winner is noise. If your prior win rate is closer to 10% and the test was underpowered, that false-positive probability climbs above 40%.
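The worked example can be checked with a few lines of Python. This is a minimal sketch of the formula above, not a platform feature: the function name is hypothetical, and the 50% power used for the 'underpowered' case is an assumed value chosen for illustration.

```python
def p_real_given_significant(prior, power, alpha=0.05):
    """Bayes' rule: share of significant results that are real wins."""
    true_hits = power * prior          # real win AND significant
    false_hits = alpha * (1 - prior)   # no real win AND significant
    return true_hits / (true_hits + false_hits)

# Worked example from the text: prior 15%, power 80%, alpha 5%.
posterior = p_real_given_significant(prior=0.15, power=0.80)
print(f"real: {posterior:.3f}, noise: {1 - posterior:.3f}")  # real: 0.738, noise: 0.262

# Weaker case: prior 10%, power 50% (assumed value for "underpowered").
weak = p_real_given_significant(prior=0.10, power=0.50)
print(f"real: {weak:.3f}, noise: {1 - weak:.3f}")  # real: 0.526, noise: 0.474
```

Keeping the two numerator terms separate makes it easy to see which side grows when the prior or the power changes.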

The lever you control is the prior. Programs that track historical win rates honestly — and feed them back into how new results are weighted — ship far fewer false positives. Programs that don't track priors at all are, by definition, neglecting the base rate on every readout.

Benchmark

How often an 'apparent winner' is actually a real lift, by prior win rate and test power

Prior win rate                     Power 50%   Power 80%   Power 95%
5% (early program)                 34%         46%         50%
15% (typical Shopify store)        64%         74%         77%
25% (mature CRO program)           77%         84%         86%
40% (strong hypothesis backlog)    87%         91%         93%

Read across the 5% row and the lesson is clear: at a 5% prior, even a well-powered significant result is barely a coin flip. The fix isn't to lower your significance threshold. It's to raise your prior by writing better hypotheses, and to stop calling tests early before they hit power. Base rate neglect is one of the most consequential cognitive biases in experimentation, and it compounds with confirmation bias when teams want a test to win.
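For priors or power levels not shown in the benchmark, the same grid can be recomputed from the formula. The sketch below assumes α = 0.05 throughout and uses a hypothetical helper name; values are rounded to whole percentages, so a cell can differ from a published figure by a point.

```python
def share_real(prior, power, alpha=0.05):
    """Share of significant results that are real lifts (Bayes' rule)."""
    return power * prior / (power * prior + alpha * (1 - prior))

priors = [0.05, 0.15, 0.25, 0.40]
powers = [0.50, 0.80, 0.95]

# One row of rounded percentages per prior win rate.
table = {
    prior: [round(100 * share_real(prior, power)) for power in powers]
    for prior in priors
}

print("prior   " + "  ".join(f"power {p:.0%}" for p in powers))
for prior, row in table.items():
    print(f"{prior:>5.0%}   " + "  ".join(f"{value:>8}%" for value in row))
```

Swapping in your own program's prior and the power your tests actually reach gives a more honest benchmark than the generic rows.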

Frequently asked questions

'Base rate neglect' and 'base rate fallacy' refer to the same phenomenon. 'Base rate fallacy' is the older term from Kahneman and Tversky's judgment under uncertainty work; 'base rate neglect' is the more common label in modern behavioral economics. Use either.

In A/B testing, base rate neglect causes operators to treat p<0.05 as proof of a real win, ignoring that the prior probability of any given test producing a real lift is often only 10-20%. The result is shipped false positives, victories declared too early, and a portfolio of 'winners' that don't replicate.

To estimate your prior win rate, pull your last 20-50 completed tests. Count how many produced a statistically significant lift that was confirmed in a holdout or a follow-up replication test, then divide by the total number of tests. That ratio is your honest prior. Most stores land between 10% and 25%; if yours looks much higher, you're probably not validating winners.
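The count described here is simple enough to script. The following is a sketch only: the `Experiment` record, its field names, and the 20-test log are hypothetical stand-ins for whatever your testing tool exports.

```python
from dataclasses import dataclass

@dataclass
class Experiment:
    name: str
    significant: bool   # reached significance in the original run
    replicated: bool    # lift confirmed in a holdout or follow-up test

def estimate_prior(records):
    """Share of all completed tests with a significant AND confirmed lift."""
    if not records:
        raise ValueError("need at least one completed test")
    confirmed = sum(r.significant and r.replicated for r in records)
    return confirmed / len(records)

# Hypothetical log of 20 tests: 5 significant, 3 of those confirmed.
log = (
    [Experiment(f"win-{i}", True, True) for i in range(3)]
    + [Experiment(f"unconfirmed-{i}", True, False) for i in range(2)]
    + [Experiment(f"flat-{i}", False, False) for i in range(15)]
)

prior = estimate_prior(log)
naive = sum(r.significant for r in log) / len(log)
print(f"honest prior: {prior:.0%}, unvalidated win rate: {naive:.0%}")
# prints: honest prior: 15%, unvalidated win rate: 25%
```

The gap between the two numbers is the share of 'winners' that never survived validation, which is the part an unvalidated win rate hides.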

Base rate neglect is not the same as regression to the mean, but they're cousins. Regression to the mean is a statistical phenomenon: extreme results tend to be followed by less extreme ones. Base rate neglect is the cognitive failure to anticipate that, because you didn't anchor on the prior in the first place.

Bayesian A/B testing helps because it forces you to declare a prior explicitly. But the trap reappears if you pick an uninformative prior by default: you've hidden the assumption rather than fixed it. The discipline of estimating and updating your prior matters more than the framework label.

Base rate neglect compounds with confirmation bias (you want the variant to win), the narrative fallacy (a clean story makes a noisy result feel real), and survivorship bias (you remember the tests that 'worked'). Together they explain most of the gap between reported and replicated lift.

For a new testing program, a win rate of 5-10% in the first six months is typical. Teams without a hypothesis backlog or qualitative research often test ideas that have no mechanism, and those rarely win. As the program matures and hypotheses get sharper, priors climb toward 20-25%.

Three habits guard against base rate neglect: (1) state the prior win rate at the top of every readout, (2) require tests to hit pre-declared power before calling them, and (3) follow up significant wins with a replication or holdout before rolling them into roadmap learnings.

Larger samples raise power, so a larger share of significant results are real wins, but the false positive rate per test stays at α and the cognitive bias remains. Even a perfectly powered test can mislead if you ignore the prior. Adequate sample size is necessary, and so are calibrated priors.
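How sample size feeds into a readout can be sketched numerically. The example assumes a normal approximation for a two-sided two-proportion z-test, a hypothetical 3% baseline conversion rate, an 8% relative lift, and a 15% prior; none of these figures come from a real program, and the document's simplification of treating α = 0.05 as the false positive rate is kept.

```python
from statistics import NormalDist

def approx_power(baseline, relative_lift, n_per_arm, alpha=0.05):
    """Approximate power of a two-sided two-proportion z-test."""
    p1 = baseline
    p2 = baseline * (1 + relative_lift)
    se = ((p1 * (1 - p1) + p2 * (1 - p2)) / n_per_arm) ** 0.5
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    # Ignores the negligible chance of significance in the wrong direction.
    return NormalDist().cdf(abs(p2 - p1) / se - z_crit)

def share_real(prior, power, alpha=0.05):
    """Share of significant results that are real lifts (Bayes' rule)."""
    return power * prior / (power * prior + alpha * (1 - prior))

# Assumed scenario: 3% baseline, 8% relative lift, 15% prior win rate.
results = []
for n in (10_000, 50_000, 200_000):
    power = approx_power(baseline=0.03, relative_lift=0.08, n_per_arm=n)
    results.append((n, power, share_real(0.15, power)))
    print(f"n={n:>7,}  power={power:.2f}  share real={results[-1][2]:.2f}")

# Even at 100% power, the prior caps how trustworthy a winner can be.
ceiling = share_real(0.15, 1.0)
print(f"ceiling at 15% prior: {ceiling:.2f}")
```

More traffic pushes power toward 1, but the share of real winners levels off at a ceiling set by the prior and α, which is why sample size alone cannot substitute for a calibrated prior.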

Base rate neglect was formalised by Daniel Kahneman and Amos Tversky as part of their broader work on judgment under uncertainty. It's now treated as one of the foundational cognitive biases, alongside availability, representativeness, and anchoring.
