How to use Bayesian Thinking

Metricuno

May 18, 2026

7 min read

How Bayesian thinking lets you read A/B tests as belief updates instead of pass/fail verdicts: a practical guide for low-traffic DTC stores.

Quick answer

Bayesian thinking treats experiments as belief revision rather than binary verdicts, which is exactly what low-traffic Shopify and WooCommerce stores need to act on trending results without waiting months for frequentist significance.

Definition

Experimentation methodology

Bayesian Thinking

A way of reasoning that treats beliefs as probabilities and updates them as new evidence arrives.

Bayesian thinking is the discipline of starting with a prior belief about how likely something is, then revising that belief in proportion to the evidence you observe. In experimentation, it reframes an A/B test from a binary pass/fail verdict into a continuously updating probability that variant B beats A.

For stores running tests on modest traffic, this matters in practice. You rarely get the clean 95% significance moment frequentist methods demand. Bayesian thinking lets you say something useful at week two — "there's a 78% chance B is better, with an expected lift of 4%" — and make a defensible call instead of waiting another month for a verdict that may never arrive.

Also known as

Bayesian inference

Bayesian reasoning

belief updating

Most CRO teams were trained on frequentist statistics: set a hypothesis, collect data, check if p < 0.05. The result is binary — significant or not — and the test conclusion ignores everything you knew before you ran it.

Bayesian thinking does the opposite. It asks you to write down what you already believe (the prior), then formally updates that belief with every observed outcome, conversion or not. The output is a probability distribution over possible lifts, not a yes/no answer.

How a Bayesian update actually works

The mechanism is Bayes' theorem: posterior belief is proportional to prior belief multiplied by the likelihood of the data you just observed. In plain English — start with what you thought was probable, multiply by how well the new evidence fits each possibility, renormalise.
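Written out, the update looks like the following. The symbols are notation introduced here for illustration: θ stands for a variant's true conversion rate and D for the data observed so far.

```latex
p(\theta \mid D)
  \;=\; \frac{p(D \mid \theta)\, p(\theta)}{\int_0^1 p(D \mid \theta')\, p(\theta')\, d\theta'}
  \;\propto\; p(D \mid \theta)\, p(\theta)
```

The denominator is the renormalising step: it does not depend on θ, so it only rescales the curve so that it integrates to 1.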

For an A/B test on conversion rate, the prior is usually a Beta distribution describing your belief about each variant's true rate. As checkouts come in, each conversion or non-conversion shifts the distribution. After a few hundred sessions you have a posterior shape — not a single number — for both A and B.
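A minimal sketch of that update, assuming a uniform Beta(1, 1) prior and made-up visitor counts. The class name `BetaBelief` and the numbers are illustrative, not taken from any particular testing tool. With a Beta prior and conversion data, the update is just addition: conversions are added to the first parameter and non-conversions to the second.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BetaBelief:
    """Belief about a conversion rate, stored as a Beta(alpha, beta) distribution."""
    alpha: float  # prior pseudo-conversions plus observed conversions
    beta: float   # prior pseudo-non-conversions plus observed non-conversions

    def update(self, conversions: int, visitors: int) -> "BetaBelief":
        """Return the posterior after observing `conversions` out of `visitors`."""
        if not 0 <= conversions <= visitors:
            raise ValueError("conversions must be between 0 and visitors")
        return BetaBelief(self.alpha + conversions,
                          self.beta + (visitors - conversions))

    @property
    def mean(self) -> float:
        """Posterior mean conversion rate."""
        return self.alpha / (self.alpha + self.beta)


# Hypothetical data: same uniform prior for both variants.
prior = BetaBelief(alpha=1.0, beta=1.0)
posterior_a = prior.update(conversions=30, visitors=1000)
posterior_b = prior.update(conversions=36, visitors=1000)
print(f"A mean rate: {posterior_a.mean:.4f}, B mean rate: {posterior_b.mean:.4f}")
```

The posterior is still a full distribution; the mean is only one convenient summary of it.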

From those two posteriors you compute the metric that actually matters: P(B > A). A test platform running Bayesian inference reports this directly. You can act on a 92% probability without needing the p-value ritual.
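One common way to estimate P(B > A) is to draw many samples from each posterior and count how often B's sampled rate is higher. The sketch below assumes Beta posteriors built from a uniform prior; the function name, the default draw count and the example counts are all illustrative choices.

```python
import random


def prob_b_beats_a(a_conv, a_visitors, b_conv, b_visitors,
                   prior_alpha=1.0, prior_beta=1.0, draws=20_000, seed=0):
    """Monte Carlo estimate of P(rate_B > rate_A) under Beta posteriors."""
    rng = random.Random(seed)  # fixed seed so repeated runs agree
    wins = 0
    for _ in range(draws):
        rate_a = rng.betavariate(prior_alpha + a_conv,
                                 prior_beta + a_visitors - a_conv)
        rate_b = rng.betavariate(prior_alpha + b_conv,
                                 prior_beta + b_visitors - b_conv)
        if rate_b > rate_a:
            wins += 1
    return wins / draws


# Hypothetical counts: 30/1000 conversions on A, 36/1000 on B.
prob = prob_b_beats_a(30, 1000, 36, 1000)
print(f"P(B > A) is roughly {prob:.2f}")
```

The estimate carries simulation noise of about one percentage point at this draw count, so raise `draws` if a decision sits close to the threshold.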

Why DTC stores care

An apparel brand running 35,000 sessions a month on a single PDP variant can rarely reach 95% frequentist significance inside a 4-week test window. Bayesian inference lets the same team ship at 90% posterior probability with a known expected loss — typically 6-10 days earlier per test, which compounds into 8-12 extra tests a year.

Reading a test that's trending but not significant

Imagine a beauty store testing a new product-page hero. After 12 days, B is up 5.2% but the p-value is 0.18. Under frequentist rules you keep waiting. Under Bayesian thinking you ask three questions: what does the posterior say, what's the expected loss if I ship the wrong one, and how does that compare to the cost of waiting?

The posterior might tell you P(B > A) = 81% with an expected loss of 0.3% of conversion rate if you ship B, where expected loss averages the shortfall over the scenarios in which B is actually worse than A. Against a one-week delay that costs you the next test in the queue, shipping is often the right call, and Bayesian methods let you reason about it explicitly instead of pretending the decision doesn't exist.
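Expected loss can be estimated from the same posterior samples: for each draw, record how far B falls short of A (zero when B is ahead) and average the result. This is a sketch under stated assumptions: uniform Beta(1, 1) priors, invented counts, and a relative loss defined here as the absolute loss divided by A's mean rate. The article does not say whether its percentages are absolute or relative, so treat that definition as an assumption.

```python
import random


def expected_loss_if_ship_b(a_conv, a_visitors, b_conv, b_visitors,
                            draws=20_000, seed=0):
    """Return (absolute, relative) expected conversion-rate loss from shipping B."""
    rng = random.Random(seed)
    total_shortfall = 0.0
    total_rate_a = 0.0
    for _ in range(draws):
        rate_a = rng.betavariate(1 + a_conv, 1 + a_visitors - a_conv)
        rate_b = rng.betavariate(1 + b_conv, 1 + b_visitors - b_conv)
        # Loss is only incurred in draws where B is actually worse than A.
        total_shortfall += max(rate_a - rate_b, 0.0)
        total_rate_a += rate_a
    absolute = total_shortfall / draws
    relative = absolute / (total_rate_a / draws)  # assumption: relative to A's mean
    return absolute, relative


# Hypothetical counts for a test that is trending but not settled.
abs_loss, rel_loss = expected_loss_if_ship_b(30, 1000, 36, 1000)
print(f"expected loss: {abs_loss:.5f} absolute, {rel_loss:.2%} of A's rate")
```

Comparing this number with the cost of another week of runtime is what turns the three questions into an explicit trade-off.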

[Chart: posterior probability that B beats A over a 21-day test]

Notice the shape. The probability climbs smoothly as evidence accumulates — there's no magical threshold where the answer flips from "unknown" to "true". A 90% posterior on day 18 is not categorically different from 86% on day 15; it just reflects more data.

Choosing priors without fooling yourself

The honest objection to Bayesian methods is that priors can bias the result. If you start convinced B is better, weak data won't shift you much. The discipline is choosing priors that reflect what you genuinely knew before the test — not what you hope to find.

In practice, three prior strengths cover most situations. A weak (uninformative) prior says you have no idea — useful for genuinely novel changes. A moderate prior encodes typical e-commerce lift ranges. A strong prior is only justified when you have years of internal data on similar tests.

Benchmark

Prior strength by experiment context

Test context	Recommended prior	Effective prior sample size	Typical use
First test on a new template	Weak / uniform	~10 visitors	Novel layout, no historical data
Iterating on a known winner	Moderate informative	100-300 visitors	Refinement of a tested module
Repeated copy test on PDP	Moderate informative	200-500 visitors	You've run 5+ similar tests
Pricing or discount change	Weak (high stakes)	~25 visitors	Avoid baking in assumptions you'll regret
Checkout micro-copy	Strong informative	500-1000 visitors	Dozens of prior runs with stable effects

When in doubt, lean weaker than feels comfortable. A weak prior costs you a few extra days of test runtime; a strong, wrong prior costs you the right answer. This is the same logic frequentist statistical analysis uses when it refuses to look at prior data at all — Bayesian methods just make the trade-off explicit.
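The "effective prior sample size" column can be read as pseudo-visitors added to the data before the test starts. A small sketch, assuming a prior centred on a 3% conversion rate and hypothetical test data of 20 conversions from 400 visitors, shows how much harder a strong prior is to move. The function names are illustrative.

```python
def beta_from_prior(mean_rate, effective_visitors):
    """Turn a prior mean and an effective sample size into Beta(alpha, beta)."""
    alpha = mean_rate * effective_visitors
    beta = (1.0 - mean_rate) * effective_visitors
    return alpha, beta


def posterior_mean(prior_mean, prior_visitors, conversions, visitors):
    """Posterior mean rate after combining prior pseudo-visitors with real data."""
    alpha, beta = beta_from_prior(prior_mean, prior_visitors)
    return (alpha + conversions) / (alpha + beta + visitors)


# Hypothetical data: observed rate is 5%, prior belief is 3%.
weak = posterior_mean(0.03, prior_visitors=10, conversions=20, visitors=400)
strong = posterior_mean(0.03, prior_visitors=1000, conversions=20, visitors=400)
print(f"weak prior -> {weak:.4f}, strong prior -> {strong:.4f}")
```

With the weak prior the posterior mean sits almost on the observed rate; with the strong prior it stays much closer to the prior belief, which is the cost of a strong prior that turns out to be wrong.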

When Bayesian beats frequentist (and when it doesn't)

Bayesian inference wins decisively when traffic is constrained, when you need to peek at results without inflating false positives, and when stakeholders want a probability they can act on rather than a binary verdict. That covers most Shopify and WooCommerce stores under €15M revenue.

Frequentist methods still have a place. Regulatory contexts, pricing experiments that will be defended in a board meeting, and any test where you cannot defend your prior choice publicly — those are cases where the procedural rigour of a pre-registered frequentist test is worth the extra runtime. The two frameworks are tools, not tribes.

Peeking is fine — declaring is not

Bayesian methods let you LOOK at results any time without statistical penalty, because the posterior is always valid. They do not let you arbitrarily redefine the decision threshold mid-test. Decide your shipping rule (e.g. "ship at 90% with expected loss < 0.5%") BEFORE the test starts, and don't move it because the data is close.
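One way to keep yourself honest is to record the shipping rule as an immutable object before launch. The sketch below is illustrative: `ShippingRule` is a hypothetical name, the defaults mirror the example rule above, and expected loss is assumed to be expressed as a fraction (0.005 for 0.5%).

```python
from dataclasses import dataclass


@dataclass(frozen=True)  # frozen: thresholds cannot be edited after creation
class ShippingRule:
    min_prob_b_beats_a: float = 0.90
    max_expected_loss: float = 0.005  # assumption: 0.5% written as a fraction

    def decide(self, prob_b_beats_a: float, expected_loss: float) -> str:
        """Apply the pre-registered rule to the current posterior summary."""
        if (prob_b_beats_a >= self.min_prob_b_beats_a
                and expected_loss < self.max_expected_loss):
            return "ship B"
        return "keep testing"


rule = ShippingRule()           # fixed before the test starts
print(rule.decide(0.92, 0.003))  # both conditions met
print(rule.decide(0.89, 0.003))  # probability just under the bar
```

Checking results daily against `rule.decide` is peeking; constructing a new rule with a lower bar halfway through is the move to avoid.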

Frequently asked questions

Q: Is Bayesian A/B testing more accurate than frequentist?

Neither is more accurate; they answer different questions. Frequentist tests bound the false-positive rate of a pre-defined procedure; Bayesian tests give you the probability that B beats A given the data and your prior. For most DTC decisions, the Bayesian question is the one you actually want answered.

Q: What posterior probability should I ship at?

A common rule for online retail is ship at P(B > A) ≥ 90% with expected loss under 0.5% of conversion rate. Lower the bar (85%) for low-risk copy tests; raise it (95%) for changes to checkout or pricing. The key is to set the threshold before the test starts.

Q: How is this different from just looking at the lift early?

Eyeballing early lift ignores how much uncertainty remains. Bayesian methods give you the lift distribution AND the probability you're right, so a 6% lift with P(B > A) = 64% reads very differently from a 2% lift at 91%. It's the disciplined version of the intuition you already have.

Q: Do I need a Bayesian test tool, or can I use spreadsheets?

For two-variant conversion tests, a Beta-Binomial calculation in a spreadsheet works fine for one-off analyses. Continuous tools like Metricuno, VWO Bayesian mode, or Convert make sense when you're running multiple tests a month and need automated stop rules and expected-loss reporting.

Q: What if my prior is wrong?

A wrong prior gets corrected by data; that's the whole mechanism. The risk is using a strong prior on thin evidence: 20 conversions can't overpower a confidently wrong prior worth 500 visitors. Default to weak priors when you're unsure; the cost is a few extra days of runtime.

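Q: Can I run a Bayesian test with fewer than 1,000 visitors per variant?

Yes, but expect wide posterior intervals. With 500 visitors per variant and a 3% baseline you have only about 15 conversions per arm, so a 15% relative lift barely separates the two posteriors. By a rough normal approximation, the observed lift has to be around 50% before P(B > A) reaches 90%. Below that, you'll either need more traffic or accept higher decision risk.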
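A quick way to sanity-check the small-sample case (500 visitors per variant, 3% baseline) is a normal approximation to the two posteriors. The sketch below is a back-of-envelope check, not a substitute for the full Beta calculation: the function name is hypothetical, and it assumes a flat prior and treats the observed rates as the posterior centres.

```python
import math


def approx_prob_b_beats_a(rate_a, rate_b, visitors_per_arm):
    """Normal approximation to P(B > A) for two arms of equal size."""
    variance = (rate_a * (1 - rate_a) + rate_b * (1 - rate_b)) / visitors_per_arm
    z = (rate_b - rate_a) / math.sqrt(variance)
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))  # standard normal CDF


baseline = 0.03
for lift in (0.15, 0.30, 0.50, 0.60):
    p = approx_prob_b_beats_a(baseline, baseline * (1 + lift), 500)
    print(f"observed lift {lift:.0%}: P(B > A) is about {p:.2f}")
```

Because it ignores the prior and the skew of the Beta distribution at low counts, read the output as an order-of-magnitude guide to how large an observed lift a test this small can resolve.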

Q: How does Bayesian thinking handle multiple variants?

Cleanly. You compute P(variant i is best) for each arm, which sums to 100% across all variants. This avoids the multiple-comparison corrections that bite frequentist tests with 3+ arms, and pairs naturally with multi-armed bandit allocation if you want to shift traffic toward winners during the test.
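The multi-variant calculation can be sketched by sampling one rate per arm from its posterior and counting which arm comes out on top. This assumes uniform Beta(1, 1) priors and invented counts for three arms; `prob_each_is_best` is an illustrative name.

```python
import random


def prob_each_is_best(arms, draws=20_000, seed=0):
    """arms: list of (conversions, visitors). Returns P(arm i is best) per arm."""
    rng = random.Random(seed)
    wins = [0] * len(arms)
    for _ in range(draws):
        samples = [rng.betavariate(1 + conv, 1 + visitors - conv)
                   for conv, visitors in arms]
        wins[samples.index(max(samples))] += 1  # credit the top arm in this draw
    return [w / draws for w in wins]


# Hypothetical three-arm test: control plus two variants.
probs = prob_each_is_best([(30, 1000), (36, 1000), (60, 1000)])
for name, p in zip(("A", "B", "C"), probs):
    print(f"P({name} is best) is about {p:.3f}")
```

Every draw credits exactly one arm, which is why the probabilities sum to 1 without any correction step.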

Q: Is peeking really allowed?

Yes, with a caveat. The posterior probability is a valid summary of the evidence at any sample size, so checking daily doesn't invalidate it the way repeated looks invalidate a fixed-sample p-value. But you must decide your stopping rule in advance; peeking and then redefining "good enough" is just p-hacking with extra steps.

Q: How does Bayesian thinking fit into broader judgment under uncertainty?

It's the formal mathematical core of judgment under uncertainty. Many cognitive biases (base-rate neglect, anchoring, availability) are failures to update beliefs correctly. Bayesian methods give you a structured way to do explicitly what your intuition does poorly under pressure.

Q: What's the simplest way to start using Bayesian methods this quarter?

Pick your next test and report two numbers alongside the usual lift: P(B > A) and expected loss. You don't have to switch tools immediately; most platforms now show these. Practising the interpretation will change how your team reads results within 3-4 tests.
