How to use Cohort Analysis

Metricuno
May 17, 2026
6 min read
Learn how to use cohort analysis to measure the long-term impact of A/B tests on retention, repeat purchase, and LTV, not just session-level lift.
Quick answer

Cohort analysis groups users by when they entered an experiment so you can measure whether a variant changes long-term behavior, not just the first session. Here's how to design, read, and act on it.

Definition (Experiment Analysis)

Cohort Analysis: grouping users by a shared starting event, such as the date they entered an A/B test, to track how their behavior diverges over time.

Cohort analysis examines test impact on groups of users defined by a temporal marker: the day they were bucketed into a variant, their first session, or their first purchase. Instead of reporting a single conversion rate at the end of the test window, you watch each cohort's retention, repeat-purchase rate, and revenue per visitor unfold week by week.

For experimentation, that distinction matters. A variant that lifts checkout conversion 8% in week one can quietly suppress week-four repeat purchases — and a session-level readout will never show it. Cohort analysis is the lens that catches the difference between a real win and a borrowed one.

Also known as
Cohort study
Retention cohort analysis
Temporal cohort analysis

Most A/B test reports stop at the end of the test window. You ship the winner, the dashboard turns green, and the team moves on. The problem: roughly a third of e-commerce variants that win on session conversion lose on 60-day revenue per visitor once you account for return behavior.

Cohort analysis is the safety net. It sits inside the broader practice of experiment analysis and asks a different question: not "did this variant convert more visitors today?" but "did the visitors it converted come back, buy again, and stay valuable?"

Why session-level results aren't enough

Session conversion rewards anything that pushes a visitor over the line in the current visit. Aggressive urgency banners, automatic discount stacking, prefilled high-quantity carts — all of these can lift checkout conversion meaningfully in a two-week test.

What they often do underneath is pull demand forward. The customer who would have bought next month buys today, at a discount, and then doesn't come back for 90 days instead of 30. Your test window closes before the gap shows up.

A cohort view exposes this. You assign every visitor a cohort tag the moment they hit the experiment, then track that cohort's cumulative orders and revenue for the next 8-12 weeks. If the variant cohort's curve flattens earlier than the control's, you pulled demand forward — and the lift you booked isn't real.
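As a rough illustration of that cohort view, the sketch below builds a cumulative revenue-per-visitor curve for each arm. The function name, the input shapes, and the sample numbers are all made up for the example; it assumes each order has already been tagged with its variant and with the week since assignment.

```python
from collections import defaultdict

def cumulative_rpv(visitors, orders, weeks=12):
    """Return {variant: [cumulative revenue per visitor at end of week 1..weeks]}.

    visitors: {variant: number of visitors assigned to that arm}
    orders:   iterable of (variant, week_since_assignment, revenue); weeks start at 1
    """
    weekly = defaultdict(lambda: [0.0] * weeks)
    for variant, week, revenue in orders:
        if 1 <= week <= weeks:
            weekly[variant][week - 1] += revenue

    curves = {}
    for variant, n in visitors.items():
        running, curve = 0.0, []
        for amount in weekly[variant]:
            running += amount
            curve.append(running / n)  # divide by ALL assigned visitors, not just buyers
        curves[variant] = curve
    return curves

# Hypothetical data: four visitors per arm, four weeks of orders.
visitors = {"control": 4, "variant": 4}
orders = [
    ("control", 1, 40.0), ("control", 3, 40.0), ("control", 4, 40.0),
    ("variant", 1, 60.0), ("variant", 2, 20.0),
]
curves = cumulative_rpv(visitors, orders, weeks=4)
print(curves["control"])  # [10.0, 10.0, 20.0, 30.0]
print(curves["variant"])  # [15.0, 20.0, 20.0, 20.0]
```

In this toy data the variant curve goes flat after week two while control keeps climbing, which is the pattern described above.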

The cannibalisation trap

A variant that wins +6% on session conversion but causes a 9% drop in 60-day repeat rate can end up a net loser. Shipping it without cohort analysis is how teams hit their quarterly CR target and miss the annual revenue plan.
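How the two effects net out depends on how much revenue comes from repeat orders. Here is a back-of-the-envelope sketch under a deliberately simple model: order value is the same in both arms, and a repeat buyer places exactly one extra order inside the window. The 30% baseline repeat rate is a hypothetical figure, and the 9% drop is read as nine percentage points (30% down to 21%).

```python
def net_revenue_ratio(cr_lift, repeat_control, repeat_variant):
    """Variant revenue per visitor divided by control revenue per visitor.

    Toy model (an assumption, not a general formula): every buyer places one
    order, and a share of buyers equal to the repeat rate places exactly one
    more order within 60 days. Order value is identical in both arms.
    """
    first_orders = 1 + cr_lift  # variant buyers per control buyer
    return first_orders * (1 + repeat_variant) / (1 + repeat_control)

ratio = net_revenue_ratio(cr_lift=0.06, repeat_control=0.30, repeat_variant=0.21)
print(round(ratio, 3))  # 0.987, i.e. about 1.3% less revenue per visitor than control
```

Discounting in the variant arm would pull the ratio down further. A smaller drop, or a lower baseline repeat rate, can push it back above 1, so it is worth plugging in your own numbers rather than relying on the headline percentages.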

How to build cohorts from a running experiment

The cohort key for an experiment is the assignment timestamp: the moment a visitor was bucketed into control or variant. Every event that user produces afterwards — sessions, add-to-carts, orders, refunds — gets stamped with that cohort and with the variant they were assigned to.

On Shopify, the practical way to do this is to pass the variant ID into your analytics layer as a custom user property the first time it's resolved, then keep it stable across sessions via the customer ID once they log in or check out. For anonymous traffic, a first-party cookie does the job for the 8-12 weeks you need.
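The stamping step can be sketched in a few lines, independent of any particular analytics stack. The field names (`user_id`, `ts`) and the `tag_events` helper are hypothetical, and the assignment table is assumed to hold one stable record per user.

```python
from datetime import datetime

def tag_events(assignments, events):
    """Attach variant, cohort date, and week-since-assignment to each event.

    assignments: {user_id: (variant, assigned_at)}
    events:      iterable of dicts with at least "user_id" and "ts"
    Events from unassigned users, or dated before assignment, are skipped.
    """
    tagged = []
    for event in events:
        record = assignments.get(event["user_id"])
        if record is None:
            continue
        variant, assigned_at = record
        delta = event["ts"] - assigned_at
        if delta.total_seconds() < 0:
            continue
        week = delta.days // 7 + 1  # week 1 covers the first 7 days after assignment
        tagged.append({
            **event,
            "variant": variant,
            "cohort_date": assigned_at.date(),
            "week": week,
        })
    return tagged

# Hypothetical data: u1 is in the experiment, u2 is not.
assignments = {"u1": ("variant", datetime(2026, 1, 5, 10, 0))}
events = [
    {"user_id": "u1", "ts": datetime(2026, 1, 5, 12, 0), "type": "order"},
    {"user_id": "u1", "ts": datetime(2026, 1, 20, 9, 0), "type": "order"},
    {"user_id": "u2", "ts": datetime(2026, 1, 6, 9, 0), "type": "session"},
]
tagged = tag_events(assignments, events)
print([e["week"] for e in tagged])  # [1, 3]
```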

Chart: Cumulative revenue per visitor, control vs variant, 12 weeks post-assignment

[Line chart. Y-axis: cumulative revenue per visitor (€), 0 to 20. X-axis: weeks since assignment, 1 to 12. Series: Control; Variant (aggressive discount).]

The shape above is the classic pull-forward signature: variant leads through week four, gets caught between weeks five and six, and finishes 21% behind control on cumulative revenue per visitor. The session-level test report, closed at week two, declared this a clear winner.
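If you have the two cumulative curves as lists, spotting that signature can be automated. The helper below is a minimal sketch; the sample numbers are invented for illustration and are not the values behind the chart.

```python
def crossover_week(control, variant):
    """Return the first week (1-based) in which control's cumulative value
    overtakes a variant that was ahead earlier, or None if that never happens."""
    variant_led = False
    for week, (c, v) in enumerate(zip(control, variant), start=1):
        if v > c:
            variant_led = True
        elif variant_led and c > v:
            return week
    return None

# Invented cumulative revenue-per-visitor values, one entry per week.
control = [1.0, 2.5, 4.0, 5.5, 7.0, 8.5]
variant = [2.0, 4.0, 5.5, 6.5, 7.0, 7.2]

print(crossover_week(control, variant))  # 6
final_gap = variant[-1] / control[-1] - 1
print(f"{final_gap:.0%}")  # -15%
```

A `None` result means the variant either never led or never got caught inside the window you tracked, which is a reason to keep watching rather than proof that the lift holds.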

Metrics worth tracking by cohort

Pick metrics that compound over time. Session conversion is already in your primary test readout — adding it to the cohort view is redundant. The cohort view earns its keep by surfacing the metrics that only make sense across weeks.

For most online stores, four cohort metrics cover 90% of the decisions: repeat-purchase rate at 60 days, average orders per customer at 90 days, cumulative revenue per visitor at 12 weeks, and refund rate at 30 days. The fourth one catches variants that win conversions by attracting the wrong buyer.
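To make those four metrics concrete, here is one way they could be computed for a single arm. The exact definitions are assumptions chosen for the sketch (for example, refund rate is taken as the share of orders placed in the first 30 days that were refunded, and revenue is gross of refunds); adjust them to match how your store reports.

```python
from collections import Counter

def cohort_metrics(visitors, orders):
    """Compute the four cohort metrics for one arm.

    visitors: number of visitors assigned to this arm
    orders:   list of (user_id, day_since_assignment, revenue, refunded)
    """
    def within(days):
        return [o for o in orders if o[1] <= days]

    orders_per_user_60 = Counter(o[0] for o in within(60))
    repeaters = sum(1 for n in orders_per_user_60.values() if n >= 2)
    o90, o30 = within(90), within(30)

    return {
        # share of 60-day buyers who ordered at least twice
        "repeat_rate_60d": repeaters / len(orders_per_user_60) if orders_per_user_60 else 0.0,
        # orders divided by distinct buyers, first 90 days
        "orders_per_customer_90d": len(o90) / len({o[0] for o in o90}) if o90 else 0.0,
        # gross revenue in the first 12 weeks divided by ALL assigned visitors
        "revenue_per_visitor_12w": sum(o[2] for o in within(84)) / visitors,
        # refunded share of orders placed in the first 30 days
        "refund_rate_30d": sum(1 for o in o30 if o[3]) / len(o30) if o30 else 0.0,
    }

# Hypothetical arm: 10 visitors, 3 buyers.
orders = [
    ("a", 1, 50.0, False), ("a", 40, 30.0, False),
    ("b", 5, 20.0, True),
    ("c", 70, 60.0, False), ("c", 100, 10.0, False),
]
metrics = cohort_metrics(10, orders)
print(metrics["repeat_rate_60d"])          # 0.5
print(metrics["revenue_per_visitor_12w"])  # 16.0
```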

Benchmark

Typical cohort-metric ranges by vertical (60 days post-assignment)

Vertical             | Repeat-purchase rate | Orders per customer | Refund rate
Apparel              | 22-32%               | 1.4-1.7             | 8-14%
Beauty & skincare    | 35-48%               | 1.8-2.3             | 3-6%
Consumer electronics | 8-14%                | 1.1-1.2             | 10-18%
Home & kitchen       | 15-22%               | 1.2-1.4             | 5-9%
Supplements & food   | 40-55%               | 2.0-2.8             | 2-4%

Read the table as a sanity check, not a target. If your variant cohort's 60-day repeat rate lands well below the band for your vertical, you have a retention problem the test readout didn't catch. If it lands above, the variant is doing something genuinely useful that the session window couldn't see.
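That sanity check is easy to script. The sketch below hard-codes the repeat-purchase bands from the benchmark table; the function name and the three labels it returns are illustrative choices.

```python
# 60-day repeat-purchase bands from the benchmark table, as fractions.
REPEAT_RATE_BANDS = {
    "Apparel": (0.22, 0.32),
    "Beauty & skincare": (0.35, 0.48),
    "Consumer electronics": (0.08, 0.14),
    "Home & kitchen": (0.15, 0.22),
    "Supplements & food": (0.40, 0.55),
}

def check_repeat_rate(vertical, rate):
    """Say where a cohort's 60-day repeat rate sits relative to its vertical's band."""
    low, high = REPEAT_RATE_BANDS[vertical]
    if rate < low:
        return "below band"
    if rate > high:
        return "above band"
    return "within band"

print(check_repeat_rate("Apparel", 0.18))             # below band
print(check_repeat_rate("Supplements & food", 0.47))  # within band
```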

Common pitfalls and how to avoid them

The biggest mistake is mixing cohorts. If you re-randomise returning visitors mid-test, or if your bucketing resets when someone clears cookies, your cohort tags drift and the long-term curves become noise. Pin the variant assignment to a stable identifier and audit it weekly.
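One common way to pin assignment is to derive the bucket from a hash of the stable identifier, so the same ID always resolves to the same variant without any stored state. The sketch below shows that idea alongside a simple audit for users logged under more than one variant. It is a generic illustration, not the bucketing logic of any particular testing tool, and the function names are made up.

```python
import hashlib

def assign_variant(stable_id, experiment, variants=("control", "variant")):
    """Deterministically map a stable ID to a variant for a given experiment."""
    key = f"{experiment}:{stable_id}".encode("utf-8")
    digest = hashlib.sha256(key).hexdigest()
    return variants[int(digest, 16) % len(variants)]

def mixed_users(exposures):
    """Return IDs that were logged under more than one variant.

    exposures: iterable of (stable_id, variant) pairs from the exposure log
    """
    seen = {}
    for stable_id, variant in exposures:
        seen.setdefault(stable_id, set()).add(variant)
    return sorted(uid for uid, variants in seen.items() if len(variants) > 1)

# The same ID always lands in the same bucket.
assert assign_variant("customer-42", "checkout-test") == assign_variant("customer-42", "checkout-test")

# Weekly audit on a hypothetical exposure log: u1 has drifted between arms.
log = [("u1", "control"), ("u1", "variant"), ("u2", "control")]
print(mixed_users(log))  # ['u1']
```

Anything the audit returns should be excluded from the cohort curves or investigated, since those users received both treatments.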

The second mistake is reading cohort curves too early. Weeks one through three are dominated by the same buyers your primary test report already saw. The signal you came for sits in weeks four through twelve, and it takes statistical patience to wait for it — especially when the early curve looks like a win.

Run cohort analysis on shipped tests too

A cohort review of every test you shipped in the last six months is the cheapest CRO audit you'll ever run. You'll find two or three winners that quietly cost revenue, and rolling them back is pure upside with zero development cost.

Frequently asked questions

How long should I track cohorts after a test?

Eight to twelve weeks post-assignment covers the repeat-purchase window for most online stores. Categories with longer purchase cycles (appliances, furniture, mattresses) need 16-24 weeks. Beauty and supplements can be read at 6-8 weeks because customers reorder fast.

How is cohort analysis different from segmentation?

Segmentation slices users by attributes (new vs returning, mobile vs desktop, channel). Cohort analysis slices by a starting timestamp and tracks the same group forward through time. You can segment cohorts, and that's where the best insights live, but the cohort tag is always temporal.

Do I still need cohort analysis if my test reached statistical significance?

Yes. Statistical significance only tells you the lift on your primary metric is real within the test window. It says nothing about whether that lift sustains, cannibalises future demand, or attracts lower-LTV buyers. Cohort analysis is the second question.

Can I run experiment cohort analysis in GA4?

GA4 has a built-in cohort exploration, but it's limited to user-acquisition cohorts and a handful of metrics. For experiment cohorts you need to pass the variant ID as a user property and build the cohort view in BigQuery or in a dedicated experimentation tool. Metricuno does this automatically when it imports your GA4 history.

How much traffic do I need per cohort?

For repeat-purchase rate and orders per customer, aim for at least 1,500 visitors per variant per cohort. Below that, weekly retention curves are too noisy to separate signal from variance, and the long-tail metrics swing wildly on small denominators.
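To get a feel for why small cohorts are noisy, the sketch below uses the textbook normal approximation for the margin of error on an observed rate. The 30% rate is a hypothetical input, and the calculation is a rough guide rather than a substitute for a proper power analysis.

```python
from math import sqrt

def margin_of_error(rate, n, z=1.96):
    """Approximate 95% half-width for a rate observed on n users
    (normal approximation to the binomial)."""
    return z * sqrt(rate * (1 - rate) / n)

for n in (200, 1500):
    print(n, round(margin_of_error(0.30, n), 3))
```

At 200 users the margin is a little over six percentage points either way; at 1,500 it drops to a little over two. Keep in mind that a repeat-purchase rate is measured on buyers, who are only a fraction of the visitors in a cohort, so the margin on the metric itself is wider than a calculation on the visitor count suggests.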

How does cohort analysis relate to experiment analysis?

It's the long-window companion to your primary experiment analysis. The primary readout answers "does this variant convert more sessions?"; the cohort readout answers "does this variant build more valuable customers?" You need both before you ship to 100% of traffic.

Should I define cohorts by assignment date or by first-purchase date?

For experiments, always assignment date, because that's when the treatment started. First-purchase cohorts are useful for retention analysis outside of testing, like measuring whether a new onboarding flow improves second-order rates, but they exclude non-buyers and bias the result.

What if my control and variant cohorts are different sizes?

That's fine and expected, especially with unequal traffic splits. Compare rates and per-visitor metrics, not absolutes. Avoid total revenue or total orders as cohort comparison metrics: they scale with cohort size and tell you nothing about user behavior.

Can a variant that looks flat on session conversion still win on cohort metrics?

Sometimes, yes. Variants that improve trust signals, product education, or post-purchase emails often look flat on session conversion but lift 60-day repeat rate by 5-10%. The cohort view is the only place those wins show up, and they're usually the most defensible ones.

How often should I review cohort curves?

Weekly is enough. The interesting movement happens between weeks four and eight, so daily checking creates more anxiety than insight. Set a weekly review cadence and a clear rule for when a cohort divergence is large enough to halt or roll back a shipped variant.
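One possible shape for that rule is a two-proportion z-test on the repeat rate, flagging the variant only when it is both lower than control and statistically distinguishable from it. This is a sketch of one option, with a hypothetical 5% threshold and invented counts, not a recommended standard.

```python
from math import erf, sqrt

def two_proportion_z(success_a, n_a, success_b, n_b):
    """Pooled two-proportion z-test. Returns (z, two-sided p-value) for B minus A."""
    p_a, p_b = success_a / n_a, success_b / n_b
    pooled = (success_a + success_b) / (n_a + n_b)
    se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    normal_cdf = 0.5 * (1 + erf(abs(z) / sqrt(2)))
    return z, 2 * (1 - normal_cdf)

def should_flag(z, p_value, alpha=0.05):
    """Flag only when the variant is worse than control AND the gap is significant."""
    return z < 0 and p_value < alpha

# Invented counts: 300 of 1,000 control buyers repeated, 240 of 1,000 variant buyers did.
z, p = two_proportion_z(300, 1000, 240, 1000)
print(round(z, 2), should_flag(z, p))  # -3.02 True
```

Running a test like this every week inflates the false-positive rate, so a stricter threshold than 5%, or a fixed set of pre-agreed checkpoints, is a sensible precaution.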
