Learning Systems

Q: What goes into a single test write-up?

Hypothesis, variant description, primary and guardrail metrics, segment cuts (mobile vs desktop, new vs returning), result with confidence level, and one tagged insight in plain English. Aim for 5-10 minutes per test, not 45.

Q: Why document losing tests at all?

Losses tell you what your buyers don't respond to, which is often more actionable than wins. A failed urgency-banner test on a beauty store is evidence that the audience may not buy on scarcity, and that shapes the next ten hypotheses, not just the one that failed.

Q: How is a learning system different from an experimentation strategy?

Experimentation strategy decides what to test and why. The learning system captures what each test taught and makes that searchable. Strategy is the plan; the learning system is the memory the plan draws on.

Q: What's the minimum viable learning system?

A single spreadsheet or Airtable with seven columns: date, hypothesis, area of site, result, lift, segment notes, insight tag. That's enough to outperform the 80% of CRO programs that have nothing.
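As a rough illustration, the seven columns can be sketched as a small Python record that exports to CSV for a spreadsheet or Airtable import. The class name `ExperimentRecord`, the field names, and the sample row (including its date and lift figure) are hypothetical and only show the shape of an entry.

```python
import csv
import io
from dataclasses import dataclass, asdict, fields


@dataclass
class ExperimentRecord:
    """One row of the repository: the seven columns listed above."""
    date: str            # ISO date the test concluded
    hypothesis: str
    area_of_site: str    # e.g. "PDP", "cart", "checkout", "homepage"
    result: str          # "win", "loss", or "inconclusive"
    lift: float          # observed relative lift in percent (illustrative)
    segment_notes: str
    insight_tag: str


# Column order is taken straight from the dataclass definition.
COLUMNS = [f.name for f in fields(ExperimentRecord)]


def to_csv(records):
    """Serialise records to CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=COLUMNS)
    writer.writeheader()
    for record in records:
        writer.writerow(asdict(record))
    return buffer.getvalue()


# Made-up sample entry, not real data.
example = ExperimentRecord(
    date="2026-01-15",
    hypothesis="Urgency banner on PDP increases add-to-cart rate",
    area_of_site="PDP",
    result="loss",
    lift=-0.8,
    segment_notes="No difference between mobile and desktop",
    insight_tag="urgency",
)

csv_text = to_csv([example])
```

Keeping the schema this small is the point: a losing test takes as little effort to log as a winner.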

Q: How should we tag insights so they're searchable later?

Use two tag dimensions: page area (PDP, cart, checkout, homepage) and psychological lever (social proof, urgency, friction reduction, trust, clarity). That two-axis grid is enough to surface relevant prior tests for any new hypothesis.
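A minimal sketch of how the two-axis grid might be searched, assuming entries are kept as simple Python objects. The tag values come from the answer above; `TaggedTest`, `find_prior_tests`, and the three sample entries are hypothetical.

```python
from dataclasses import dataclass

# The two tag dimensions named above.
PAGE_AREAS = {"PDP", "cart", "checkout", "homepage"}
LEVERS = {"social proof", "urgency", "friction reduction", "trust", "clarity"}


@dataclass(frozen=True)
class TaggedTest:
    hypothesis: str
    page_area: str
    lever: str
    result: str  # "win", "loss", or "inconclusive"


def find_prior_tests(repository, page_area=None, lever=None):
    """Return entries matching the given tags; None means 'any value'."""
    if page_area is not None and page_area not in PAGE_AREAS:
        raise ValueError(f"Unknown page area: {page_area!r}")
    if lever is not None and lever not in LEVERS:
        raise ValueError(f"Unknown lever: {lever!r}")
    return [
        test for test in repository
        if (page_area is None or test.page_area == page_area)
        and (lever is None or test.lever == lever)
    ]


# Made-up entries for illustration only.
repository = [
    TaggedTest("Review stars above the fold", "PDP", "social proof", "win"),
    TaggedTest("Countdown banner on product page", "PDP", "urgency", "loss"),
    TaggedTest("Guest checkout as default", "checkout", "friction reduction", "inconclusive"),
]

pdp_social_proof = find_prior_tests(repository, page_area="PDP", lever="social proof")
all_pdp = find_prior_tests(repository, page_area="PDP")
```

Rejecting unknown tags at search time is one way to keep the vocabulary consistent, since a misspelled tag would otherwise silently return nothing.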

Q: Who owns the learning system?

Whoever runs the weekly experimentation review — usually the CRO lead or head of ecommerce. Ownership matters more than role; if it's everyone's responsibility, it becomes nobody's.

Q: Do inconclusive tests belong in the repository?

Yes, and they're often the most useful entries. An inconclusive result tells you that any real effect was smaller than your traffic could detect, which is itself a finding about the hypothesis's ceiling and your testing constraints.
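To make "smaller than your traffic could detect" concrete, here is a hedged sketch of the standard normal-approximation estimate for the minimum detectable effect of a two-arm conversion test. The baseline rate, visitor count, significance level, and power are assumptions chosen for illustration, and your testing tool may use a different method.

```python
from math import sqrt
from statistics import NormalDist


def minimum_detectable_effect(baseline_rate, visitors_per_arm, alpha=0.05, power=0.80):
    """Approximate absolute difference in conversion rate a two-sided
    test could detect, using the normal approximation with equal arms."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_power = NormalDist().inv_cdf(power)
    standard_error = sqrt(2 * baseline_rate * (1 - baseline_rate) / visitors_per_arm)
    return (z_alpha + z_power) * standard_error


# Hypothetical store: 3% baseline conversion, 20,000 visitors per variant.
baseline = 0.03
mde_abs = minimum_detectable_effect(baseline, 20_000)
mde_rel = mde_abs / baseline  # detectable lift relative to the baseline
```

With these made-up inputs the detectable difference is roughly half a percentage point, about a 16% relative lift. Logging that figure next to an inconclusive result records the ceiling the test could rule out.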

Q: How do we stop the same hypothesis from being re-tested?

Make repository search a required step before any new hypothesis enters the backlog. Two minutes of search at intake saves four weeks of redundant testing later. The tag structure is what makes that search fast.

Q: How long should we keep test results before they go stale?

Insights about psychological levers (does social proof work for our buyers?) stay valid for years. Insights about specific creative or copy go stale in 6-12 months as the site, traffic mix, and seasonality shift. Tag accordingly.
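One way to act on "tag accordingly" is to store an insight type on each entry and flag entries for review by age. The sketch below assumes two hypothetical type labels, `"lever"` and `"creative"`, and turns the 6-12 month window into day counts. Lever insights get no expiry here because the answer above only says they last "for years".

```python
from datetime import date

# Assumed thresholds derived from the 6-12 month window above.
CREATIVE_REVIEW_AFTER_DAYS = 183   # about 6 months
CREATIVE_STALE_AFTER_DAYS = 365    # about 12 months


def freshness(insight_type, concluded_on, today):
    """Classify an entry as 'fresh', 'review', or 'stale'."""
    if insight_type == "lever":
        # Lever-level insights are treated as durable; no expiry is encoded.
        return "fresh"
    if insight_type != "creative":
        raise ValueError(f"Unknown insight type: {insight_type!r}")
    age_days = (today - concluded_on).days
    if age_days > CREATIVE_STALE_AFTER_DAYS:
        return "stale"
    if age_days > CREATIVE_REVIEW_AFTER_DAYS:
        return "review"
    return "fresh"


# Illustrative dates only.
today = date(2026, 5, 17)
old_copy_test = freshness("creative", date(2025, 1, 10), today)
mid_copy_test = freshness("creative", date(2025, 10, 1), today)
recent_copy_test = freshness("creative", date(2026, 3, 1), today)
old_lever_test = freshness("lever", date(2023, 6, 1), today)
```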

Q: Can AI help summarise and surface past learnings?

Yes — once you have a structured repository. A well-tagged corpus of 50+ test write-ups becomes a powerful input for AI-generated hypotheses, because the model can reason over what's already been proven or disproven on your specific store.

Metricuno

May 17, 2026


Learning systems capture what every A/B test taught, winners and losers alike. Here's how mature CRO programs compound knowledge instead of re-learning it.

Quick answer

A learning system is the documentation and review muscle that captures what every test taught — even the losing ones — so your CRO program compounds knowledge instead of repeating it.

Definition

Experimentation

Learning System

The process and documentation layer that captures what every experiment taught — wins, losses, and inconclusive results — so the team compounds knowledge over time.

A learning system is the organisational muscle that turns individual A/B tests into durable knowledge. It's the repository, the review ritual, and the tagging structure that lets you answer questions like "have we tested social proof on PDPs before?" in 30 seconds rather than re-running an experiment from 14 months ago.

It sits inside your broader experimentation strategy as the part that decides whether your program gets smarter every quarter or stays stuck re-litigating the same hypotheses. Most teams ship the tests. Few capture what the tests meant. The gap between those two states is where ROI on CRO either compounds or evaporates.

Also known as

experiment knowledge base

test learning repository

insight library

The honest reason learning systems get neglected: losing tests feel bad to document, and inconclusive tests feel like nothing happened. So teams skip the write-up, archive the experiment, and move on. Six months later someone proposes the same hypothesis.

A loss is data. An inconclusive result is data about your traffic volume and effect size. The teams that treat both as first-class evidence end up with hypothesis backlogs that get sharper each quarter — because they're filtering ideas through what the store has already proven true or false about its own buyers.

Formula

KRR = (documented_tests / total_tests_shipped) * 100

Variables

KRR

Knowledge Retention Rate

Share of shipped experiments with a structured post-test write-up logged in the repository.

documented_tests

Documented tests

Tests with hypothesis, result, segment cut, and a tagged insight recorded.

total_tests_shipped

Total tests shipped

Every experiment concluded in the period, including losers and inconclusive results.

Worked example

A Shopify apparel brand ran 24 experiments last quarter. The team wrote up 9 of them — mostly the winners. The other 15 (losses and inconclusive) sit in the A/B tool with no notes.

Documented tests: 9

Total tests shipped: 24

→ KRR = 37.5%

That is well below the 70% threshold a mature program should hit. The team is effectively throwing away nearly two-thirds (15 of 24, or 62.5%) of what its traffic told it, and will almost certainly re-propose hypotheses that were already tested.
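The same arithmetic as a small Python sketch, using the worked example's numbers. The function name `knowledge_retention_rate` is an assumption; the calculation follows the KRR formula above.

```python
def knowledge_retention_rate(documented_tests, total_tests_shipped):
    """KRR = (documented_tests / total_tests_shipped) * 100."""
    if total_tests_shipped <= 0:
        raise ValueError("total_tests_shipped must be positive")
    if not 0 <= documented_tests <= total_tests_shipped:
        raise ValueError("documented_tests must be between 0 and total_tests_shipped")
    return documented_tests / total_tests_shipped * 100


# Worked example: 9 write-ups out of 24 concluded experiments.
krr = knowledge_retention_rate(9, 24)
undocumented_share = 100 - krr  # share of tests with no write-up
```

Here `krr` is 37.5 and `undocumented_share` is 62.5, matching the figures in the worked example.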

KRR is a leading indicator. It doesn't measure whether your insights are good — it measures whether you have any. Teams that drag KRR above 80% report a noticeable shift within two quarters: hypothesis backlogs get pre-filtered, and "we tried that" becomes a sentence backed by a link instead of someone's memory.

Benchmark

What a learning system looks like at each maturity stage

Stage	Tests/quarter	Knowledge retention rate	Time to find prior result	Hypotheses sourced from past tests
Ad-hoc	4-8	10-30%	Hours or never	<10%
Documented	8-15	40-60%	15-30 min	20-30%
Structured	15-25	70-85%	Under 5 min	40-55%
Compounding	25+	85-95%	Under 1 min	60%+

The tooling matters less than the ritual. A shared Notion database with consistent tags will outperform a fancy CRO platform that nobody updates. What separates the Structured and Compounding stages isn't software — it's a 30-minute weekly review where every concluded test gets logged before it can be archived.

Frequently asked

Learning systems FAQ