Learning Systems

Metricuno
May 17, 2026
Learning systems capture what every A/B test taught, winners and losers alike. Here's how mature CRO programs compound knowledge instead of re-learning it.
Quick answer

A learning system is the documentation and review muscle that captures what every test taught — even the losing ones — so your CRO program compounds knowledge instead of repeating it.

Definition
Experimentation

Learning System

The process and documentation layer that captures what every experiment taught — wins, losses, and inconclusive results — so the team compounds knowledge over time.

A learning system is the organisational muscle that turns individual A/B tests into durable knowledge. It's the repository, the review ritual, and the tagging structure that lets you answer questions like "have we tested social proof on PDPs before?" in 30 seconds rather than re-running an experiment from 14 months ago.

It sits inside your broader experimentation strategy as the part that decides whether your program gets smarter every quarter or stays stuck re-litigating the same hypotheses. Most teams ship the tests. Few capture what the tests meant. The gap between those two states is where ROI on CRO either compounds or evaporates.

Also known as
experiment knowledge base
test learning repository
insight library

The honest reason learning systems get neglected: losing tests feel bad to document, and inconclusive tests feel like nothing happened. So teams skip the write-up, archive the experiment, and move on. Six months later someone proposes the same hypothesis.

A loss is data. An inconclusive result is data about your traffic volume and effect size. The teams that treat both as first-class evidence end up with hypothesis backlogs that get sharper each quarter — because they're filtering ideas through what the store has already proven true or false about its own buyers.

Formula

KRR = (documented_tests / total_tests_shipped) * 100

Variables

KRR (Knowledge Retention Rate): share of shipped experiments with a structured post-test write-up logged in the repository.

documented_tests (Documented tests): tests with hypothesis, result, segment cut, and a tagged insight recorded.

total_tests_shipped (Total tests shipped): every experiment concluded in the period, including losers and inconclusive results.

Worked example

A Shopify apparel brand ran 24 experiments last quarter. The team wrote up 9 of them — mostly the winners. The other 15 (losses and inconclusive) sit in the A/B tool with no notes.

Documented tests: 9

Total tests shipped: 24

KRR = (9 / 24) * 100 = 37.5%

That is well below the 70% a mature program should hit. The team is effectively throwing away 62.5% of what their traffic told them (15 of 24 tests), and will almost certainly re-propose hypotheses that were already tested.
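The worked example can be reproduced with a few lines of Python. This is a minimal sketch of the KRR formula above; the function name `knowledge_retention_rate` and the input checks are illustrative assumptions, not part of any particular tool.

```python
def knowledge_retention_rate(documented_tests: int, total_tests_shipped: int) -> float:
    """Return KRR: the percentage of shipped tests that have a write-up."""
    if total_tests_shipped <= 0:
        raise ValueError("total_tests_shipped must be positive")
    if not 0 <= documented_tests <= total_tests_shipped:
        raise ValueError("documented_tests must be between 0 and total_tests_shipped")
    return documented_tests / total_tests_shipped * 100


# Numbers from the worked example: 9 write-ups out of 24 shipped tests.
krr = knowledge_retention_rate(documented_tests=9, total_tests_shipped=24)
print(f"KRR = {krr:.1f}%")  # KRR = 37.5%
```

Counting every concluded test in the denominator, losers and inconclusive results included, is what keeps the number honest.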

KRR is a leading indicator. It doesn't measure whether your insights are good — it measures whether you have any. Teams that drag KRR above 80% report a noticeable shift within two quarters: hypothesis backlogs get pre-filtered, and "we tried that" becomes a sentence backed by a link instead of someone's memory.

Benchmark

What a learning system looks like at each maturity stage

| Stage | Tests/quarter | Knowledge retention rate | Time to find prior result | Hypotheses sourced from past tests |
|---|---|---|---|---|
| Ad-hoc | 4-8 | 10-30% | Hours or never | <10% |
| Documented | 8-15 | 40-60% | 15-30 min | 20-30% |
| Structured | 15-25 | 70-85% | Under 5 min | 40-55% |
| Compounding | 25+ | 85-95% | Under 1 min | 60%+ |
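For a rough self-assessment, the KRR column of the benchmark can be turned into a lookup. The sketch below is a simplification under stated assumptions: it uses KRR alone (the benchmark has four other dimensions), it treats the lower bound of each band as the cut-off, and it assigns values that fall between bands (for example 35% or 65%) to the lower stage, because the benchmark does not define them. The name `maturity_stage` is hypothetical.

```python
# Lower bound of each stage's KRR band from the benchmark, highest first.
# Assumption: values between bands fall back to the lower stage, and a KRR
# of exactly 85% (listed in two bands) is counted as Compounding.
STAGE_FLOORS = [
    (85.0, "Compounding"),
    (70.0, "Structured"),
    (40.0, "Documented"),
]


def maturity_stage(krr_percent: float) -> str:
    """Map a KRR percentage to the nearest benchmark stage at or below it."""
    if not 0 <= krr_percent <= 100:
        raise ValueError("krr_percent must be between 0 and 100")
    for floor, stage in STAGE_FLOORS:
        if krr_percent >= floor:
            return stage
    return "Ad-hoc"


print(maturity_stage(37.5))  # Ad-hoc
print(maturity_stage(75.0))  # Structured
```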

The tooling matters less than the ritual. A shared Notion database with consistent tags will outperform a fancy CRO platform that nobody updates. What separates the Structured and Compounding stages isn't software — it's a 30-minute weekly review where every concluded test gets logged before it can be archived.

Frequently asked

Learning systems FAQ

A good write-up records the hypothesis, variant description, primary and guardrail metrics, segment cuts (mobile vs desktop, new vs returning), the result with its confidence level, and one tagged insight in plain English. Aim for 5-10 minutes per test, not 45.

Losses tell you what your buyers don't respond to, which is often more actionable than wins. A failed urgency-banner test on a beauty store is evidence that the audience doesn't buy on scarcity — that shapes the next ten hypotheses, not just the one that failed.

Experimentation strategy decides what to test and why. The learning system captures what each test taught and makes that searchable. Strategy is the plan; the learning system is the memory the plan draws on.

The minimum viable version is a single spreadsheet or Airtable base with seven columns: date, hypothesis, area of site, result, lift, segment notes, insight tag. That's enough to outperform 80% of CRO programs that have nothing.
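As an illustration, the seven columns map directly onto a small record type that can be exported to CSV and pasted into a spreadsheet. This is a sketch, not a prescribed schema: the class name `ExperimentRecord`, the field names, and the sample row are all made up for the example.

```python
import csv
import io
from dataclasses import asdict, dataclass, fields


@dataclass
class ExperimentRecord:
    """One row of the repository: the seven columns listed above."""
    date: str
    hypothesis: str
    area_of_site: str
    result: str
    lift: str
    segment_notes: str
    insight_tag: str


def to_csv(records: list[ExperimentRecord]) -> str:
    """Serialise records to CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=[f.name for f in fields(ExperimentRecord)])
    writer.writeheader()
    for record in records:
        writer.writerow(asdict(record))
    return buffer.getvalue()


# Hypothetical entry; the values are illustrative, not real results.
example = ExperimentRecord(
    date="2026-03-02",
    hypothesis="An urgency banner on the PDP increases add-to-cart rate",
    area_of_site="PDP",
    result="loss",
    lift="-1.2%",
    segment_notes="No difference between mobile and desktop",
    insight_tag="urgency",
)
print(to_csv([example]))
```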

Use two tag dimensions: page area (PDP, cart, checkout, homepage) and psychological lever (social proof, urgency, friction reduction, trust, clarity). That two-axis grid is enough to surface relevant prior tests for any new hypothesis.
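A minimal sketch of how the two-axis grid supports lookup, assuming the repository is a list of tagged records. `TaggedTest`, `find_prior_tests`, and the sample entries are hypothetical names and data chosen for the example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class TaggedTest:
    name: str
    page_area: str  # e.g. "PDP", "cart", "checkout", "homepage"
    lever: str      # e.g. "social proof", "urgency", "trust"
    result: str     # "win", "loss", or "inconclusive"


def find_prior_tests(
    repository: list[TaggedTest],
    page_area: Optional[str] = None,
    lever: Optional[str] = None,
) -> list[TaggedTest]:
    """Return tests matching the given tags; None means 'any value'."""
    return [
        test
        for test in repository
        if (page_area is None or test.page_area == page_area)
        and (lever is None or test.lever == lever)
    ]


# Hypothetical repository contents, for illustration only.
repository = [
    TaggedTest("Review stars above the fold", "PDP", "social proof", "win"),
    TaggedTest("Low-stock banner", "PDP", "urgency", "loss"),
    TaggedTest("Trust badges near pay button", "checkout", "trust", "inconclusive"),
]

# "Have we tested social proof on PDPs before?"
for test in find_prior_tests(repository, page_area="PDP", lever="social proof"):
    print(test.name, "->", test.result)
```

Filtering on one axis at a time (for example every urgency test, regardless of page) works with the same function by leaving the other argument unset.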

The learning system should be owned by whoever runs the weekly experimentation review, usually the CRO lead or head of ecommerce. A named owner matters more than the role; if it's everyone's responsibility, it becomes nobody's.

Inconclusive tests belong in the repository, and they're often the most useful entries. An inconclusive result suggests that any real effect was smaller than your traffic could detect, which is itself a finding about the hypothesis's ceiling and your testing constraints.
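To put a number on "smaller than your traffic could detect", one common back-of-envelope approach is the minimum detectable effect (MDE) for a two-sided, two-proportion z-test. The sketch below is an approximation under stated assumptions: equal traffic per variant, the normal approximation, and the baseline rate used for the variance of both arms. The function name and the example inputs are illustrative, and your testing tool may use a different method.

```python
from math import sqrt
from statistics import NormalDist


def minimum_detectable_effect(
    baseline_rate: float,
    visitors_per_variant: int,
    alpha: float = 0.05,
    power: float = 0.8,
) -> float:
    """Approximate absolute MDE for a two-sided two-proportion z-test.

    Assumes equal-sized arms and uses the baseline rate for both variances.
    """
    if not 0 < baseline_rate < 1:
        raise ValueError("baseline_rate must be between 0 and 1")
    if visitors_per_variant <= 0:
        raise ValueError("visitors_per_variant must be positive")
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value for significance
    z_power = NormalDist().inv_cdf(power)          # quantile for the desired power
    standard_error = sqrt(2 * baseline_rate * (1 - baseline_rate) / visitors_per_variant)
    return (z_alpha + z_power) * standard_error


# Hypothetical test: 3% baseline conversion, 10,000 visitors per variant.
mde = minimum_detectable_effect(baseline_rate=0.03, visitors_per_variant=10_000)
print(f"Absolute MDE: {mde:.4f}")
print(f"Relative MDE: {mde / 0.03:.1%}")
```

Logging the MDE next to an inconclusive result records the size of lift the test could have caught, which is the "ceiling" a later hypothesis on the same lever needs to clear.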

Make repository search a required step before any new hypothesis enters the backlog. Two minutes of search at intake saves four weeks of redundant testing later. The tag structure is what makes that search fast.

Insights about psychological levers (does social proof work for our buyers?) stay valid for years. Insights about specific creative or copy go stale in 6-12 months as the site, traffic mix, and seasonality shift. Tag accordingly.

A learning system can also feed AI-generated hypotheses once you have a structured repository. A well-tagged corpus of 50+ test write-ups becomes a powerful input, because the model can reason over what's already been proven or disproven on your specific store.
