RICE Scoring

Metricuno

May 17, 2026

RICE scoring ranks experiments by Reach × Impact × Confidence ÷ Effort. See the formula, a worked example, and benchmark scores for common CRO tests.

Quick answer

RICE scoring prioritizes experiments by weighing how many users a test reaches against its expected impact, your confidence in the prediction, and the effort required to ship it.

Definition

Experimentation

RICE Scoring

A prioritization formula that ranks experiments by Reach × Impact × Confidence ÷ Effort.

RICE scoring is a lightweight prioritization framework that ranks experiment ideas by a single composite score: Reach multiplied by Impact and Confidence, divided by Effort. It was popularized by Intercom's product team and has become a default tool for CRO specialists and product managers who need to triage a long backlog without long debates.

What makes RICE more rigorous than ICE or PIE is the explicit Reach term. By forcing you to estimate how many users actually encounter the test surface, RICE naturally promotes sitewide changes — homepage hero, global navigation, cart drawer — over experiments scoped to a thin segment like first-time visitors from a single paid campaign.

Also known as

RICE framework

RICE prioritization

RICE sits inside the broader practice of experiment prioritization, alongside scoring models like ICE and PIE. Where ICE asks only about Impact, Confidence, and Ease, RICE forces a fourth question: how many people will this test actually touch in a given period? That single addition changes which ideas rise to the top of the backlog.

The framework is deliberately rough. Each input is a best-guess estimate, not a measured value, and the output score is only meaningful when compared across ideas in the same backlog. Treat RICE as a ranking tool, not a forecast — a test with a RICE score of 120 is not 'twice as good' as one scoring 60, only ahead in the queue.

Formula

RICE = (Reach × Impact × Confidence) / Effort

Variables

Reach

Number of users who will encounter the test during a fixed window — typically one month or one quarter. Use a real traffic figure from GA4 or your analytics, not a guess.

Impact

Expected lift per user, on a coarse scale: 3 = massive, 2 = high, 1 = medium, 0.5 = low, 0.25 = minimal. Resist the urge to invent finer gradations.

Confidence

How sure you are about Reach and Impact, expressed as a percentage. 100% = backed by a prior test or hard data, 80% = strong evidence, 50% = directional hunch.

Effort

Total person-months to design, build, QA, and ship the test. Include design, dev, and analyst time — not just engineering.
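The formula and its four inputs map directly onto a small function. This is a minimal Python sketch, not part of any particular tool. The name `rice_score` and the `IMPACT_SCALE` lookup are illustrative, Confidence is assumed to be passed as a fraction (0.8 for 80%), and Effort is assumed to be in person-months. Rejecting Impact values that are not on the five-level scale enforces the "no finer gradations" rule mechanically.

```python
# Allowed Impact levels, mirroring the coarse scale described above.
IMPACT_SCALE = {3: "massive", 2: "high", 1: "medium", 0.5: "low", 0.25: "minimal"}


def rice_score(reach: float, impact: float, confidence: float, effort: float) -> float:
    """Return (Reach x Impact x Confidence) / Effort.

    reach:      users who encounter the test in the chosen window
    impact:     one of the IMPACT_SCALE levels
    confidence: a fraction in (0, 1], e.g. 0.8 for 80%
    effort:     person-months across all roles
    """
    if reach < 0:
        raise ValueError("reach must be non-negative")
    if impact not in IMPACT_SCALE:
        # Rejects finer gradations such as 1.5 or 2.7.
        raise ValueError(f"impact must be one of {sorted(IMPACT_SCALE)}")
    if not 0 < confidence <= 1:
        raise ValueError("confidence must be a fraction in (0, 1]")
    if effort <= 0:
        raise ValueError("effort must be a positive number of person-months")
    return reach * impact * confidence / effort
```

For example, `rice_score(80_000, 2, 0.8, 2)` returns 64,000, the product-card score from the worked example.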

Worked example

A Shopify apparel store is choosing between two test ideas: redesigning the global product card (sitewide) versus adding a size-guide modal to one collection page.

Reach (monthly users seeing product cards): 80,000

Impact (expected lift level): 2 (high)

Confidence: 80%

Effort (person-months): 2

→ RICE = (80,000 × 2 × 0.8) / 2 = 64,000

The size-guide modal scores roughly (6,000 × 2 × 0.5) / 1 = 6,000 — an order of magnitude lower, driven almost entirely by the narrower reach. The product card redesign ships first.
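To check the arithmetic, here is a short Python sketch that recomputes both scores and splits their ratio into its Reach, Confidence, and Effort factors. Impact is 2 for both ideas, so it cancels. The inputs are the example's own figures; the helper name `rice` is illustrative.

```python
def rice(reach: float, impact: float, confidence: float, effort: float) -> float:
    """RICE score: (Reach x Impact x Confidence) / Effort, confidence as a fraction."""
    return reach * impact * confidence / effort


product_card = rice(reach=80_000, impact=2, confidence=0.8, effort=2)
size_guide = rice(reach=6_000, impact=2, confidence=0.5, effort=1)

# Decompose the gap. Impact is 2 for both ideas, so it cancels out.
reach_factor = 80_000 / 6_000      # ~13.3x in favor of the product card
confidence_factor = 0.8 / 0.5      # 1.6x in favor of the product card
effort_factor = 1 / 2              # 0.5x: the product card costs twice as much

print(f"Product card redesign: {product_card:,.0f}")
print(f"Size-guide modal:      {size_guide:,.0f}")
print(f"Ratio: {product_card / size_guide:.1f}x "
      f"= {reach_factor:.1f} x {confidence_factor:.1f} x {effort_factor:.1f}")
```

Running it prints 64,000 and 6,000, a ratio of about 10.7x. Reach alone contributes about 13.3x. The product card's higher Confidence adds 1.6x, and its doubled Effort gives half of that advantage back, which is why Reach dominates the result.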

Calibrate your team on the Impact scale before you score anything. If one PM treats '2' as a 5% conversion lift and another treats it as 15%, your rankings become noise. A 30-minute calibration session — score five past tests retrospectively — fixes this faster than any documentation.

Benchmark

Typical RICE scores for common CRO test ideas on a mid-size Shopify store (~100k monthly visitors)

Test idea	Reach (monthly)	Impact	Confidence	Effort (person-months)	RICE score
Homepage hero copy & image swap	95,000	1	70%	0.5	133,000
Sticky add-to-cart on PDP (mobile)	60,000	2	80%	1	96,000
Free-shipping threshold bar (sitewide)	100,000	1	90%	1	90,000
Product card redesign	80,000	2	80%	2	64,000
Express checkout button order	45,000	1	60%	0.5	54,000
Size guide on one collection	6,000	2	50%	1	6,000
Post-purchase upsell for repeat buyers	3,500	3	70%	1.5	4,900

Notice the pattern: niche-segment tests get punished even when their per-user impact is high. The post-purchase upsell scores low not because it's a bad idea — it might lift AOV meaningfully — but because only 3,500 users see it. That's RICE working as designed, and also its main weakness: it can starve high-leverage retention and loyalty tests if you don't run a separate prioritization track for them.
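The benchmark rows can be scored and sorted in a few lines, which is a quick way to confirm that the table's order follows from its inputs. This is a minimal Python sketch. The `Idea` type is illustrative, and the rows are copied from the table with Confidence converted to a fraction.

```python
from typing import NamedTuple


class Idea(NamedTuple):
    name: str
    reach: int          # monthly users who see the test surface
    impact: float       # coarse scale: 3, 2, 1, 0.5, 0.25
    confidence: float   # fraction, e.g. 0.7 for 70%
    effort: float       # person-months

    @property
    def rice(self) -> float:
        return self.reach * self.impact * self.confidence / self.effort


BENCHMARK = [
    Idea("Homepage hero copy & image swap", 95_000, 1, 0.7, 0.5),
    Idea("Sticky add-to-cart on PDP (mobile)", 60_000, 2, 0.8, 1),
    Idea("Free-shipping threshold bar (sitewide)", 100_000, 1, 0.9, 1),
    Idea("Product card redesign", 80_000, 2, 0.8, 2),
    Idea("Express checkout button order", 45_000, 1, 0.6, 0.5),
    Idea("Size guide on one collection", 6_000, 2, 0.5, 1),
    Idea("Post-purchase upsell for repeat buyers", 3_500, 3, 0.7, 1.5),
]

# Highest score first: the order in which ideas leave the backlog.
ranked = sorted(BENCHMARK, key=lambda idea: idea.rice, reverse=True)
for position, idea in enumerate(ranked, start=1):
    print(f"{position}. {idea.name:<40} {idea.rice:>9,.0f}")
```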

Frequently asked questions

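What's the difference between RICE and ICE scoring?

ICE scores Impact, Confidence, and Ease, each on a 1-10 scale, and combines them into one number, usually by averaging (some teams multiply instead). RICE replaces Ease with Effort (in person-months), which divides the score, and adds Reach as a multiplier. The result: RICE is harder to game with optimistic 1-10 ratings, and it explicitly rewards tests that touch more users.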
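The structural difference shows up when two ideas receive identical ICE ratings but touch very different numbers of users. This is a minimal Python sketch. `ice_score` averages three 1-10 ratings by default, with `combine="product"` for teams that multiply. The 8/7/6 ratings are hypothetical. For RICE, Impact, Confidence, and Effort are held equal so that only Reach varies, using the two Reach figures from the worked example.

```python
def ice_score(impact: float, confidence: float, ease: float, combine: str = "mean") -> float:
    """ICE from three 1-10 ratings, combined by averaging (default) or multiplying."""
    ratings = (impact, confidence, ease)
    if not all(1 <= r <= 10 for r in ratings):
        raise ValueError("ICE ratings must be on a 1-10 scale")
    if combine == "mean":
        return sum(ratings) / len(ratings)
    if combine == "product":
        return impact * confidence * ease
    raise ValueError("combine must be 'mean' or 'product'")


def rice_score(reach: float, impact: float, confidence: float, effort: float) -> float:
    """RICE: (Reach x Impact x Confidence) / Effort, confidence as a fraction."""
    return reach * impact * confidence / effort


# Hypothetical: both ideas get the same 8/7/6 ICE ratings...
ice_card = ice_score(8, 7, 6)
ice_modal = ice_score(8, 7, 6)
# ...but RICE separates them, because only Reach differs.
rice_card = rice_score(80_000, 2, 0.8, 1)
rice_modal = rice_score(6_000, 2, 0.8, 1)

print(f"ICE:  card {ice_card:.1f} vs modal {ice_modal:.1f}")
print(f"RICE: card {rice_card:,.0f} vs modal {rice_modal:,.0f}")
```

ICE cannot tell the two ideas apart; RICE ranks the wider-reaching one first.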

How is RICE different from PIE scoring?

PIE (Potential, Importance, Ease) is WiderFunnel's framework and uses a 1-10 scale across all three inputs. It's faster but less precise. RICE produces a numeric estimate tied to real traffic figures, which makes it more defensible in stakeholder conversations but slower to fill out.

What time window should I use for Reach?

Pick one window (monthly or quarterly) and use it consistently across the whole backlog. Monthly is standard for high-velocity CRO teams running 2-4 tests at a time. The absolute number matters less than ranking ideas on the same scale.
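Here is why consistency matters more than the choice of window. Multiplying every Reach figure by the same factor multiplies every score by that factor, so the ranking cannot change. Mixing windows, by contrast, can reorder it. This is a minimal Python sketch using three rows from the benchmark table. Approximating quarterly Reach as 3x monthly is a simplifying assumption, since returning visitors usually make true quarterly unique users fewer than that.

```python
def rice(reach: float, impact: float, confidence: float, effort: float) -> float:
    return reach * impact * confidence / effort


def ranking(scores: dict[str, float]) -> list[str]:
    """Idea names ordered from highest to lowest score."""
    return sorted(scores, key=scores.get, reverse=True)


# (monthly reach, impact, confidence, effort) from the benchmark table.
IDEAS = {
    "Sticky add-to-cart on PDP (mobile)": (60_000, 2, 0.8, 1),
    "Product card redesign": (80_000, 2, 0.8, 2),
    "Express checkout button order": (45_000, 1, 0.6, 0.5),
}
QUARTER_MULTIPLIER = 3  # assumption: quarterly reach ~ 3x monthly

monthly = {name: rice(*inputs) for name, inputs in IDEAS.items()}
quarterly = {name: rice(r * QUARTER_MULTIPLIER, i, c, e)
             for name, (r, i, c, e) in IDEAS.items()}

# Mistake: one idea's Reach entered as a quarterly figure, the rest monthly.
mixed = dict(monthly)
r, i, c, e = IDEAS["Express checkout button order"]
mixed["Express checkout button order"] = rice(r * QUARTER_MULTIPLIER, i, c, e)

print("monthly:  ", ranking(monthly))
print("quarterly:", ranking(quarterly))
print("mixed:    ", ranking(mixed))
```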

How do I score Confidence without past test data?

Anchor to evidence sources: 100% means a prior winning test on your own site, 80% means strong third-party benchmarks plus qualitative data (session replays, surveys), 50% means a gut feel. If most of your backlog sits at 50%, you have a research problem, not a prioritization problem.
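Writing the evidence anchors down as a lookup makes everyone convert evidence to a percentage the same way. This is a minimal Python sketch. The evidence labels are hypothetical names for the three anchors above, and treating "most of your backlog" as more than half of it is an assumption.

```python
from collections import Counter

# Hypothetical labels for the three evidence anchors; values are Confidence fractions.
CONFIDENCE_BY_EVIDENCE = {
    "prior_win_on_own_site": 1.0,        # a prior winning test on your own site
    "benchmarks_plus_qualitative": 0.8,  # third-party benchmarks + replays/surveys
    "gut_feel": 0.5,                     # no supporting evidence yet
}


def confidence_for(evidence: str) -> float:
    """Convert an evidence label to a Confidence fraction."""
    try:
        return CONFIDENCE_BY_EVIDENCE[evidence]
    except KeyError:
        raise ValueError(f"unknown evidence level: {evidence!r}") from None


def has_research_problem(evidence_levels: list[str]) -> bool:
    """True when more than half of the backlog sits at 50% (gut-feel) confidence."""
    counts = Counter(confidence_for(level) for level in evidence_levels)
    return counts[0.5] > len(evidence_levels) / 2


backlog = ["gut_feel", "gut_feel", "benchmarks_plus_qualitative", "gut_feel"]
print(has_research_problem(backlog))
```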

Does RICE work for personalization or segment-specific tests?

Partially. RICE will systematically rank segment tests below sitewide ones because Reach is smaller. Either accept that bias or run personalization on a separate track with its own scoring; many CRO teams maintain two backlogs for this reason.
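A separate track amounts to grouping ideas before ranking, so that segment tests compete only with each other. This is a minimal Python sketch using four rows from the benchmark table. Which rows count as "segment" versus "sitewide" is an illustrative assumption, as is the `rank_by_track` name.

```python
def rice(reach: float, impact: float, confidence: float, effort: float) -> float:
    return reach * impact * confidence / effort


# (name, track, reach, impact, confidence, effort); track labels are illustrative.
BACKLOG = [
    ("Free-shipping threshold bar (sitewide)", "sitewide", 100_000, 1, 0.9, 1),
    ("Product card redesign", "sitewide", 80_000, 2, 0.8, 2),
    ("Size guide on one collection", "segment", 6_000, 2, 0.5, 1),
    ("Post-purchase upsell for repeat buyers", "segment", 3_500, 3, 0.7, 1.5),
]


def rank_by_track(ideas):
    """Score every idea, then rank within each track separately."""
    tracks: dict[str, list[tuple[float, str]]] = {}
    for name, track, *inputs in ideas:
        tracks.setdefault(track, []).append((rice(*inputs), name))
    return {track: [name for _, name in sorted(scored, reverse=True)]
            for track, scored in tracks.items()}


for track, names in rank_by_track(BACKLOG).items():
    print(track, "->", names)
```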

What counts as 'Effort' in RICE?

Total person-months across every role needed to ship and analyze the test: design, frontend dev, QA, analyst review, and PM time. A test that takes 'one sprint' usually translates to 1-1.5 person-months when you add it all up. Underestimating Effort is the most common scoring error.
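Summing every role's time makes the under-counting visible. This is a minimal Python sketch. The 20-working-day person-month and the per-role day estimates are assumptions for illustration, not figures from this article.

```python
# Assumption: one person-month = 20 working days; adjust to your team's calendar.
WORKING_DAYS_PER_PERSON_MONTH = 20


def effort_person_months(days_by_role: dict[str, float]) -> float:
    """Sum working days across every role and convert the total to person-months."""
    if any(days < 0 for days in days_by_role.values()):
        raise ValueError("days cannot be negative")
    return sum(days_by_role.values()) / WORKING_DAYS_PER_PERSON_MONTH


# Hypothetical estimate for a test the team describes as "one sprint".
sprint_test = {"design": 4, "frontend_dev": 10, "qa": 3, "analyst": 3, "pm": 2}

full_effort = effort_person_months(sprint_test)
dev_only = effort_person_months({"frontend_dev": sprint_test["frontend_dev"]})
print(f"All roles: {full_effort:.1f} person-months; dev only: {dev_only:.1f}")
```

With these hypothetical numbers, the full estimate is 1.1 person-months. Counting frontend development alone would give 0.5, less than half.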

Should I rescore the backlog every week?

No. Score once when an idea enters the backlog, then rescore only when something material changes: new traffic data, a related test finishes, or a stakeholder reprioritizes a goal. Weekly rescoring burns time without changing the ranking.

Can I use RICE for non-CRO experiments like email or ads?

Yes, but adjust Reach to match the channel: email list size, monthly ad impressions, app sessions. The formula is channel-agnostic. What matters is that every idea in the comparison uses the same Reach definition.
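One way to enforce a single Reach definition is to record the unit alongside each estimate and refuse to rank a mixed list. This is a minimal Python sketch; the unit strings and the email and ad ideas are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Idea:
    name: str
    reach: float
    reach_unit: str      # e.g. "email_recipients_per_month" or "ad_impressions_per_month"
    impact: float        # coarse scale: 3, 2, 1, 0.5, 0.25
    confidence: float    # fraction, e.g. 0.8 for 80%
    effort: float        # person-months

    @property
    def rice(self) -> float:
        return self.reach * self.impact * self.confidence / self.effort


def rank(ideas: list[Idea]) -> list[Idea]:
    """Rank by RICE, but only if every idea uses the same Reach definition."""
    units = {idea.reach_unit for idea in ideas}
    if len(units) > 1:
        raise ValueError(f"mixed Reach definitions: {sorted(units)}")
    return sorted(ideas, key=lambda idea: idea.rice, reverse=True)


# Hypothetical email ideas, both measured in monthly email recipients.
email_backlog = [
    Idea("Welcome flow rewrite", 5_000, "email_recipients_per_month", 2, 0.5, 1),
    Idea("Subject-line test", 40_000, "email_recipients_per_month", 1, 0.8, 0.25),
]
print([idea.name for idea in rank(email_backlog)])
```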

How do I avoid Impact-score inflation?

Calibrate the team by retroactively scoring 5-10 past tests, then comparing those scores with the actual results. Most teams discover they were rating Impact one full level too high. Pin a reference test on the wall: 'a 2 looks like this winner from Q2'.
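The retrospective comparison can be reduced to a single number: the average gap, in scale steps, between the Impact each past test was given and the level its actual result earned. This is a minimal Python sketch. The five (predicted, actual) pairs are hypothetical, and mapping an observed lift back onto the 3/2/1/0.5/0.25 scale is assumed to follow the team's own calibrated definitions.

```python
# Coarse Impact scale from lowest to highest level.
IMPACT_LEVELS = [0.25, 0.5, 1, 2, 3]


def level_offset(predicted: float, actual: float) -> int:
    """Scale steps the prediction sat above (+) or below (-) the actual level."""
    return IMPACT_LEVELS.index(predicted) - IMPACT_LEVELS.index(actual)


def mean_offset(past_tests: list[tuple[float, float]]) -> float:
    """Average offset across (predicted, actual) pairs; +1 means one level too high."""
    offsets = [level_offset(predicted, actual) for predicted, actual in past_tests]
    return sum(offsets) / len(offsets)


# Hypothetical retrospective: (Impact scored up front, level the result earned).
retro = [(2, 1), (3, 2), (1, 1), (2, 0.5), (1, 0.5)]
print(f"Average inflation: {mean_offset(retro):+.1f} levels")
```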

Is RICE the right framework if I only run 1-2 tests per quarter?

Probably overkill. RICE shines when you have 15+ ideas competing for limited dev capacity. For low-velocity programs, a simple impact-versus-effort 2x2 matrix gives the same answer in a tenth of the time.
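For comparison, the 2x2 alternative needs only two inputs per idea. This is a minimal Python sketch. The cut-offs (Impact of 2 or more counts as high, 1 person-month or less as low effort) and the quadrant labels are assumptions, and teams name the quadrants differently.

```python
def quadrant(impact: float, effort: float,
             high_impact_at: float = 2, low_effort_up_to: float = 1) -> str:
    """Place an idea in an impact-versus-effort 2x2 matrix."""
    high_impact = impact >= high_impact_at
    low_effort = effort <= low_effort_up_to
    if high_impact and low_effort:
        return "quick win"       # do first
    if high_impact:
        return "big bet"         # worth it, but plan capacity
    if low_effort:
        return "fill-in"         # do when there is slack
    return "deprioritize"        # low impact, high effort


# Impact and Effort for two ideas from the benchmark table.
print(quadrant(2, 1))   # Sticky add-to-cart on PDP (mobile)
print(quadrant(2, 2))   # Product card redesign
```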

Test ideas before you ship them

Run unlimited A/B tests, attach hypotheses to outcomes, and build a searchable archive of what works — and what doesn't.

Launch your first experiment