RICE Scoring

RICE scoring prioritizes experiments by weighing how many users a test reaches against its expected impact, your confidence in the prediction, and the effort required to ship it.
A prioritization formula that ranks experiments by Reach × Impact × Confidence ÷ Effort.
RICE scoring is a lightweight prioritization framework that ranks experiment ideas by a single composite score: Reach multiplied by Impact and Confidence, divided by Effort. It was popularized by Intercom's product team and has become a default tool for CRO specialists and product managers who need to triage a long backlog without drawn-out debates.
What makes RICE more rigorous than ICE or PIE is the explicit Reach term. By forcing you to estimate how many users actually encounter the test surface, RICE naturally promotes sitewide changes — homepage hero, global navigation, cart drawer — over experiments scoped to a thin segment like first-time visitors from a single paid campaign.
RICE sits inside the broader practice of experiment prioritization, alongside scoring models like ICE and PIE. Where ICE asks only about Impact, Confidence, and Ease, RICE forces a fourth question: how many people will this test actually touch in a given period? That single addition changes which ideas rise to the top of the backlog.
The framework is deliberately rough. Apart from Reach, which should start from real traffic data, each input is a best-guess estimate rather than a measured value, and the output score is only meaningful when compared across ideas in the same backlog. Treat RICE as a ranking tool, not a forecast: a test with a RICE score of 120 is not 'twice as good' as one scoring 60, only ahead in the queue.
RICE = (Reach × Impact × Confidence) / Effort
Reach
Number of users who will encounter the test during a fixed window — typically one month or one quarter. Use a real traffic figure from GA4 or your analytics, not a guess.
Impact
Expected lift per user, on a coarse scale: 3 = massive, 2 = high, 1 = medium, 0.5 = low, 0.25 = minimal. Resist the urge to invent finer gradations.
Confidence
How sure you are about Reach and Impact, expressed as a percentage. 100% = backed by a prior test or hard data, 80% = strong evidence, 50% = directional hunch.
Effort
Total person-months to design, build, QA, and ship the test. Include design, dev, and analyst time — not just engineering.
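The formula and the four input definitions above can be sketched in Python. This is a minimal illustration, not a reference implementation: the function name `rice_score`, the `IMPACT_SCALE` constant, and the validation rules are assumptions made for the example, and the numbers in the final call are made up.

```python
# Allowed Impact levels, taken from the coarse scale described above.
IMPACT_SCALE = (0.25, 0.5, 1, 2, 3)


def rice_score(reach: float, impact: float, confidence: float, effort: float) -> float:
    """Return (reach * impact * confidence) / effort.

    reach      -- users who encounter the test in the chosen window
    impact     -- one of the values in IMPACT_SCALE
    confidence -- fraction between 0 and 1 (0.8 means 80%)
    effort     -- total person-months, must be positive
    """
    if reach < 0:
        raise ValueError("reach cannot be negative")
    if impact not in IMPACT_SCALE:
        raise ValueError(f"impact must be one of {IMPACT_SCALE}")
    if not 0 < confidence <= 1:
        raise ValueError("confidence must be a fraction in (0, 1]")
    if effort <= 0:
        raise ValueError("effort must be greater than zero")
    return reach * impact * confidence / effort


# Made-up inputs, only to show the call shape.
print(rice_score(reach=10_000, impact=1, confidence=0.5, effort=1))  # 5000.0
```

Rejecting Impact values outside the scale is one way to enforce the advice against inventing finer gradations; a team that prefers warnings over errors could relax that check.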
A Shopify apparel store is choosing between two test ideas: redesigning the global product card (sitewide) versus adding a size-guide modal to one collection page. The product card redesign scores as follows:
Reach (monthly users seeing product cards): 80,000
Impact (expected lift level): 2 (high)
Confidence: 80%
Effort (person-months): 2
→ RICE = (80,000 × 2 × 0.8) / 2 = 64,000
The size-guide modal scores roughly (6,000 × 2 × 0.5) / 1 = 6,000 — an order of magnitude lower, driven almost entirely by the narrower reach. The product card redesign ships first.
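To see how much of that gap comes from reach, the ratio between the two scores can be split into one factor per input. A rough sketch in Python using the numbers from the example; the dictionary layout and the names `rice` and `factors` are illustrative assumptions.

```python
# Inputs from the worked example above.
product_card = {"reach": 80_000, "impact": 2, "confidence": 0.8, "effort": 2}
size_guide = {"reach": 6_000, "impact": 2, "confidence": 0.5, "effort": 1}


def rice(idea: dict) -> float:
    return idea["reach"] * idea["impact"] * idea["confidence"] / idea["effort"]


# Each factor is how much that input alone multiplies the product card's
# score relative to the size guide. Effort is inverted because it divides.
factors = {
    "reach": product_card["reach"] / size_guide["reach"],
    "impact": product_card["impact"] / size_guide["impact"],
    "confidence": product_card["confidence"] / size_guide["confidence"],
    "effort": size_guide["effort"] / product_card["effort"],
}

for name, factor in factors.items():
    print(f"{name:<10} x{factor:.2f}")

overall = rice(product_card) / rice(size_guide)
print(f"overall    x{overall:.2f}")
```

Reach contributes a factor of about 13.3, while the confidence advantage (1.6) and the effort penalty (0.5) combine to 0.8, so the overall gap of roughly 10.7 comes from reach alone.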
Calibrate your team on the Impact scale before you score anything. If one PM treats '2' as a 5% conversion lift and another treats it as 15%, your rankings become noise. A 30-minute calibration session — score five past tests retrospectively — fixes this faster than any documentation.
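The outcome of such a calibration session can be summarized in a few lines of Python. The sketch below is only an illustration: the five past tests, their predicted Impact levels, and the levels assigned in hindsight are invented placeholder data, and `level_gap` is a hypothetical helper.

```python
IMPACT_SCALE = [0.25, 0.5, 1, 2, 3]  # coarse Impact scale from the definitions above

# Placeholder data (not real results): test name, Impact predicted up front,
# and the Impact level the team agrees on in hindsight after seeing the outcome.
past_tests = [
    ("test A", 2, 1),
    ("test B", 1, 1),
    ("test C", 3, 1),
    ("test D", 1, 0.5),
    ("test E", 2, 2),
]


def level_gap(predicted: float, actual: float) -> int:
    """Steps on the Impact scale by which the prediction overshot (negative = undershot)."""
    return IMPACT_SCALE.index(predicted) - IMPACT_SCALE.index(actual)


gaps = [level_gap(predicted, actual) for _, predicted, actual in past_tests]
mean_gap = sum(gaps) / len(gaps)

for (name, predicted, actual), gap in zip(past_tests, gaps):
    print(f"{name}: predicted {predicted}, hindsight {actual}, gap {gap:+d}")
print(f"average overestimate: {mean_gap:+.1f} levels")
```

A consistently positive average suggests the team should shift its Impact ratings down before scoring new ideas.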
Typical RICE scores for common CRO test ideas on a mid-size Shopify store (~100k monthly visitors)
| Test idea | Reach (monthly) | Impact | Confidence | Effort (months) | RICE score |
|---|---|---|---|---|---|
| Homepage hero copy & image swap | 95,000 | 1 | 70% | 0.5 | 133,000 |
| Sticky add-to-cart on PDP (mobile) | 60,000 | 2 | 80% | 1 | 96,000 |
| Free-shipping threshold bar (sitewide) | 100,000 | 1 | 90% | 1 | 90,000 |
| Product card redesign | 80,000 | 2 | 80% | 2 | 64,000 |
| Express checkout button order | 45,000 | 1 | 60% | 0.5 | 54,000 |
| Size guide on one collection | 6,000 | 2 | 50% | 1 | 6,000 |
| Post-purchase upsell for repeat buyers | 3,500 | 3 | 70% | 1.5 | 4,900 |
Notice the pattern: niche-segment tests get punished even when their per-user impact is high. The post-purchase upsell scores low not because it's a bad idea — it might lift AOV meaningfully — but because only 3,500 users see it. That's RICE working as designed, and also its main weakness: it can starve high-leverage retention and loyalty tests if you don't run a separate prioritization track for them.
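One way to make this effect visible is to rank the same backlog twice: once by RICE and once by a per-user score that leaves Reach out. A sketch in Python using the table's figures; the `Idea` class and its `per_user` property are assumptions introduced for this example, not part of the RICE framework.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Idea:
    name: str
    reach: int         # monthly users
    impact: float      # 0.25, 0.5, 1, 2 or 3
    confidence: float  # fraction, e.g. 0.7 for 70%
    effort: float      # person-months

    @property
    def per_user(self) -> float:
        # Hypothetical helper: the RICE score with Reach removed.
        return self.impact * self.confidence / self.effort

    @property
    def rice(self) -> float:
        return self.reach * self.per_user


backlog = [
    Idea("Homepage hero copy & image swap", 95_000, 1, 0.7, 0.5),
    Idea("Sticky add-to-cart on PDP (mobile)", 60_000, 2, 0.8, 1),
    Idea("Free-shipping threshold bar (sitewide)", 100_000, 1, 0.9, 1),
    Idea("Product card redesign", 80_000, 2, 0.8, 2),
    Idea("Express checkout button order", 45_000, 1, 0.6, 0.5),
    Idea("Size guide on one collection", 6_000, 2, 0.5, 1),
    Idea("Post-purchase upsell for repeat buyers", 3_500, 3, 0.7, 1.5),
]

by_rice = sorted(backlog, key=lambda idea: idea.rice, reverse=True)
by_per_user = sorted(backlog, key=lambda idea: idea.per_user, reverse=True)

print("Ranked by RICE:")
for idea in by_rice:
    print(f"  {idea.rice:>9,.0f}  {idea.name}")

print("Ranked by per-user score (no Reach):")
for idea in by_per_user:
    print(f"  {idea.per_user:>9.2f}  {idea.name}")
```

In the per-user ranking the post-purchase upsell climbs from last place to roughly level with the homepage hero test, which is the kind of idea a separate prioritization track would surface.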
Frequently asked questions
How is RICE different from ICE?
ICE scores Impact, Confidence, and Ease, each on a 1-10 scale, and averages them. RICE replaces Ease with Effort (in person-months) and adds Reach as a multiplier. As a result, RICE is harder to game with optimistic 1-10 ratings, and it explicitly rewards tests that touch more users.
How does RICE compare to PIE?
PIE (Potential, Importance, Ease) is WiderFunnel's framework and uses a 1-10 scale across all three inputs. It's faster but less precise. RICE produces a numeric score tied to real traffic figures, which makes it more defensible in stakeholder conversations but slower to fill out.
What time window should I use for Reach?
Pick one window, monthly or quarterly, and use it consistently across the whole backlog. Monthly is standard for high-velocity CRO teams running 2-4 tests at a time. The absolute number matters less than ranking ideas on the same scale.
How do I choose a Confidence value?
Anchor to evidence sources: 100% means a prior winning test on your own site, 80% means strong third-party benchmarks plus qualitative data (session replays, surveys), 50% means a gut feel. If most of your backlog sits at 50%, you have a research problem, not a prioritization problem.
Does RICE work for personalization and segment tests?
Partially. RICE will systematically rank segment tests below sitewide ones because Reach is smaller. Either accept that bias or run personalization on a separate track with its own scoring; many CRO teams maintain two backlogs for this reason.
What counts as Effort?
Total person-months across every role needed to ship and analyze the test: design, frontend dev, QA, analyst review, and PM time. A test that takes 'one sprint' usually translates to 1-1.5 person-months when you add it all up. Underestimating Effort is the most common scoring error.
Should I rescore the backlog every week?
No. Score once when an idea enters the backlog, then rescore only when something material changes: new traffic data arrives, a related test finishes, or a stakeholder reprioritizes a goal. Weekly rescoring burns time without changing the ranking.
Can I use RICE for channels other than the website?
Yes, but adjust Reach to match the channel, for example email list size, monthly ad impressions, or app sessions. The formula is channel-agnostic. What matters is that every idea in the comparison uses the same Reach definition.
How do I keep Impact scores consistent across the team?
Calibrate the team by retroactively scoring 5-10 past tests, then comparing to actual results. Most teams discover they were rating Impact one full level too high. Pin a reference test on the wall: 'a 2 looks like this winner from Q2'.
Is RICE worth it for a small backlog?
Probably overkill. RICE shines when you have 15+ ideas competing for limited dev capacity. For low-velocity programs, a simple impact-versus-effort 2x2 matrix gives the same answer in a tenth of the time.