Canary Releases

Metricuno

May 17, 2026


Quick answer

A canary release sends a small percentage of production traffic to a new version while everyone else stays on the stable build. Here's how the math, stages, and trade-offs work.

Definition


A deployment pattern that routes a small slice of production traffic to a new version while the rest stays on the stable one.

A canary release ships a new version of your storefront, checkout, or backend service to a deliberately small fraction of live traffic — typically 1% to 5% at first — while the remaining users stay on the proven version. You watch error rates, latency, and conversion on the canary slice, then either expand the rollout in stages or roll back instantly if something breaks.

The pattern is infrastructure-level: a router, load balancer, or CDN decides who sees which build, and neither build contains any logic about which users it serves. That distinguishes it from feature flags, which gate behaviour from inside the code. Canary releases are a core technique inside feature experimentation programmes when the risk you're managing is deployment risk, not hypothesis risk.
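A minimal sketch of the routing decision a load balancer or edge worker might make, assuming every request carries a stable user or session ID. It hashes the ID into one of 10,000 buckets, so the same user always lands on the same build (sticky) while assignment stays effectively random across users. The names `route` and `bucket_for` and the salt value `"checkout-v2"` are hypothetical illustrations, not any CDN's or load balancer's API.

```python
import hashlib

CANARY_BUILD = "canary"
STABLE_BUILD = "stable"
BUCKETS = 10_000  # each bucket is 0.01% of traffic


def bucket_for(user_id: str, salt: str = "checkout-v2") -> int:
    """Map a user or session ID to a stable bucket in [0, BUCKETS).

    The salt (hypothetical) is changed per release so each rollout
    draws a fresh random slice instead of reusing the same users.
    """
    digest = hashlib.sha256(f"{salt}:{user_id}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % BUCKETS


def route(user_id: str, canary_share: float, salt: str = "checkout-v2") -> str:
    """Return which build serves this user; the builds contain no routing logic."""
    if not 0.0 <= canary_share <= 1.0:
        raise ValueError("canary_share must be between 0 and 1")
    threshold = round(canary_share * BUCKETS)
    return CANARY_BUILD if bucket_for(user_id, salt) < threshold else STABLE_BUILD


print(route("user-42", 0.05))  # the same ID always gets the same answer
```

Because the threshold only grows as the share increases, a user who is on the canary at 1% stays on it at 5%, so expanding a stage never flips people back and forth between builds.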

Also known as

Canary deployment

Phased rollout

Progressive delivery

The name comes from the canary-in-a-coal-mine analogy: a small, sensitive sample that surfaces danger before it reaches the rest of the system. In e-commerce, the cost of a bad deploy is measured in failed checkouts and refund tickets, so limiting blast radius is the whole point.

A canary is not an A/B test. You're not trying to learn which version converts better — you assume the new version is at least as good and you're verifying it doesn't break in production. The decision criterion is regression detection, not statistical significance on a business KPI.

Formula

expected_incidents = traffic_share * worst_case_error_rate * sessions_in_window

Variables

traffic_share

Canary traffic share

Fraction of production traffic routed to the new version (e.g. 0.05 for 5%).

worst_case_error_rate

Worst-case error rate

Pessimistic estimate of how often the new version fails a session (e.g. 0.02 for 2%).

sessions_in_window

Sessions in window

Total production sessions during the canary window.

Worked example

A Shopify apparel store deploys a new checkout build behind a 5% canary for a 4-hour window covering 20,000 sessions. They assume a worst-case error rate of 2% if a regression slips through.

Canary traffic share: 0.05

Worst-case error rate: 0.02

Sessions in window: 20,000

→ 20 affected sessions

By keeping the canary at 5%, the team caps worst-case exposure at roughly 20 broken sessions over the full 4-hour window, versus 400 if the same regression went out to 100% of traffic. If automated rollback triggers partway through the window, the canary figure is lower still.
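The worked example can be reproduced with a short function. This is a sketch of the formula only; the function name `expected_incidents` and its parameter names are chosen here to mirror the variable labels (canary traffic share, worst-case error rate, sessions in window) and are not from any library.

```python
def expected_incidents(
    traffic_share: float, worst_case_error_rate: float, sessions_in_window: int
) -> float:
    """Expected number of failed sessions on the new build during the window."""
    for name, value in (
        ("traffic_share", traffic_share),
        ("worst_case_error_rate", worst_case_error_rate),
    ):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be between 0 and 1")
    return traffic_share * worst_case_error_rate * sessions_in_window


canary = expected_incidents(0.05, 0.02, 20_000)  # 5% canary
full = expected_incidents(1.0, 0.02, 20_000)     # same regression at 100%
print(f"canary: {canary:.0f} sessions, full rollout: {full:.0f} sessions")
# prints: canary: 20 sessions, full rollout: 400 sessions
```

The ratio between the two results is simply 1 / traffic_share, which is why halving the first-stage share halves the worst-case blast radius.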

Most teams progress through a fixed ladder of traffic shares with a hold time at each step. The shape of the ladder depends on traffic volume: a store doing 50k sessions a day can move through stages in hours, while a smaller store may need a full day per step to see meaningful signal.

Benchmark

Typical canary rollout ladder for an e-commerce storefront

Stage	Traffic share	Hold time	Primary signal watched
Internal	0% (staff only)	1-4 hours	Smoke test, console errors
Canary 1	1%	2-6 hours	Error rate, p95 latency
Canary 2	5%	4-12 hours	Checkout completion rate
Ramp	25%	12-24 hours	Revenue per session vs control
Majority	50%	12-24 hours	Full funnel KPIs
Full	100%	Steady state	Post-deploy monitoring only
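One way to encode the ladder is as data plus a small decision function, so promotion follows the agreed hold times rather than someone's mood on the day. This is a sketch under stated assumptions: the minimum hold hours are the lower end of each range in the table above, and `healthy` is a hypothetical boolean fed by your monitoring (for example, "no rollback threshold breached").

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Stage:
    name: str
    traffic_share: float
    min_hold_hours: float


# Lower end of each hold-time range from the benchmark table.
LADDER = [
    Stage("Internal", 0.0, 1),   # staff only, no public traffic
    Stage("Canary 1", 0.01, 2),
    Stage("Canary 2", 0.05, 4),
    Stage("Ramp", 0.25, 12),
    Stage("Majority", 0.50, 12),
    Stage("Full", 1.0, 0),       # steady state, post-deploy monitoring only
]


def next_action(
    stage_index: int, hours_held: float, healthy: bool
) -> tuple[str, Optional[Stage]]:
    """Decide whether to roll back, keep holding, or promote to the next stage."""
    if not healthy:
        return "rollback", None
    stage = LADDER[stage_index]
    if stage_index == len(LADDER) - 1:
        return "done", stage
    if hours_held < stage.min_hold_hours:
        return "hold", stage
    return "promote", LADDER[stage_index + 1]


print(next_action(2, 5, True))  # Canary 2 held 5h and healthy: promote to Ramp
```

Smaller stores that need a full day per step would raise `min_hold_hours` rather than change the decision logic.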

Two pitfalls show up repeatedly. First, the canary slice has to be representative — routing only logged-out desktop users on a Tuesday morning will miss the bug that fires on iOS Safari at checkout. Second, the hold time has to be long enough to cover a real purchase cycle, including any abandoned-cart email loop you want validated.
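A cheap guard against the first pitfall is to compare the segment mix of the canary slice with the stable slice before trusting its metrics. The sketch below assumes each session has already been labelled with a segment string; the labels (such as `"ios-safari"`) and the 0.5 tolerance are hypothetical choices, not standards.

```python
from collections import Counter


def segment_mix(sessions: list[str]) -> dict[str, float]:
    """Share of sessions per segment label."""
    counts = Counter(sessions)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {segment: n / total for segment, n in counts.items()}


def underrepresented(
    canary: list[str], stable: list[str], tolerance: float = 0.5
) -> list[str]:
    """Segments whose canary share is below `tolerance` times their stable share."""
    canary_mix = segment_mix(canary)
    stable_mix = segment_mix(stable)
    return sorted(
        segment
        for segment, stable_share in stable_mix.items()
        if canary_mix.get(segment, 0.0) < tolerance * stable_share
    )


stable = ["desktop"] * 60 + ["ios-safari"] * 40
canary = ["desktop"] * 95 + ["ios-safari"] * 5
print(underrepresented(canary, stable))  # ['ios-safari']
```

A non-empty result means the canary is not seeing enough of some audience to surface its bugs, which is exactly the iOS Safari checkout case described above.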

Frequently asked questions

What's the difference between a canary release and a feature flag?

A canary release is infrastructure-level: a router sends some users to a new build of the whole application. A feature flag is code-level: every user runs the same build, but a conditional inside the code shows or hides a specific feature. You often use both: canary for deployment safety, flags for feature targeting.

Is a canary release the same as an A/B test?

No. An A/B test measures whether one variant converts better than another and runs to statistical significance. A canary release assumes the new version is acceptable and watches for regressions in error rate, latency, and checkout success. Same routing mechanism, different decision criteria.

What percentage of traffic should the first canary stage get?

Start at 1% for high-traffic stores and 5% for smaller stores where 1% wouldn't generate enough sessions to detect problems. The goal is to expose enough users to surface real errors within a few hours without putting more than a tiny fraction of revenue at risk.
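The 1%-versus-5% choice can be made explicit by asking how long each share needs to collect a minimum number of canary sessions. This sketch assumes traffic is spread evenly across the day (real stores have peaks, so treat the hours as optimistic) and uses a hypothetical `min_canary_sessions` target that each team would set for itself.

```python
def plan_first_stage(
    daily_sessions: int, hold_hours: float, min_canary_sessions: int
) -> tuple[float, float]:
    """Return (traffic share, hours needed at that share to reach the target)."""
    if daily_sessions <= 0:
        raise ValueError("daily_sessions must be positive")
    hourly = daily_sessions / 24  # assumes evenly spread traffic
    for share in (0.01, 0.05):
        hours_needed = min_canary_sessions / (share * hourly)
        if hours_needed <= hold_hours:
            return share, hours_needed
    # Even 5% is too slow: start at 5% and accept a longer hold.
    return 0.05, min_canary_sessions / (0.05 * hourly)


print(plan_first_stage(50_000, 6, 100))  # high traffic: 1% is enough
print(plan_first_stage(5_000, 6, 100))   # small store: 5%, roughly 9.6 hours
```

When the fallback branch fires, the remedy the text suggests is time, not a bigger first stage: hold longer at 5% rather than jumping straight to 25%.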

How long should each canary stage run?

Long enough to see a full session lifecycle on the new build, including add-to-cart and checkout. For most online stores that's 2-6 hours for early stages and 12-24 hours for later stages, so you cover both peak and off-peak traffic patterns.

Can I run canary releases on Shopify?

Shopify's hosted checkout limits how much you can canary the checkout itself, but you can canary theme changes, app-rendered components, and storefront API integrations using traffic-splitting at the CDN or via a tag-manager rule. Backend services running outside Shopify can canary normally.

What metrics should trigger an automatic rollback?

At minimum: server error rate (5xx) above the baseline, p95 latency degradation greater than 20-30%, and a drop in checkout completion rate beyond noise. Set thresholds before the deploy so the rollback decision isn't argued in the moment.
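The three triggers can be written down as a pre-agreed check that compares the canary slice with the stable slice. In this sketch the threshold values are placeholders (the 1.25 latency ratio sits inside the 20-30% band above; the 5xx margin and the z-score cut-off are assumptions), and "beyond noise" is interpreted as a one-sided two-proportion z-test on checkout completion. The `Slice` and `Thresholds` types are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Thresholds:
    # Agreed before the deploy; placeholder values, not recommendations.
    max_5xx_rate_increase: float = 0.005  # absolute increase over stable
    max_p95_latency_ratio: float = 1.25   # canary p95 / stable p95
    max_checkout_drop_z: float = 2.58     # z-score treated as "beyond noise"


@dataclass(frozen=True)
class Slice:
    requests: int
    errors_5xx: int
    p95_latency_ms: float
    checkouts_started: int
    checkouts_completed: int


def checkout_drop_z(stable: Slice, canary: Slice) -> float:
    """Two-proportion z-score; positive when the canary completes fewer checkouts."""
    p1 = stable.checkouts_completed / stable.checkouts_started
    p2 = canary.checkouts_completed / canary.checkouts_started
    pooled = (stable.checkouts_completed + canary.checkouts_completed) / (
        stable.checkouts_started + canary.checkouts_started
    )
    se = math.sqrt(
        pooled * (1 - pooled)
        * (1 / stable.checkouts_started + 1 / canary.checkouts_started)
    )
    return 0.0 if se == 0 else (p1 - p2) / se


def rollback_reasons(stable: Slice, canary: Slice, t: Thresholds = Thresholds()) -> list[str]:
    """Return every breached trigger; an empty list means keep going."""
    reasons = []
    stable_5xx = stable.errors_5xx / stable.requests
    canary_5xx = canary.errors_5xx / canary.requests
    if canary_5xx - stable_5xx > t.max_5xx_rate_increase:
        reasons.append("5xx rate above baseline")
    if canary.p95_latency_ms > t.max_p95_latency_ratio * stable.p95_latency_ms:
        reasons.append("p95 latency regression")
    if checkout_drop_z(stable, canary) > t.max_checkout_drop_z:
        reasons.append("checkout completion drop")
    return reasons
```

Returning the list of reasons, rather than a bare boolean, gives the on-call engineer the "why" alongside the rollback.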

Does a canary slice need to be a random sample?

Yes, sticky-random by user or session ID. If you route by geography, device, or login state, you'll bias the canary toward a subgroup and miss regressions affecting the rest. The exception is deliberate dark-launches to internal staff before the 1% stage.

How does canary fit into broader feature experimentation?

Canary releases are the deployment-safety layer underneath feature experimentation. You canary the build to make sure nothing is broken, then run feature-flag experiments inside that build to test hypotheses. One protects uptime, the other protects learning velocity.

What tooling do I need to run canaries?

A traffic-splitter (CDN rule, load balancer, ingress controller, or feature-flag platform with deployment modes), real-time error and latency monitoring, and a dashboard that compares canary vs stable on your key funnel metrics. Many teams start with Cloudflare or Fastly rules plus their existing analytics.

When should I skip canary and deploy straight to 100%?

Documentation, marketing-page copy, and other changes with no runtime behaviour are fine to ship directly. Anything touching cart, checkout, payment, inventory, or pricing logic should go through a canary: the cost of a regression in those paths almost always exceeds the cost of an extra hour of rollout.
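Teams sometimes encode this rule as a CI gate on the list of changed files. The sketch below is one hypothetical way to do it: it assumes the high-risk areas appear as directory or file names in the repository and that `.md` and `.txt` files carry no runtime behaviour. Anything that fits neither rule is left to the team's judgement, since the guidance above does not cover it.

```python
from pathlib import PurePosixPath

# Hypothetical path names for the high-risk areas; adapt to your repo layout.
HIGH_RISK_AREAS = ("cart", "checkout", "payment", "inventory", "pricing")
# Assumed to have no runtime behaviour.
NO_RUNTIME_SUFFIXES = (".md", ".txt")


def rollout_policy(changed_files: list[str]) -> str:
    """Return 'canary', 'direct', or 'judgement call' for a set of changed files."""
    paths = [PurePosixPath(f) for f in changed_files]
    touches_high_risk = any(
        part in HIGH_RISK_AREAS
        for p in paths
        for part in (*p.parent.parts, p.stem)
    )
    if touches_high_risk:
        return "canary"
    if all(p.suffix in NO_RUNTIME_SUFFIXES for p in paths):
        return "direct"
    return "judgement call"


print(rollout_policy(["src/checkout/summary.ts", "README.md"]))  # canary
```

Matching on directory names errs on the cautious side: a docs page stored under a `checkout/` folder would still be routed through a canary.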
