Canary Releases

A canary release sends a small percentage of production traffic to a new version while everyone else stays on the stable build. Here's how the math, stages, and trade-offs work.
A canary release ships a new version of your storefront, checkout, or backend service to a deliberately small fraction of live traffic — typically 1% to 5% at first — while the remaining users stay on the proven version. You watch error rates, latency, and conversion on the canary slice, then either expand the rollout in stages or roll back instantly if something breaks.
The pattern is infrastructure-level: a router, load balancer, or CDN decides who sees which build, and neither build contains any logic for choosing between them. That distinguishes it from feature flags, where every user runs the same build and a conditional inside the code gates behaviour. Canary releases are a core technique inside feature experimentation programmes when the risk you're managing is deployment risk, not hypothesis risk.
The name comes from the canary-in-a-coal-mine analogy: a small, sensitive sample that surfaces danger before it reaches the rest of the system. In e-commerce, the cost of a bad deploy is measured in failed checkouts and refund tickets, so limiting blast radius is the whole point.
A canary is not an A/B test. You're not trying to learn which version converts better — you assume the new version is at least as good and you're verifying it doesn't break in production. The decision criterion is regression detection, not statistical significance on a business KPI.
expected_incidents = traffic_share * baseline_error_rate * affected_sessions

| Variable | Name | Meaning |
|---|---|---|
| traffic_share | Canary traffic share | Fraction of production traffic routed to the new version (e.g. 0.05 for 5%). |
| baseline_error_rate | Worst-case error rate | Pessimistic estimate of how often the new version fails a session (e.g. 0.02 for 2%). |
| affected_sessions | Sessions in window | Total production sessions during the canary window (all sessions, not only the ones that fail). |
A Shopify apparel store deploys a new checkout build behind a 5% canary for a 4-hour window covering 20,000 sessions. They assume a worst-case error rate of 2% if a regression slips through.
Canary traffic share: 0.05
Worst-case error rate: 0.02
Sessions in window: 20,000
→ 0.05 * 0.02 * 20,000 = 20 affected sessions
By keeping the canary at 5%, the team caps worst-case exposure at roughly 20 broken sessions over the full window, and fewer if automated rollback triggers early, versus 400 if the same regression went out to 100% of traffic.
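The exposure formula is simple enough to script when comparing candidate traffic shares. The following is a minimal sketch that reuses the variable names from the formula; the function itself and its input checks are illustrative assumptions, not part of any particular tool.

```python
def expected_incidents(traffic_share: float,
                       baseline_error_rate: float,
                       affected_sessions: int) -> float:
    """Worst-case number of broken sessions during the canary window.

    traffic_share       -- fraction of traffic on the new version (0..1)
    baseline_error_rate -- pessimistic per-session failure rate (0..1)
    affected_sessions   -- total production sessions in the window
    """
    if not 0.0 <= traffic_share <= 1.0:
        raise ValueError("traffic_share must be between 0 and 1")
    if not 0.0 <= baseline_error_rate <= 1.0:
        raise ValueError("baseline_error_rate must be between 0 and 1")
    if affected_sessions < 0:
        raise ValueError("affected_sessions must not be negative")
    return traffic_share * baseline_error_rate * affected_sessions


# Inputs from the worked example: 5% canary, 2% worst-case error rate,
# 20,000 sessions in the 4-hour window.
canary = expected_incidents(0.05, 0.02, 20_000)
full_rollout = expected_incidents(1.0, 0.02, 20_000)

print(f"canary at 5%:    {canary:.0f} broken sessions")
print(f"rollout at 100%: {full_rollout:.0f} broken sessions")
```

Running it with the example inputs reproduces the 20 and 400 figures, and changing `traffic_share` shows how exposure scales linearly with the size of the canary slice.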
Most teams progress through a fixed ladder of traffic shares with a hold time at each step. The shape of the ladder depends on traffic volume: a store doing 50k sessions a day can move through stages in hours, while a smaller store may need a full day per step to see meaningful signal.
Typical canary rollout ladder for an e-commerce storefront
| Stage | Traffic share | Hold time | Primary signal watched |
|---|---|---|---|
| Internal | 0% (staff only) | 1-4 hours | Smoke test, console errors |
| Canary 1 | 1% | 2-6 hours | Error rate, p95 latency |
| Canary 2 | 5% | 4-12 hours | Checkout completion rate |
| Ramp | 25% | 12-24 hours | Revenue per session vs control |
| Majority | 50% | 12-24 hours | Full funnel KPIs |
| Full | 100% | Steady state | Post-deploy monitoring only |
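To see how traffic volume stretches or compresses the ladder, the sketch below encodes the four partial-traffic stages from the table and estimates how long each one needs to run before the canary slice has seen a target number of sessions. The `Stage` class, the `hold_hours` helper, the 100-session target, and the assumption that traffic is spread evenly across the day are all illustrative choices, not values from the table.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Stage:
    name: str
    traffic_share: float   # fraction of production traffic on the new build
    min_hold_hours: float  # lower end of the hold-time range in the table


# Partial-traffic stages from the table. "Internal" (staff only) and
# "Full" (steady state) are left out because they have no traffic split.
LADDER = [
    Stage("Canary 1", 0.01, 2),
    Stage("Canary 2", 0.05, 4),
    Stage("Ramp", 0.25, 12),
    Stage("Majority", 0.50, 12),
]


def hold_hours(stage: Stage, daily_sessions: int, min_canary_sessions: int) -> float:
    """Hours the stage must run to observe min_canary_sessions on the canary.

    Assumes sessions arrive evenly over 24 hours (a simplification) and
    never returns less than the stage's minimum hold time.
    """
    if daily_sessions <= 0:
        raise ValueError("daily_sessions must be positive")
    canary_sessions_per_hour = daily_sessions * stage.traffic_share / 24
    hours_for_signal = min_canary_sessions / canary_sessions_per_hour
    return max(hours_for_signal, stage.min_hold_hours)


# Compare a 50k-sessions-a-day store with a hypothetical 5k-sessions-a-day store.
for daily in (50_000, 5_000):
    print(f"{daily} sessions/day")
    for stage in LADDER:
        hours = hold_hours(stage, daily, min_canary_sessions=100)
        print(f"  {stage.name}: {hours:.1f} h")
```

With these assumptions the smaller store is bound by signal at the early stages, while the larger store is mostly bound by the minimum hold times. Real traffic has daily peaks, so treat the output as a rough planning aid.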
Two pitfalls show up repeatedly. First, the canary slice has to be representative — routing only logged-out desktop users on a Tuesday morning will miss the bug that fires on iOS Safari at checkout. Second, the hold time has to be long enough to cover a real purchase cycle, including any abandoned-cart email loop you want validated.
Frequently asked questions
What is the difference between a canary release and a feature flag?
A canary release is infrastructure-level: a router sends some users to a new build of the whole application. A feature flag is code-level: every user runs the same build, but a conditional inside the code shows or hides a specific feature. You often use both: canary for deployment safety, flags for feature targeting.
Is a canary release the same as an A/B test?
No. An A/B test measures whether one variant converts better than another and runs to statistical significance. A canary release assumes the new version is acceptable and watches for regressions in error rate, latency, and checkout success. Same routing mechanism, different decision criteria.
What percentage of traffic should a canary start with?
Start at 1% for high-traffic stores and 5% for smaller stores where 1% wouldn't generate enough sessions to detect problems. The goal is to expose enough users to surface real errors within a few hours without putting more than a tiny fraction of revenue at risk.
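One way to make "enough sessions to detect problems" concrete is to ask how many canary sessions are needed before at least one failure would almost certainly have shown up. The sketch below assumes sessions fail independently at a fixed rate, so the chance of seeing no failure in n sessions is (1 - error_rate)^n. The function names, the 95% confidence default, and the example traffic figures are illustrative assumptions.

```python
import math
from typing import Optional


def sessions_to_see_error(error_rate: float, confidence: float = 0.95) -> int:
    """Smallest n with P(at least one failing session in n) >= confidence.

    Assumes each session fails independently with probability error_rate.
    """
    if not 0.0 < error_rate < 1.0:
        raise ValueError("error_rate must be strictly between 0 and 1")
    if not 0.0 < confidence < 1.0:
        raise ValueError("confidence must be strictly between 0 and 1")
    return math.ceil(math.log(1 - confidence) / math.log(1 - error_rate))


def starting_share(hourly_sessions: float, window_hours: float,
                   error_rate: float) -> Optional[float]:
    """Pick 1% if it yields enough canary sessions in the window, else 5%.

    Returns None when even 5% is too thin, which suggests a longer window.
    """
    needed = sessions_to_see_error(error_rate)
    for share in (0.01, 0.05):
        if hourly_sessions * window_hours * share >= needed:
            return share
    return None


# A regression that breaks 2% of sessions, watched over a 4-hour window.
print(sessions_to_see_error(0.02))
print(starting_share(hourly_sessions=10_000, window_hours=4, error_rate=0.02))
print(starting_share(hourly_sessions=2_000, window_hours=4, error_rate=0.02))
```

Seeing one error is a low bar. Detecting a rise over an existing baseline error rate takes more sessions, so the result is better read as a lower bound than as a target.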
How long should each canary stage run?
Long enough to see a full session lifecycle on the new build, including add-to-cart and checkout. For most online stores that's 2-6 hours for early stages and 12-24 hours for later stages, so you cover both peak and off-peak traffic patterns.
Can you run canary releases on Shopify?
Shopify's hosted checkout limits how much you can canary the checkout itself, but you can canary theme changes, app-rendered components, and storefront API integrations using traffic-splitting at the CDN or via a tag-manager rule. Backend services running outside Shopify can canary normally.
What should trigger a rollback?
At minimum: server error rate (5xx) above the baseline, p95 latency degradation greater than 20-30%, and a drop in checkout completion rate beyond noise. Set thresholds before the deploy so the rollback decision isn't argued in the moment.
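Writing the thresholds down as data is one way to settle them before the deploy. The sketch below compares canary metrics against the stable build and lists every rule that is breached. The `Metrics` and `Thresholds` classes are hypothetical, the 25% latency limit is picked from the 20-30% range above, and the error-rate and checkout margins are placeholder stand-ins for "above the baseline" and "beyond noise" that each team would need to calibrate.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Metrics:
    error_rate_5xx: float       # fraction of requests returning 5xx
    p95_latency_ms: float       # 95th percentile latency in milliseconds
    checkout_completion: float  # fraction of started checkouts completed


@dataclass(frozen=True)
class Thresholds:
    max_error_rate_increase: float = 0.005  # absolute; placeholder value
    max_latency_degradation: float = 0.25   # relative; from the 20-30% range
    max_checkout_drop: float = 0.02         # absolute; placeholder for "noise"


def rollback_reasons(stable: Metrics, canary: Metrics,
                     limits: Thresholds = Thresholds()) -> List[str]:
    """Return the names of all breached rules; an empty list means keep going."""
    reasons = []
    if canary.error_rate_5xx - stable.error_rate_5xx > limits.max_error_rate_increase:
        reasons.append("5xx error rate")
    if canary.p95_latency_ms > stable.p95_latency_ms * (1 + limits.max_latency_degradation):
        reasons.append("p95 latency")
    if stable.checkout_completion - canary.checkout_completion > limits.max_checkout_drop:
        reasons.append("checkout completion")
    return reasons


# Hypothetical readings: latency is 35% worse on the canary, the rest is fine.
stable = Metrics(error_rate_5xx=0.002, p95_latency_ms=400, checkout_completion=0.62)
canary = Metrics(error_rate_5xx=0.003, p95_latency_ms=540, checkout_completion=0.61)

breached = rollback_reasons(stable, canary)
print("roll back:" if breached else "continue", breached)
```

Returning the list of breached rules, instead of a bare boolean, leaves a record of why a rollback fired.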
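Should canary traffic be assigned randomly?
Yes, sticky-random by user or session ID: each visitor is assigned at random once and then stays on the same version for the rest of the rollout. If you route by geography, device, or login state you'll bias the canary toward a subgroup and miss regressions affecting the rest. The exception is deliberate dark-launches to internal staff before the 1% stage.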
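A common way to get sticky-random assignment without storing any state is to hash the user or session ID into a fixed number of buckets and send the lowest buckets to the canary. The sketch below shows the idea in Python using the standard library's `hashlib`. The function names, the 10,000-bucket resolution, and the `checkout-v2` salt are assumptions for illustration; in practice this logic usually lives in the router or CDN rule.

```python
import hashlib

BUCKETS = 10_000  # resolution of 0.01% per bucket; an arbitrary choice


def bucket_for(user_id: str, salt: str = "checkout-v2") -> int:
    """Map an ID to a stable bucket in [0, BUCKETS).

    The salt is a per-release label so that different rollouts do not
    keep picking the same users for the canary.
    """
    digest = hashlib.sha256(f"{salt}:{user_id}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % BUCKETS


def routes_to_canary(user_id: str, traffic_share: float,
                     salt: str = "checkout-v2") -> bool:
    """True if this ID falls inside the canary slice for the given share."""
    if not 0.0 <= traffic_share <= 1.0:
        raise ValueError("traffic_share must be between 0 and 1")
    return bucket_for(user_id, salt) < round(traffic_share * BUCKETS)


# The same ID always gets the same answer, and roughly 5% of IDs are selected.
ids = [f"user-{n}" for n in range(10_000)]
selected = sum(routes_to_canary(uid, 0.05) for uid in ids)
print(f"{selected / len(ids):.2%} of IDs routed to the canary")
```

Because the comparison is against a threshold, raising the share from 1% to 5% keeps everyone who was already on the canary there and only adds new users, which keeps the experience consistent as the ladder advances.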
How do canary releases relate to feature experimentation?
Canary releases are the deployment-safety layer underneath feature experimentation. You canary the build to make sure nothing is broken, then run feature-flag experiments inside that build to test hypotheses. One protects uptime, the other protects learning velocity.
What tooling do you need to run a canary release?
A traffic-splitter (CDN rule, load balancer, ingress controller, or feature-flag platform with deployment modes), real-time error and latency monitoring, and a dashboard that compares canary vs stable on your key funnel metrics. Many teams start with Cloudflare or Fastly rules plus their existing analytics.
Which changes can skip a canary release?
Documentation, marketing-page copy, and other changes with no runtime behaviour are fine to ship directly. Anything touching cart, checkout, payment, inventory, or pricing logic should go through a canary, because the cost of a regression in those paths almost always exceeds the cost of an extra hour of rollout.