Feature Flags

Metricuno

May 17, 2026

Quick answer

Feature flags are boolean toggles that gate code paths so a feature ships to a subset of users — the infrastructure behind canary releases, progressive rollouts, and product A/B tests.

Definition

Experimentation infrastructure

Feature Flags

Boolean toggles in code that decide which users see a new feature, enabling gradual rollouts and live A/B tests.

A feature flag is a conditional in your codebase — usually a simple if-statement — that checks a remote configuration to decide whether a given user gets the new code path or the old one. The flag's value can be flipped without a redeploy, and it can be targeted by user ID, country, device, store segment, or a random bucket.
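The check described above can be sketched in a few lines. This is a minimal illustration, not any particular vendor's SDK: the `Flag` fields, the `bucket` helper, and the `express-checkout` key are all hypothetical, and a real system would load the flag from remote configuration rather than construct it inline.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class Flag:
    """Hypothetical flag definition, as it might arrive from remote config."""
    key: str
    enabled: bool = False          # master switch
    rollout_percent: float = 0.0   # 0-100: share of users sent to the new path
    allowed_countries: set[str] = field(default_factory=set)  # empty = no restriction


def bucket(flag_key: str, user_id: str) -> float:
    """Map a (flag, user) pair to a stable number in [0, 100)."""
    digest = hashlib.sha256(f"{flag_key}:{user_id}".encode()).hexdigest()
    return int(digest[:8], 16) % 10_000 / 100


def is_enabled(flag: Flag, user_id: str, country: str) -> bool:
    """Decide whether this user gets the new code path."""
    if not flag.enabled:
        return False
    if flag.allowed_countries and country not in flag.allowed_countries:
        return False
    # The same user always lands in the same bucket, so their experience is stable.
    return bucket(flag.key, user_id) < flag.rollout_percent


flag = Flag(key="express-checkout", enabled=True, rollout_percent=5.0,
            allowed_countries={"US", "CA"})

if is_enabled(flag, user_id="user-123", country="US"):
    checkout = "new express checkout"
else:
    checkout = "existing checkout"
```

Hashing the flag key together with the user ID keeps each user's assignment stable across requests while giving different flags independent buckets.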

That decoupling of deploy from release is the whole point. Engineering can merge half-finished work behind an off flag, product can turn a checkout change on for 5% of Shopify traffic at 10am, and if conversion dips you flip it back in seconds. Feature flags are the plumbing under canary releases, progressive rollouts, and product-level A/B tests.

Also known as

feature toggles

feature switches

release toggles

There are four flag types worth knowing apart. Release flags hide unfinished work in production. Experiment flags split traffic for A/B tests. Ops flags act as kill switches for risky subsystems (payment providers, recommendation engines). Permission flags gate features to specific accounts or plans.

For an online store, the highest-leverage uses are usually the boring ones: roll a new product-detail-page template to 10% of mobile traffic, test a new shipping-threshold banner on returning visitors only, or instantly disable a third-party review widget when it tanks Largest Contentful Paint. Done well, feature flags turn releases from a ceremony into a dial.

Formula

Blast Radius = Total Active Users × Rollout %

Variables

Total Active Users: active users in the window, that is, the daily or weekly active shoppers exposed to the code path.

Rollout %: flag exposure percentage, that is, the share of users the flag currently routes to the new variant, expressed as a decimal (5% is 0.05).

Worked example

A Shopify apparel store with 80,000 weekly active visitors enables a new express-checkout flow for 5% of mobile traffic. Mobile is 70% of sessions, and this example assumes visitors split the same way.

Total Active Users (weekly mobile): 80,000 × 0.70 = 56,000

Rollout %: 0.05

→ 56,000 × 0.05 = 2,800 users exposed

If the new checkout breaks for everyone in the variant, the worst-case impact this week is 2,800 visitors. That is small enough not to sink the quarter, and still enough traffic to spot a problem in conversion telemetry.
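The arithmetic in the worked example can be reproduced with a short script. The function name `blast_radius` and the variable names are illustrative choices, not part of any tool; the numbers are the ones from the example.

```python
def blast_radius(total_active_users: int, rollout_fraction: float) -> int:
    """Blast Radius = Total Active Users x Rollout %, with the rollout as a decimal."""
    if not 0.0 <= rollout_fraction <= 1.0:
        raise ValueError("rollout_fraction must be a decimal between 0 and 1")
    return round(total_active_users * rollout_fraction)


weekly_visitors = 80_000
mobile_share = 0.70  # mobile share of traffic, from the worked example

weekly_mobile = round(weekly_visitors * mobile_share)
exposed = blast_radius(weekly_mobile, 0.05)

print(f"Weekly mobile users: {weekly_mobile:,}")  # Weekly mobile users: 56,000
print(f"Users exposed: {exposed:,}")              # Users exposed: 2,800
```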

That blast-radius calculation is what makes flags safer than a hard release. You pick a percentage you can afford to lose, watch your guardrail metrics (conversion rate, checkout errors, page speed), then ramp the flag up in stages — typically 1% → 5% → 25% → 50% → 100% over a few days.
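A staged ramp with a guardrail check might look like the sketch below. The stages come from the schedule above; the `Guardrails` fields and both thresholds (a 5% relative conversion drop, a 2% checkout error rate) are hypothetical values chosen for illustration and should be replaced with limits that fit your own store.

```python
from dataclasses import dataclass

RAMP_STAGES = [1, 5, 25, 50, 100]  # rollout percentages, in order


@dataclass
class Guardrails:
    conversion_rate: float       # e.g. 0.030 for 3.0%
    checkout_error_rate: float   # share of checkouts that error


def guardrails_healthy(current: Guardrails, baseline: Guardrails,
                       max_conversion_drop: float = 0.05,
                       max_error_rate: float = 0.02) -> bool:
    """Compare the variant's metrics with the baseline (thresholds are assumptions)."""
    conversion_ok = (current.conversion_rate
                     >= baseline.conversion_rate * (1 - max_conversion_drop))
    errors_ok = current.checkout_error_rate <= max_error_rate
    return conversion_ok and errors_ok


def next_rollout(current_percent: int, healthy: bool) -> int:
    """Advance one stage when guardrails hold; drop to 0% when they do not."""
    if not healthy:
        return 0
    for stage in RAMP_STAGES:
        if stage > current_percent:
            return stage
    return 100  # already fully rolled out


baseline = Guardrails(conversion_rate=0.030, checkout_error_rate=0.010)
variant = Guardrails(conversion_rate=0.029, checkout_error_rate=0.010)
new_percent = next_rollout(5, guardrails_healthy(variant, baseline))
```

Rolling straight back to 0% on a failed check is one possible policy; holding at the current stage while someone investigates is another reasonable choice.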

Benchmark

Typical feature-flag rollout patterns by use case

Use case	Initial exposure	Ramp duration	Primary guardrail
Canary release (low-risk UI)	5%	2-3 days	Error rate
Checkout / payment change	1%	7-14 days	Conversion rate, checkout errors
Product A/B test	50%	Hold until significance (2-4 weeks)	Primary KPI lift
Kill switch / ops flag	100% on by default	Instant flip if needed	Latency, error rate
Permission / entitlement flag	Targeted segment only	Permanent	N/A — not a release

Where flags get messy is the long tail. Every flag is a branch in your code, and unmaintained ones rot: engineers stop trusting which branch is live, each extra flag doubles the number of code paths QA has to cover, and a 'temporary' release flag from 18 months ago ends up gating a payment retry. Treat flag cleanup as part of the definition of done, and review your flag inventory quarterly.

Frequently asked

Feature flags FAQ

What's the difference between feature flags and A/B testing?

Feature flags are the infrastructure; A/B testing is one thing you do with them. A flag decides which code path a user gets, while an A/B test additionally randomises that decision and measures the difference in outcomes. Every product A/B test runs on top of flags, but plenty of flags (kill switches, permission gates, dark launches) aren't experiments.

How do feature flags relate to feature experimentation?

Feature experimentation is the practice of validating product changes with data; feature flags are how you actually deliver those experiments to a controlled slice of users. Without flags you can't randomise exposure, hold a control group, or roll back losers, so flags are a prerequisite for any serious experimentation programme.

Will adding a feature-flag SDK slow down my Shopify store?

A well-built flag SDK adds a few kilobytes and resolves flag values either at the edge or from a cached local bootstrap, so the runtime cost is usually under 10ms. The risk is third-party SDKs that fetch flag config synchronously on page load: those can add 100-300ms and should be loaded asynchronously or moved server-side.

Should flag evaluation happen client-side or server-side?

Server-side is safer for anything that affects pricing, inventory, or checkout, because the client can't be trusted with flags that have business consequences. Client-side is fine for visual variants, copy changes, and front-end experiments where the worst case is showing the wrong button colour.

How many feature flags is too many?

There's no hard limit, but most teams start to feel pain past 100 active flags because the combinatorics of which-flag-is-on-for-whom become untestable. A healthy cadence is to retire flags within 30 days of a feature reaching 100%, and to keep permanent flags (kill switches, entitlements) explicitly labelled so they don't get confused with release debt.

Can I run feature flags without a third-party tool?

Yes. A basic flag system is a config file plus an if-statement, and many teams start there. You'll outgrow it once you need percentage rollouts, user targeting, audit logs, or experiment statistics. At that point the build-vs-buy maths usually favours a managed service, because the infrastructure work compounds quickly.

How do feature flags work with caching and CDNs?

Cached HTML is the main gotcha: if your CDN serves the same page to everyone, a client-side flag flip will look stale until cache TTL expires. Solutions include varying cache keys by flag bucket, rendering flagged variants at the edge, or pushing the decision into a client-side hydration step that runs after the cached shell loads.
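To make the first of those options concrete, here is a rough sketch of a cache key that varies by flag variant. It is not tied to any CDN's API: `NUM_BUCKETS`, `flag_bucket`, and `cache_key` are hypothetical names, and in practice this logic would live in edge or origin code that sets the key your cache actually uses.

```python
import hashlib

NUM_BUCKETS = 20  # assumption: 20 buckets gives 5% rollout granularity


def flag_bucket(flag_key: str, visitor_id: str) -> int:
    """Assign a visitor to a stable bucket in [0, NUM_BUCKETS)."""
    digest = hashlib.sha256(f"{flag_key}:{visitor_id}".encode()).hexdigest()
    return int(digest[:8], 16) % NUM_BUCKETS


def cache_key(path: str, flag_key: str, visitor_id: str, rollout_buckets: int) -> str:
    """Build a cache key that differs only by which variant the visitor gets."""
    in_rollout = flag_bucket(flag_key, visitor_id) < rollout_buckets
    variant = "on" if in_rollout else "off"
    return f"{path}|{flag_key}={variant}"


# One bucket out of 20 corresponds to a 5% rollout.
key = cache_key("/products/linen-shirt", "new-pdp-template", "visitor-42",
                rollout_buckets=1)
```

Keying on the variant rather than the raw bucket number keeps the cache to two entries per page, so hit rates stay high while each visitor still sees a consistent version.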

What's a kill switch and how is it different from a release flag?

A kill switch is a permanent flag that defaults to on and exists so you can disable a subsystem instantly when something breaks, such as third-party scripts, recommendation engines, or expensive API calls. Release flags are temporary, default off, and get removed once the feature is fully rolled out. Mixing the two categories is how flag debt accumulates.
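The different defaults matter most when the flag's remote value is missing or unreachable. A minimal sketch, assuming a hypothetical `FlagKind` enum and a plain dictionary standing in for remote configuration:

```python
from enum import Enum


class FlagKind(Enum):
    RELEASE = "release"          # temporary, removed after full rollout
    KILL_SWITCH = "kill_switch"  # permanent, guards a subsystem


# Fallback used when remote config has no value for the flag.
DEFAULTS = {
    FlagKind.RELEASE: False,     # unfinished work stays hidden
    FlagKind.KILL_SWITCH: True,  # the subsystem keeps running
}


def resolve(flag_key: str, kind: FlagKind, remote_config: dict[str, bool]) -> bool:
    """Return the remote value if present, otherwise the default for this kind."""
    return remote_config.get(flag_key, DEFAULTS[kind])


remote_config = {"reviews-widget": False}  # ops has disabled the review widget

show_reviews = resolve("reviews-widget", FlagKind.KILL_SWITCH, remote_config)
show_new_pdp = resolve("new-pdp-template", FlagKind.RELEASE, remote_config)
```

Recording the kind next to each flag also makes it easier to keep permanent flags out of cleanup reports.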

How do I clean up old feature flags safely?

Confirm the flag has been at 100% (or 0%) for at least two weeks across all environments, search the codebase for every reference, remove the branch that's no longer in use, then delete the flag definition itself. Many flag platforms surface stale-flag reports; treat them like a linter warning, not a suggestion.
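If your platform has no stale-flag report, the first step of that checklist is easy to approximate yourself. The sketch below assumes a hypothetical `FlagState` record exported from your flag inventory; the field names are illustrative.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class FlagState:
    key: str
    rollout_percent: int
    last_changed: date
    permanent: bool = False  # kill switches and entitlements are never stale


def stale_flags(flags: list[FlagState], today: date,
                min_age: timedelta = timedelta(days=14)) -> list[str]:
    """Return keys of temporary flags pinned at 0% or 100% for at least min_age."""
    return [
        f.key
        for f in flags
        if not f.permanent
        and f.rollout_percent in (0, 100)
        and today - f.last_changed >= min_age
    ]


inventory = [
    FlagState("express-checkout", 100, date(2026, 4, 1)),
    FlagState("new-pdp-template", 25, date(2026, 4, 1)),
    FlagState("reviews-widget", 100, date(2025, 1, 10), permanent=True),
    FlagState("shipping-banner", 0, date(2026, 5, 10)),
]

print(stale_flags(inventory, today=date(2026, 5, 17)))  # ['express-checkout']
```

This only flags candidates; the code search and branch removal still need a human or a dedicated refactoring tool.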

Do feature flags introduce security risks?

They can, in two ways. Client-side flags leak the existence of unreleased features to anyone who inspects the JavaScript bundle, so don't gate sensitive functionality on the client. And flag platforms themselves become high-privilege systems: an attacker who flips your payment-provider kill switch could disable checkout, so audit logs and role-based access are non-negotiable.

