Product Experiments

Product experiments test functional changes — new features, new flows, removed features — rather than visual tweaks. Stakes are higher and runtimes are longer because you're measuring deeper behavior.
A/B tests on functional product changes — features, flows, or removals — rather than purely visual or copy tweaks.
Product experiments are controlled tests where the variant changes how the product actually works: a new checkout step, a different recommendation engine, a removed upsell, a rebuilt mobile filter. Unlike a button-color test, they alter the behavior the user has to learn, adopt, or trust.
Because the change is deeper, the measurement is harder. Effects take longer to surface, novelty and learning curves distort early data, and the metrics that matter usually sit further down the funnel — retention, repeat purchase, AOV — not just click-through. A product experiment is a commitment, not a quick win.
Product experiments sit inside the broader practice of feature experimentation, but they're the highest-stakes subset. A copy test runs for a week and affects one funnel step. A product experiment can run for a month and affect how the entire shopping experience feels.
The classic example on a Shopify store: replacing a multi-step checkout with a single-page checkout. The variant changes navigation, form behavior, payment-method order, and trust signals all at once. You're not testing a pixel — you're testing a product decision.
required_runtime_days = (sample_size_per_variant * 2) / daily_traffic
Where:
sample_size_per_variant (sample size per variant): Visitors needed per arm to detect the minimum effect at 95% confidence and 80% power.
daily_traffic (daily traffic into the test): Unique visitors per day eligible for the experiment.
required_runtime_days (required runtime in days): Calendar days the test needs to run, before adding a buffer for weekly cycles.
A Shopify apparel store tests a new size-recommendation widget on product pages, targeting a lift from 2.4% to 2.7% conversion rate.
Sample size per variant: 31,000
Daily eligible traffic: 2,200
Variants (control + 1): 2
Required runtime: (31,000 * 2) / 2,200 ≈ 28 days
The test needs a full four weeks at minimum. Adding a fifth week as a buffer keeps the run aligned to complete weekly cycles, which matters because weekend traffic skews younger and converts differently.
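The runtime arithmetic above can be sketched in a few lines of Python. This is a minimal illustration, not a planning tool: the `variants` parameter and the `round_up_to_full_weeks` helper are assumptions added here (the formula hard-codes two arms), and an even traffic split is assumed.

```python
import math


def required_runtime_days(sample_size_per_variant: int,
                          daily_traffic: int,
                          variants: int = 2) -> float:
    """Days until every arm reaches its sample size, assuming an even split."""
    return (sample_size_per_variant * variants) / daily_traffic


def round_up_to_full_weeks(days: float) -> int:
    """Round a runtime up to whole weeks so each weekday is sampled equally often."""
    return math.ceil(days / 7) * 7


# Numbers from the apparel-store example.
days = required_runtime_days(31_000, 2_200)
print(f"raw runtime: {days:.1f} days")                         # raw runtime: 28.2 days
print(f"full weeks:  {round_up_to_full_weeks(days)} days")     # full weeks:  35 days
```

Because 28.2 days spills just past four weeks, rounding up to whole weeks lands on five, which is consistent with the advice to budget an extra week.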
Notice the runtime: four to five weeks is normal for product experiments, not the 7-14 days a CRO team might quote for a headline test. The smaller the effect you need to detect, the longer the runtime — and product changes often move metrics by single-digit percentages, not 30% lifts.
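To see how quickly the required sample grows as the detectable effect shrinks, here is a rough sketch using the textbook two-proportion formula (two-sided test, normal approximation, unpooled variance). The function name and defaults are illustrative assumptions. Calculators differ in sidedness and variance handling, so this sketch should not be expected to reproduce the 31,000 figure from the example exactly.

```python
import math
from statistics import NormalDist


def sample_size_per_variant(baseline: float,
                            relative_lift: float,
                            alpha: float = 0.05,
                            power: float = 0.80) -> int:
    """Approximate visitors per arm to detect a relative lift in conversion rate."""
    p1 = baseline
    p2 = baseline * (1 + relative_lift)
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)   # two-sided significance
    z_beta = NormalDist().inv_cdf(power)            # desired power
    variance = p1 * (1 - p1) + p2 * (1 - p2)        # unpooled variance of the two arms
    return math.ceil((z_alpha + z_beta) ** 2 * variance / (p2 - p1) ** 2)


# Same 2.4% baseline, progressively smaller effects to detect.
for lift in (0.30, 0.125, 0.05):
    n = sample_size_per_variant(0.024, lift)
    print(f"{lift:.1%} relative lift -> {n:,} visitors per variant")
```

The required sample scales roughly with the inverse square of the effect size, so halving the lift you want to detect roughly quadruples the traffic you need.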
Typical runtime and sample profile by experiment type
| Experiment type | Typical MDE | Runtime (weeks) | Primary metric |
|---|---|---|---|
| Headline / copy test | 10-20% | 1-2 | Click-through rate |
| Visual / layout test | 5-10% | 2-3 | Add-to-cart rate |
| Checkout flow change | 2-5% | 3-5 | Checkout completion |
| New feature rollout | 3-8% | 4-6 | Conversion + AOV |
| Feature removal | 1-3% | 4-8 | Revenue per visitor |
The metric column matters as much as the runtime. A copy test optimizes a click — a product experiment has to defend revenue per visitor, because a feature can lift add-to-cart but tank AOV, or improve conversion but hurt 30-day repeat rate. Always pair the primary metric with a guardrail.
Frequently asked questions
CRO tests are usually visual or copy changes on existing pages — they optimize what's already there. Product experiments change how a feature works, add new functionality, or remove existing functionality. The change is structural, not cosmetic.
Feature experimentation is the umbrella practice that covers any test involving product functionality. Product experiments are the high-stakes subset where the change is significant enough to alter user behavior at a deep level — new flows, removed features, or major rebuilds.
Plan for four to six weeks as a baseline. You need enough sample size to detect small effects, full weekly cycles to smooth out weekday/weekend variance, and a buffer for novelty effects to wash out in the first 7-10 days.
Returning users react to anything new — sometimes positively, sometimes negatively — before adjusting. For a button color, this washes out in days. For a redesigned checkout, the adjustment period can be two weeks. Ignore the first week of data when reading the result.
Whether you need to identify individual users depends on the metric. If retention or repeat purchase is the metric, you do, because you have to follow users across sessions. For first-purchase impact, anonymous bucketing works fine, but use a persistent cookie or device fingerprint so a returning visitor stays in the same variant.
Guardrail metrics should include, at minimum: revenue per visitor, AOV, refund rate, and support tickets. A feature that lifts conversion but causes a 15% spike in support volume isn't a win. Page load time is also worth tracking, since new features often add weight.
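A guardrail check can be as simple as comparing each metric's relative change against a tolerance agreed before launch. The sketch below is one possible shape for that check. The `Guardrail` class, the metric names, the tolerances, and all the numbers are hypothetical, and it looks only at point estimates, not at whether a change is statistically significant.

```python
from dataclasses import dataclass


@dataclass
class Guardrail:
    name: str
    control: float
    variant: float
    higher_is_better: bool
    max_relative_harm: float  # hypothetical tolerance, e.g. 0.05 = 5%

    def relative_change(self) -> float:
        return (self.variant - self.control) / self.control

    def breached(self) -> bool:
        change = self.relative_change()
        # "Harm" is a drop for metrics we want high, a rise for metrics we want low.
        harm = -change if self.higher_is_better else change
        return harm > self.max_relative_harm


def breached_guardrails(guardrails: list[Guardrail]) -> list[str]:
    return [g.name for g in guardrails if g.breached()]


# Illustrative values only.
guardrails = [
    Guardrail("revenue_per_visitor", 2.10, 2.14, True, 0.02),
    Guardrail("aov", 84.0, 83.5, True, 0.02),
    Guardrail("refund_rate", 0.040, 0.041, False, 0.05),
    Guardrail("support_tickets_per_1k_orders", 20.0, 23.0, False, 0.05),
]
print(breached_guardrails(guardrails))  # ['support_tickets_per_1k_orders']
```

Here the variant looks fine on revenue, AOV, and refunds, but the 15% rise in support tickets breaches its tolerance, so the result would not count as a win.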
Removing a feature is a valid experiment, and it's one of the most underused types. Hold-out tests, where the control keeps the feature and the variant removes it, reveal which features are actually pulling weight. Many stores discover their upsell modal or quick-view drawer adds friction rather than revenue.
Use consistent assignment keyed on user ID or a persistent device cookie, not session ID. Exclude staff and bot traffic. If the feature is visible to non-bucketed users (e.g. via shared URLs), it is safer to ship it behind a server-side flag than to rely on a client-side script that can flicker.
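One common way to get consistent assignment is to hash a stable identifier together with the experiment name. The sketch below assumes that approach. The function names, the `"control"`/`"variant"` labels, and the staff/bot flags are hypothetical, and real experimentation tools may do this differently.

```python
import hashlib


def assign_variant(unit_id: str,
                   experiment: str,
                   variants: tuple[str, ...] = ("control", "variant")) -> str:
    """Map a stable ID (user ID or persistent cookie value) to the same arm every time."""
    key = f"{experiment}:{unit_id}".encode("utf-8")
    digest = hashlib.sha256(key).digest()
    # Use the first 8 bytes of the hash as an integer and bucket by modulo.
    bucket = int.from_bytes(digest[:8], "big") % len(variants)
    return variants[bucket]


def is_eligible(is_staff: bool, is_bot: bool) -> bool:
    """Keep staff and bot traffic out of the experiment entirely."""
    return not (is_staff or is_bot)


# The same visitor lands in the same arm on every call, across sessions.
first = assign_variant("cookie-8f3a", "single_page_checkout")
second = assign_variant("cookie-8f3a", "single_page_checkout")
print(first, first == second)
```

Including the experiment name in the hash key means a visitor's arm in one test does not determine their arm in the next one.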
When a test ends without a clear result, resist the urge to extend it indefinitely, because that inflates false-positive rates. If the result is flat with a tight confidence interval, the feature genuinely doesn't move the metric. If the interval is still wide, you under-powered the test and should redesign with a larger MDE or a higher-traffic surface.
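Telling "flat and tight" apart from "flat and wide" comes down to the confidence interval on the difference in conversion rates. The sketch below uses a simple normal-approximation (Wald) interval and assumes a fixed-horizon test analyzed once at the end. The function name and the visitor and order counts are made up for illustration.

```python
import math
from statistics import NormalDist


def lift_confidence_interval(conv_control: int, n_control: int,
                             conv_variant: int, n_variant: int,
                             confidence: float = 0.95) -> tuple[float, float]:
    """Wald interval for (variant rate - control rate), in absolute terms."""
    p_c = conv_control / n_control
    p_v = conv_variant / n_variant
    se = math.sqrt(p_c * (1 - p_c) / n_control + p_v * (1 - p_v) / n_variant)
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    diff = p_v - p_c
    return diff - z * se, diff + z * se


# Hypothetical counts: conversions and visitors per arm.
low, high = lift_confidence_interval(744, 31_000, 760, 31_000)
print(f"difference in conversion rate: [{low:+.4f}, {high:+.4f}]")
```

If the whole interval sits well inside the effect size you cared about, the result is a genuine null. If the interval still spans both "no effect" and effects as large as your MDE, the test could not have answered the question.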
Functional changes usually need developer time, because someone has to build the variant. But the orchestration (assignment, tracking, analysis) can run through a no-code experimentation layer, which is how most Shopify and WooCommerce teams keep velocity high without queueing every test behind a sprint.