Exporting 24 Months Of Shopify Orders To Seed A Repeat-Purchase Curve

Metricuno
May 28, 2026
7 min read
How to pull 24 months of Shopify orders, normalize customers, and compute a repeat-purchase curve on day one, without relying on GA4 event history.
Quick answer

Export orders for the last 24 months from Shopify Admin (Orders → Export → CSV, filtered by date) or pull /admin/api/2024-01/orders.json with status=any. Normalize on customer_id (fall back to lowercase email), bucket orders by months-since-first-order per customer, and you have a repeat-purchase curve in an afternoon — no GA4 required.

Definition
Retention analytics

Exporting 24 months of Shopify orders to seed a repeat-purchase curve

Pulling two years of Shopify order history and reshaping it into a cohort-by-month repeat-purchase curve, independent of GA4.

This is a day-one analytics move when GA4 history is missing, broken, or sampled: you go directly to the system of record — Shopify's orders table — and rebuild the retention picture from transactions. Twenty-four months is the practical floor because you need at least one full year of cohorts plus a year of follow-up to see whether repeat behaviour is improving or decaying.

The output is a repeat-purchase curve: for each acquisition cohort, what percentage of customers placed a second, third, or Nth order by month 1, 3, 6, 12, 18. It becomes the baseline every retention experiment is measured against.

Also known as
Shopify cohort export
24-month order backfill
repeat-purchase curve seed

The reason this workflow exists is that GA4 is a poor source of truth for repeat-purchase behaviour in Shopify stores. user_pseudo_id resets on cookie wipes, ITP, and cross-device sessions, so a customer who buys on mobile and re-orders on desktop looks like two different people.

Shopify's orders table has none of those problems. Orders are tied to a stable customer_id or, for older guest checkouts, to the email entered at checkout. If you can get 24 months out, you can rebuild the cohort view in a spreadsheet.

Pulling the data: CSV export vs Admin API

For most stores under ~200k orders over 24 months, the CSV route is fastest. In Shopify Admin go to Orders, filter by date range (set start = today minus 730 days, end = today), tick "Export all matching orders", and pick the "Plain CSV file" format. You'll get an email with a download link, usually within 10-20 minutes.
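Once the download arrives, the file needs one cleanup pass before it is usable for cohorts. The sketch below assumes the usual export layout, in which every line item gets its own row and the order is identified by the Name column. The column names (Name, Email, Created at) and the timestamp format are assumptions to check against your own file's header row, and load_orders is a hypothetical helper name.

```python
import csv
from datetime import datetime


def load_orders(csv_file):
    """Collapse a line-item-level export into one record per order.

    Assumes columns "Name", "Email" and "Created at"; adjust to your header.
    """
    orders = {}
    for row in csv.DictReader(csv_file):
        name = row["Name"]
        if name in orders:
            continue  # extra line-item row for an order we already hold
        created = (row.get("Created at") or "").strip()
        if not created:
            continue  # no order-level timestamp on this row
        orders[name] = {
            "order_name": name,
            "email": row.get("Email") or "",
            # Assumed timestamp format: "2024-03-05 14:02:11 -0500"
            "created_at": datetime.strptime(created, "%Y-%m-%d %H:%M:%S %z"),
        }
    return list(orders.values())


# Usage (file name is a placeholder):
# with open("orders_export.csv", newline="", encoding="utf-8") as f:
#     orders = load_orders(f)
```

Deduplicating on the order name first matters because counting line-item rows as orders would make every multi-item basket look like a repeat purchase.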

Above ~200k orders, the CSV times out or arrives split across multiple files. Switch to the Admin API: GET /admin/api/2024-01/orders.json with status=any, financial_status=paid,partially_refunded, limit=250, and created_at_min set to 24 months ago. Paginate by following the rel="next" URL in the Link response header, 250 orders per page; those follow-up requests carry only the page_info cursor and limit, not the original filters. A 500k-order store finishes in roughly 35-50 minutes on a single API key.
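The pagination loop can be sketched with the standard library alone. This is a minimal version under stated assumptions: the shop domain and access token are placeholders you supply, the financial_status filter is left out for brevity, and there is no retry handling for 429 responses, which a production job would need. The helper names next_page_url and fetch_orders are hypothetical.

```python
import json
import re
import time
import urllib.request
from urllib.parse import urlencode


def next_page_url(link_header):
    """Return the rel="next" URL from a Link header, or None on the last page."""
    if not link_header:
        return None
    for match in re.finditer(r'<([^>]+)>\s*;\s*rel="(\w+)"', link_header):
        if match.group(2) == "next":
            return match.group(1)
    return None


def fetch_orders(shop, token, created_at_min, pause=0.5):
    """Yield every order created since created_at_min (an ISO 8601 string).

    shop is a placeholder such as "example.myshopify.com".
    """
    query = urlencode(
        {"status": "any", "limit": 250, "created_at_min": created_at_min}
    )
    url = f"https://{shop}/admin/api/2024-01/orders.json?{query}"
    while url:
        request = urllib.request.Request(
            url, headers={"X-Shopify-Access-Token": token}
        )
        with urllib.request.urlopen(request) as response:
            payload = json.load(response)
            url = next_page_url(response.headers.get("Link"))
        yield from payload["orders"]
        time.sleep(pause)  # stay near 2 requests/sec
```

The filters are only sent on the first request; every later page is fetched from the URL Shopify hands back, which already encodes the cursor.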

status=any is non-optional

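If you omit status=any on the API call, the orders endpoint defaults to status=open and skips closed (archived) orders. Many stores archive orders automatically once they are paid and fulfilled, so the export ends up missing most older orders and your repeat-purchase curve collapses to near-zero past month 2. A separate limit produces the same symptom: an app can only read the last 60 days of orders unless it has been granted the read_all_orders scope. This is the single most common cause of "why does my retention look terrible?" tickets.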

Normalizing customer identity

Once the orders are out, the only field that matters for cohort assignment is the customer identifier. Shopify gives you both customer_id (numeric, stable) and email (string, normalize to lowercase, trim whitespace). Use customer_id as the primary key.

Guest checkouts won't have a customer_id in older exports. Fall back to lowercase email and treat any orders sharing the same email as the same customer. Roughly 8-15% of orders in a typical apparel or beauty store come through guest checkout, so skipping this step understates repeat rate by a meaningful margin.
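The fallback rule above can be sketched as a two-pass function. It assumes each order is a dict with "customer_id" and "email" keys (both names are assumptions about how you loaded the data), and it adds one step the text does not spell out: a guest order whose email matches a known customer is attached to that customer's id rather than treated as a second person.

```python
def normalize_email(email):
    """Lowercase and trim an email; missing values become an empty string."""
    return (email or "").strip().lower()


def assign_customer_keys(orders):
    """Add a "customer_key" to each order dict: customer_id first, email second."""
    # First pass: learn which emails already belong to a known customer_id.
    email_to_id = {}
    for order in orders:
        email = normalize_email(order.get("email"))
        if order.get("customer_id") and email:
            email_to_id.setdefault(email, order["customer_id"])

    # Second pass: key every order, falling back to email for guest checkouts.
    for order in orders:
        email = normalize_email(order.get("email"))
        customer_id = order.get("customer_id") or email_to_id.get(email)
        if customer_id:
            order["customer_key"] = f"id:{customer_id}"
        elif email:
            order["customer_key"] = f"email:{email}"
        else:
            order["customer_key"] = None  # unattributable; leave out of cohorts
    return orders
```

Prefixing the key with "id:" or "email:" keeps the two namespaces from colliding and makes it easy to report how much of the curve rests on the email fallback.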

Stitching this customer-level view back to GA4 sessions is a separate problem — see why GA4 user_pseudo_id breaks retention cohorts and how to stitch with Shopify customer_id for the joining pattern. For the repeat-purchase curve itself, you don't need GA4 at all.

Computing the curve

Benchmark

Reference repeat-purchase curves by Shopify vertical (% of customers placing a 2nd order by month N from first purchase)

Vertical | Month 1 | Month 3 | Month 6 | Month 12
Apparel & accessories | 8% | 16% | 24% | 32%
Beauty & skincare | 14% | 28% | 39% | 48%
Home & decor | 5% | 11% | 17% | 23%
Food & beverage (subscription) | 42% | 61% | 70% | 78%
Electronics & accessories | 4% | 9% | 14% | 19%

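To produce your own version: for every customer, find min(order_date). That date is their cohort date, and the calendar month it falls in is their cohort. Then take the customer's second order and compute (order_date - cohort_date) in whole months. Pivot: rows = cohort month, columns = months since first order, values = distinct customers whose second order landed in that month. Take a running total across each row so the figures are cumulative ("by month N", as in the benchmark), then divide by the cohort's customer count to get the rate.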
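For readers who prefer code to pivot tables, here is a minimal pure-Python sketch of the same calculation. It assumes orders arrive as (customer_key, order_date) pairs, and it makes one definitional choice that is an assumption rather than a standard: a customer has repeated "by month N" when fewer than N whole months separate their first and second orders. The function names are hypothetical.

```python
from collections import defaultdict


def months_between(start, end):
    """Whole calendar months from start to end (partial months round down)."""
    months = (end.year - start.year) * 12 + (end.month - start.month)
    if end.day < start.day:
        months -= 1
    return max(months, 0)


def repeat_purchase_curve(orders, horizons=(1, 3, 6, 12)):
    """orders: iterable of (customer_key, order_date).

    Returns {"YYYY-MM": {"customers": n, "rates": {horizon: share}}}.
    """
    dates_by_customer = defaultdict(list)
    for key, order_date in orders:
        dates_by_customer[key].append(order_date)

    cohorts = defaultdict(lambda: {"customers": 0, "hits": defaultdict(int)})
    for dates in dates_by_customer.values():
        dates.sort()
        first = dates[0]
        cohort = cohorts[first.strftime("%Y-%m")]
        cohort["customers"] += 1
        if len(dates) > 1:
            gap = months_between(first, dates[1])
            for horizon in horizons:
                if gap < horizon:  # second order landed before month N ended
                    cohort["hits"][horizon] += 1

    return {
        name: {
            "customers": c["customers"],
            "rates": {h: c["hits"][h] / c["customers"] for h in horizons},
        }
        for name, c in sorted(cohorts.items())
    }
```

Recent cohorts have not had time to reach the later horizons, so their month-6 and month-12 figures will read low until enough follow-up has accumulated; it is safer to blank those cells than to chart them.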

Common pitfalls to filter out before charting

Exclude test orders (Shopify Bogus Gateway, the order's test flag set to true, or order tags like "test"), wholesale customers if you run a B2B channel through the same store, and orders fully refunded within 24 hours, which are typically address corrections rather than real purchases. Each of these alone can inflate or distort cohort counts by 2-5%.
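These exclusions are easy to express as a single predicate applied before cohorting. The sketch below is illustrative only: the dict keys ("test", "gateway", "tags", "financial_status", "created_at", "refunded_at"), the "bogus" gateway value, and the "wholesale" tag are all assumptions about how your orders were loaded and tagged, so rename them to match your data.

```python
from datetime import timedelta


def is_chartable(order, wholesale_tag="wholesale",
                 refund_window=timedelta(hours=24)):
    """Return False for orders that should be dropped before cohorting.

    Assumed keys: "test" (bool), "gateway" (str), "tags" (list of str),
    "financial_status" (str), "created_at" and "refunded_at" (datetimes).
    """
    tags = {tag.strip().lower() for tag in order.get("tags", [])}

    # Test orders: explicit flag, assumed Bogus Gateway name, or a "test" tag.
    if order.get("test") or order.get("gateway") == "bogus" or "test" in tags:
        return False

    # Wholesale / B2B orders identified by a store-specific tag.
    if wholesale_tag in tags:
        return False

    # Full refunds issued almost immediately after the order was placed.
    refunded_at = order.get("refunded_at")
    if (order.get("financial_status") == "refunded" and refunded_at
            and refunded_at - order["created_at"] <= refund_window):
        return False

    return True
```

Usage is a one-line filter, for example `orders = [o for o in orders if is_chartable(o)]`, run before customer keys and cohorts are assigned.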

Also decide upfront whether subscription renewals count as "repeat purchases". For a coffee or supplement brand the honest answer is usually no — auto-renewals are a different behavioural signal than a customer actively choosing to buy again. Flag subscription_contract_id and report renewal-driven repeat separately.

What you do with the curve on day one

The curve feeds straight into a day-one retention audit with 18 months of backfilled data: you compare your month-3 and month-6 repeat rates against vertical benchmarks, identify the cohort where the curve flattens prematurely, and target that gap with the first wave of post-purchase experiments.

This is also the cleanest input for an LTV model. With 24 months of order-level data you can fit a simple BG/NBD or Pareto/NBD model in a few hours and stop guessing at customer lifetime value — far more reliable than the GA4 BigQuery cohort reconstruction route when event data is patchy.
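BG/NBD and Pareto/NBD models do not consume raw orders; they take a per-customer summary of frequency (number of repeat purchases), recency (customer age at the last purchase), and T (customer age at the end of the observation window). The sketch below builds that summary in days from (customer_key, order_date) pairs. It stops short of fitting the model, which needs a dedicated library, and the function name is hypothetical.

```python
from collections import defaultdict


def rfm_summary(orders, observation_end):
    """orders: iterable of (customer_key, order_date).

    Returns {customer_key: (frequency, recency_days, T_days)}.
    """
    purchase_days = defaultdict(set)
    for key, order_date in orders:
        if order_date <= observation_end:
            purchase_days[key].add(order_date)  # same-day orders count once

    summary = {}
    for key, dates in purchase_days.items():
        first, last = min(dates), max(dates)
        summary[key] = (
            len(dates) - 1,                  # frequency: repeat purchase days
            (last - first).days,             # recency: age at last purchase
            (observation_end - first).days,  # T: age at end of observation
        )
    return summary
```

A customer with a single order gets frequency 0 and recency 0, which is the convention these models expect for one-time buyers.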

Operational shortcuts

If you don't want to write SQL, the orders.csv loads cleanly into Google Sheets up to about 50k rows, or into a free DuckDB instance for anything larger. Pivot tables on "months-since-first-order" vs cohort month give you the curve in 15 minutes once the data is normalized.

Metricuno's Shopify plugin runs this export and curve computation automatically on install — you get a populated repeat-purchase curve before you've finished onboarding. But the manual route above is doable in an afternoon and worth knowing regardless, because it's how you'll spot-check any tool that claims to do it for you.

Frequently asked questions

There is no hard cap on how far back you can export: Shopify retains order history for the life of the store. Practical limits come from CSV export timeouts (around 200k orders) and API rate limits (2 requests/sec on standard plans, higher on Plus). For 24 months on a store doing under 200k orders, the CSV route works fine.

You do not need Shopify Plus: the Admin API is available on every Shopify plan. You'll need to create a custom app in Settings → Apps and sales channels → Develop apps, grant it the read_orders and read_customers scopes (plus read_all_orders, since read_orders alone only exposes the last 60 days of orders), and use the generated access token. Plus is only relevant if you need the higher rate limit.

GA4 identifies users with user_pseudo_id, which resets on cookie deletion, ITP enforcement, and across devices. A customer buying on mobile and re-ordering on desktop appears as two separate users, collapsing your apparent repeat rate. Shopify's customer_id is stable across all of that.

If a customer changes their email but is logged in at checkout, Shopify's customer_id stays the same, so no action is needed. For guest customers, the email change creates a new identity in your data; you can't merge them without external information (a CRM tag, a support ticket). It affects roughly 1-2% of customers in most stores.

For refunds, include partial refunds (the customer kept something) and exclude full refunds where the entire order was returned within 24-48 hours, since those are usually mistakes or address corrections. Refunds that happen weeks later represent real purchases that didn't work out, and removing them distorts the cohort denominator.

Aim for at least 200 customers per monthly cohort. Below that, a few outlier repeat buyers swing the percentages by several points. Stores doing under ~50 first-time customers per month should bucket into quarterly cohorts instead.
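Switching between monthly and quarterly cohorts only changes the label derived from each customer's first order date. A small sketch follows, with the 50-customer threshold taken from the guidance above and the use of the median as the summary statistic being an assumption; both function names are hypothetical.

```python
from statistics import median


def cohort_label(first_order_date, quarterly=False):
    """Return "YYYY-MM" for monthly cohorts or "YYYY-Qn" for quarterly ones."""
    if quarterly:
        quarter = (first_order_date.month - 1) // 3 + 1
        return f"{first_order_date.year}-Q{quarter}"
    return first_order_date.strftime("%Y-%m")


def use_quarterly(first_time_customers_per_month, threshold=50):
    """True when the typical month has too few new customers for monthly cohorts."""
    return median(first_time_customers_per_month) < threshold
```

Deciding the granularity once, up front, keeps cohort labels consistent across the whole 24-month window instead of mixing monthly and quarterly rows in one chart.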

Headless storefronts work the same way: they still write to the same Shopify orders table, so the export and API endpoints behave identically. The only difference is that your front-end events may be in a separate system; the orders side is unchanged.

To keep the curve current, schedule a nightly job pulling orders with updated_at_min = last_run_time via the Admin API, upsert into your local table, and recompute the curve weekly. Most teams set it up as a small Python script on a cron, or use a managed connector like Fivetran or Hightouch.
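The local side of that nightly job can be sketched with the standard library's sqlite3 module. The table layout and column names below are assumptions for illustration, timestamps are assumed to be ISO 8601 strings in UTC so that they sort correctly as text, and the upsert syntax requires SQLite 3.24 or later.

```python
import sqlite3


def ensure_schema(conn):
    """Create the local orders table if it does not exist yet."""
    conn.execute(
        """CREATE TABLE IF NOT EXISTS orders (
               id INTEGER PRIMARY KEY,
               customer_key TEXT,
               created_at TEXT,
               updated_at TEXT
           )"""
    )


def upsert_orders(conn, orders):
    """Insert new orders and overwrite ones that changed since the last run."""
    conn.executemany(
        """INSERT INTO orders (id, customer_key, created_at, updated_at)
           VALUES (:id, :customer_key, :created_at, :updated_at)
           ON CONFLICT(id) DO UPDATE SET
               customer_key = excluded.customer_key,
               created_at = excluded.created_at,
               updated_at = excluded.updated_at""",
        orders,
    )
    conn.commit()


def last_run_time(conn):
    """Latest updated_at seen so far, to pass as updated_at_min next time."""
    return conn.execute("SELECT MAX(updated_at) FROM orders").fetchone()[0]
```

Using the stored maximum updated_at as the next cursor, rather than the wall-clock time of the previous run, means a failed or skipped run does not leave a gap.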

Shopify's built-in "Returning customer rate" report gives you a single aggregate number, not a cohort curve. To see how month-3 repeat rate has changed across acquisition cohorts, you have to export — the in-app reporting doesn't pivot that way.

Customers whose first order predates the 24-month window appear in your export only via orders inside the window, so their "first order" in your dataset isn't actually their first. Either filter them out (drop any customer whose Shopify-side first_order_date predates your window) or label them as a pre-existing cohort and analyse separately.
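Either option starts with the same split. The sketch below assumes you can build a lookup of each customer's true first order date from outside the 24-month export; that lookup (first_order_dates) is hypothetical and would have to come from an all-time customer export or similar source.

```python
def split_preexisting(orders, window_start, first_order_dates):
    """Separate orders from customers acquired before the export window.

    orders: iterable of (customer_key, order_date).
    first_order_dates: {customer_key: date of first order ever}, a
        hypothetical lookup built from data outside the export window.
    Returns (in_window_orders, preexisting_orders).
    """
    in_window, preexisting = [], []
    for key, order_date in orders:
        first_ever = first_order_dates.get(key)
        if first_ever is not None and first_ever < window_start:
            preexisting.append((key, order_date))
        else:
            in_window.append((key, order_date))
    return in_window, preexisting
```

Customers missing from the lookup are kept in the main set here, which is a judgment call: it assumes an unknown history is more likely a genuinely new customer than an old one.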
