Importing GA4 Customer History To Backfill Retention Cohorts

Metricuno
May 28, 2026
6 min read
Import 12-24 months of GA4 + Shopify order history to seed retention cohorts on day one. Step-by-step backfill for DTC stores switching analytics tools.
Quick answer

How to pull 12-24 months of customer history out of GA4, Shopify, and Klaviyo to seed retention cohorts on day one — instead of waiting a year for a new analytics tool to accumulate data.

GA4 alone is not enough. Export 24 months of Shopify (or Woo/Magento) orders as the spine, join them to GA4 BigQuery sessions on email or user_id to recover acquisition channel, and use Klaviyo profile properties to patch the gaps. Done right, your new analytics tool opens on day one with a fully populated retention curve instead of a blank slate.

Definition
Analytics implementation

Importing GA4 Customer History To Backfill Retention Cohorts

Reconstructing 12-24 months of customer-order history from GA4, Shopify, and Klaviyo so retention cohorts are populated from day one.

Backfilling retention cohorts is the practice of importing historical order and session data from your existing stack — GA4, Shopify, Klaviyo, WooCommerce, Magento — into a new analytics tool, so the retention curve, repeat-purchase rate, and channel-attributed LTV are visible immediately rather than 6-12 months later.

For an online store switching tools, this is the difference between a cold-start dashboard that's useful next year and one that's useful this week. The technical core is joining a transactional spine (orders) to a behavioural spine (sessions) on a stable identity key, then bucketing customers by first-order month to form cohorts.
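
The bucketing step can be sketched in a few lines. This is a minimal illustration, assuming orders have already been reduced to (customer_key, order_date) pairs; the function and variable names are hypothetical, not fields from any particular export.

```python
from collections import defaultdict
from datetime import date


def first_order_cohorts(orders):
    """Map each customer key to the YYYY-MM of their earliest order.

    `orders` is an iterable of (customer_key, order_date) pairs.
    Both names are illustrative assumptions.
    """
    first_seen = {}
    for customer_key, order_date in orders:
        if customer_key not in first_seen or order_date < first_seen[customer_key]:
            first_seen[customer_key] = order_date
    return {key: d.strftime("%Y-%m") for key, d in first_seen.items()}


def cohort_sizes(cohorts):
    """Count how many customers fall into each cohort month."""
    sizes = defaultdict(int)
    for month in cohorts.values():
        sizes[month] += 1
    return dict(sizes)


orders = [
    ("a", date(2025, 1, 14)),
    ("a", date(2025, 3, 2)),   # repeat order: does not move the cohort
    ("b", date(2025, 1, 30)),
    ("c", date(2025, 2, 9)),
]
cohorts = first_order_cohorts(orders)
sizes = cohort_sizes(cohorts)
```

A customer's cohort is fixed by their earliest order, so later purchases only ever count as repeat behaviour inside that cohort.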

Also known as
cohort backfill
retention history import
day-one cohort seeding

Most teams discover the problem in week two of a new tool: the retention chart shows one bar, because only customers who placed an order since the snippet went live are in the system. Everyone who bought in the previous 18 months is invisible.

Why cold-start retention dashboards mislead

A retention cohort needs at least two purchases per customer to mean anything: the first to define the cohort, the second (or sixth) to measure repeat behaviour. Without backfill, your day-30 repeat rate is artificially low, often zero, for every cohort younger than 30 days, because those customers have not yet had a full 30 days to reorder.
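
One way to keep young cohorts from dragging the number down is to exclude customers who have not yet had the full window. The sketch below assumes each customer's order dates are already sorted; `repeat_rate` and its arguments are illustrative names, not part of any tool.

```python
from datetime import date, timedelta


def repeat_rate(customers, horizon_days, as_of):
    """Share of mature customers with a second order within horizon_days.

    `customers` maps a customer key to a sorted list of order dates.
    Customers whose first order is less than horizon_days before `as_of`
    are skipped, since they have not had the full window yet.
    Returns None when no customer is mature.
    """
    horizon = timedelta(days=horizon_days)
    mature = 0
    repeaters = 0
    for dates in customers.values():
        first = dates[0]
        if as_of - first < horizon:
            continue  # still inside the window: not yet measurable
        mature += 1
        if any(d - first <= horizon for d in dates[1:]):
            repeaters += 1
    return repeaters / mature if mature else None


customers = {
    "a": [date(2025, 1, 1), date(2025, 1, 20)],  # repeated after 19 days
    "b": [date(2025, 1, 5)],                     # one-and-done
    "c": [date(2025, 3, 25)],                    # only 7 days old on as_of
}
as_of = date(2025, 4, 1)
day30 = repeat_rate(customers, 30, as_of)
```

Here customer "c" is left out of the denominator, so the day-30 rate is 1 of 2 rather than 1 of 3.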

Worse, the cohorts that DO populate are biased toward your most loyal segment — repeat buyers who happened to come back during the observation window. Acquisition decisions made on that curve overweight VIPs and underweight the one-and-done majority.

The 14-month lookback trap

GA4 caps user-level and event-level data retention at 14 months on the free tier, and the default is 2 months until you raise it in the property's data retention settings. If you need 18 or 24 months of cohort history, the standard UI alone won't get you there: you need the BigQuery export, or a transactional spine from Shopify, or both. Plan for this before you start the migration, not after.
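
As a planning aid, the arithmetic can be written down explicitly. This is a rough sketch that assumes the 14-month cap described above and treats the order export as covering the whole target window; the function names and the example dates are hypothetical.

```python
from datetime import date


def months_before(day, months):
    """Return the first day of the month `months` months before `day`."""
    index = day.year * 12 + (day.month - 1) - months
    return date(index // 12, index % 12 + 1, 1)


def earliest_coverage(as_of, window_months, bigquery_enabled_on=None,
                      ga4_retention_months=14):
    """Earliest date each source can contribute to the backfill window."""
    target = months_before(as_of, window_months)
    ga4_ui = months_before(as_of, ga4_retention_months)
    sources = {
        "orders": target,                 # assumed to cover the full window
        "ga4_ui": max(ga4_ui, target),    # capped by the retention setting
    }
    if bigquery_enabled_on is not None:
        # The export only holds data from the day it was switched on.
        sources["ga4_bigquery"] = max(bigquery_enabled_on, target)
    return sources


coverage = earliest_coverage(
    as_of=date(2026, 5, 28),
    window_months=24,
    bigquery_enabled_on=date(2024, 11, 15),  # hypothetical link date
)
```

Any month between the order spine's start and the GA4 start is a month where orders exist but acquisition channel has to come from somewhere else.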

What to export, and from where

Treat Shopify (or WooCommerce/Magento) as the source of truth for orders. Export at minimum: order_id, customer_id, email, order_date, line items, gross revenue, discount, and the landing referrer if you have it. Twenty-four months is the practical ceiling for most beauty and apparel brands — long enough to see a full annual cycle plus replenishment behaviour.
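
Before joining anything, it is worth checking that the export actually carries the minimum fields. The column names below are illustrative assumptions for a cleaned-up file, not the headers a real Shopify, WooCommerce, or Magento export uses; map them to whatever your export provides.

```python
import csv
import io

# Hypothetical normalized column names for the order spine.
REQUIRED_COLUMNS = {
    "order_id",
    "customer_id",
    "email",
    "order_date",
    "gross_revenue",
    "discount",
}


def missing_columns(csv_text):
    """Return the required columns absent from the CSV header, sorted."""
    reader = csv.DictReader(io.StringIO(csv_text))
    present = set(reader.fieldnames or [])
    return sorted(REQUIRED_COLUMNS - present)


sample = (
    "order_id,customer_id,email,order_date,gross_revenue\n"
    "1001,,a@example.com,2025-01-14,59.00\n"
)
missing = missing_columns(sample)
```

An empty result means the spine has every required column; a blank customer_id value, as in the sample row, is still allowed because guest orders will have none.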

GA4 contributes acquisition channel, device, and on-site behaviour. The richest path is the BigQuery export: full event-level data, no sampling, and history back to the day the export was enabled (it is not retroactive). The standard reports are the fallback when BigQuery isn't wired up, with the trade-offs covered in the BigQuery-vs-standard-reports comparison. Klaviyo fills the identity gap: profile properties like first_purchase_date and total_orders survive GA4 cookie resets.
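
In the BigQuery export, event parameters arrive as a repeated key/value record rather than flat columns, so a small flattening step usually comes first. The sketch below works on one exported row represented as a Python dict; the sample event and the `session_row` shape are assumptions for illustration.

```python
VALUE_FIELDS = ("string_value", "int_value", "float_value", "double_value")


def flatten_params(event):
    """Turn the repeated event_params record into a plain dict."""
    flat = {}
    for param in event.get("event_params", []):
        value = param.get("value", {})
        for field in VALUE_FIELDS:
            if value.get(field) is not None:
                flat[param["key"]] = value[field]
                break  # each param carries one populated value field
    return flat


def session_row(event):
    """Keep only the fields needed to join a session to an order."""
    params = flatten_params(event)
    return {
        "user_pseudo_id": event.get("user_pseudo_id"),
        "user_id": event.get("user_id"),
        "ga_session_id": params.get("ga_session_id"),
        "source": params.get("source"),
        "medium": params.get("medium"),
    }


event = {
    "event_name": "session_start",
    "user_pseudo_id": "1234.5678",
    "user_id": None,
    "event_params": [
        {"key": "ga_session_id", "value": {"string_value": None, "int_value": 1736850000}},
        {"key": "source", "value": {"string_value": "instagram"}},
        {"key": "medium", "value": {"string_value": "paid_social"}},
    ],
}
row = session_row(event)
```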

How to actually stitch it together

Step one is identity. GA4's user_pseudo_id is a cookie, not a person — it resets on browser clear, device switch, and Safari's 7-day ITP cap. You join on a stable key: email hash where available, Shopify customer_id where the user logged in. The stitching pattern is covered in detail in the user_pseudo_id-to-customer_id guide.
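
An email hash is only a stable key if every system normalizes the address the same way before hashing. A minimal sketch, assuming trim-and-lowercase normalization followed by SHA-256; your tools may normalize differently, so confirm the rule matches on both sides of the join.

```python
import hashlib


def email_hash(email):
    """Return a SHA-256 hex digest of the normalized email, or None if blank."""
    if not email:
        return None
    normalized = email.strip().lower()
    if not normalized:
        return None
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


same_person = email_hash(" Jane@Example.com ") == email_hash("jane@example.com")
```

Without the normalization step, the two spellings above would hash to different keys and the same buyer would land in two cohorts.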

Step two is guest checkout. Roughly 40-60% of Shopify orders in apparel and beauty are guest checkouts, which means no customer_id. You resolve them by email-hash matching across orders, and accept that pre-iOS14 channel attribution will be approximate — the referrer-rebuild approach is the closest you'll get.
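
Guest resolution can be sketched as a two-pass match on hashed email. This is an illustrative approach under the assumption that an email seen on any logged-in order belongs to that customer_id; the key prefixes and field names are hypothetical.

```python
import hashlib


def _hash(email):
    """Hash a trimmed, lowercased email with SHA-256."""
    return hashlib.sha256(email.strip().lower().encode("utf-8")).hexdigest()


def resolve_customers(orders):
    """Return {order_id: resolved_key} for a list of order dicts."""
    # Pass 1: learn which email hashes already belong to a customer_id.
    known = {}
    for order in orders:
        if order.get("email") and order.get("customer_id"):
            known.setdefault(_hash(order["email"]), str(order["customer_id"]))

    # Pass 2: assign a key, preferring customer_id, then email match.
    resolved = {}
    for order in orders:
        hashed = _hash(order["email"]) if order.get("email") else None
        if order.get("customer_id"):
            key = "customer:" + str(order["customer_id"])
        elif hashed in known:
            key = "customer:" + known[hashed]
        elif hashed:
            key = "guest:" + hashed[:12]
        else:
            key = "unresolved:" + str(order["order_id"])
        resolved[order["order_id"]] = key
    return resolved


orders = [
    {"order_id": 1, "customer_id": 77, "email": "a@example.com"},
    {"order_id": 2, "customer_id": None, "email": " A@Example.com "},
    {"order_id": 3, "customer_id": None, "email": "b@example.com"},
    {"order_id": 4, "customer_id": None, "email": "b@example.com"},
    {"order_id": 5, "customer_id": None, "email": None},
]
resolved = resolve_customers(orders)
```

Orders 3 and 4 collapse into one guest customer, while order 5 stays a single-order customer with no prior history.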

Order of operations that works

1. Pull 24 months of Shopify orders as the spine.
2. Pull GA4 BigQuery sessions for the same window.
3. Join on email-hash, fall back to customer_id, fall back to user_pseudo_id within a 7-day session.
4. Patch missing acquisition channel with Klaviyo profile.acquisition_source.
5. Bucket customers into monthly cohorts by first_order_date.
6. Compute repeat rate at day 30, 60, 90, 180.
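
Steps 3 and 4 can be sketched as a key-priority lookup with a patch at the end. Several things here are assumptions rather than anything the steps specify: first-touch attribution (the earliest matching session wins), reading the 7-day limit as a window before the order, and a pre-built dict of Klaviyo acquisition sources keyed by email hash.

```python
from datetime import datetime, timedelta

PSEUDO_ID_WINDOW = timedelta(days=7)  # assumed reading of the 7-day limit
KEY_PRIORITY = ("email_hash", "customer_id", "user_pseudo_id")


def attribute_channel(order, sessions, klaviyo_sources):
    """Return (channel, matched_on) for one order, or (None, None)."""
    prior = [s for s in sessions if s["started_at"] <= order["ordered_at"]]
    prior.sort(key=lambda s: s["started_at"])  # earliest first: first touch
    for key in KEY_PRIORITY:
        value = order.get(key)
        if not value:
            continue
        for session in prior:
            if session.get(key) != value:
                continue
            age = order["ordered_at"] - session["started_at"]
            if key == "user_pseudo_id" and age > PSEUDO_ID_WINDOW:
                continue  # cookie match too old to trust
            return session["channel"], key
    # No session matched: patch from the Klaviyo lookup if possible.
    patched = klaviyo_sources.get(order.get("email_hash"))
    return (patched, "klaviyo") if patched else (None, None)


sessions = [
    {"email_hash": "h1", "user_pseudo_id": "p1",
     "started_at": datetime(2025, 1, 10, 9), "channel": "paid_social"},
    {"email_hash": None, "user_pseudo_id": "p2",
     "started_at": datetime(2025, 1, 1, 9), "channel": "organic"},
]
klaviyo_sources = {"h2": "email"}

by_email = attribute_channel(
    {"email_hash": "h1", "user_pseudo_id": "p9",
     "ordered_at": datetime(2025, 1, 10, 10)}, sessions, klaviyo_sources)
by_klaviyo = attribute_channel(
    {"email_hash": "h2", "user_pseudo_id": "p2",
     "ordered_at": datetime(2025, 1, 20, 10)}, sessions, klaviyo_sources)
by_cookie = attribute_channel(
    {"email_hash": "h3", "user_pseudo_id": "p2",
     "ordered_at": datetime(2025, 1, 3)}, sessions, klaviyo_sources)
```

Recording which key produced each match makes it easy to report how much of the channel data rests on the weaker cookie and Klaviyo fallbacks.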

What a backfilled day-one audit reveals

With 18 months of data populated, the first audit usually surfaces three things: a channel whose 90-day repeat rate is half the site average (typically a discount-driven paid social cohort), a hero SKU whose buyers replenish at 60-75 days, and a guest-checkout bucket whose LTV is mis-attributed. The day-one retention audit walkthrough shows what these look like in practice.

From there you can plug numbers into the retention rate calculator to see how a 2-point lift on the worst-performing cohort moves blended LTV — usually enough to justify a quarter of CRO work targeted at the post-purchase flow rather than the top of funnel.
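
To see why a small lift on one cohort can matter, here is a deliberately simple toy model. It assumes LTV per customer is average order value times (1 + repeat rate), which ignores margins, multiple repeats, and time horizons, and every number in it is invented for illustration. It does not reproduce the calculator's method.

```python
def blended_ltv(cohorts, aov):
    """Customer-weighted LTV under the toy model aov * (1 + repeat_rate).

    `cohorts` is a list of (name, customers, repeat_rate) tuples.
    """
    total_customers = sum(n for _, n, _ in cohorts)
    total_value = sum(n * aov * (1 + rate) for _, n, rate in cohorts)
    return total_value / total_customers


def with_lift(cohorts, name, points):
    """Return a copy of cohorts with one cohort's repeat rate raised."""
    return [
        (c, n, rate + points / 100 if c == name else rate)
        for c, n, rate in cohorts
    ]


# Invented example figures.
cohorts = [
    ("paid_social", 4000, 0.10),
    ("email", 1000, 0.30),
    ("organic", 5000, 0.22),
]
baseline = blended_ltv(cohorts, aov=60)
lifted = blended_ltv(with_lift(cohorts, "paid_social", 2), aov=60)
delta = lifted - baseline
```

Because the weakest cohort in this example is also a large one, a 2-point lift there moves the blended figure more than the same lift on the small email cohort would.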

Frequently asked questions

Twenty-four months is the practical ceiling for most Shopify stores: that's the standard order-export window and long enough to capture full seasonality plus a second replenishment cycle. GA4's free tier caps user-level data retention at 14 months, but the BigQuery export goes back to whenever you enabled it. If you didn't enable BigQuery early, your GA4 ceiling is 14 months and Shopify becomes the longer spine.

Standard reports work for cohorts under 14 months and headline channel splits. BigQuery is necessary when you need event-level joins to Shopify orders, sub-channel attribution, or windows longer than 14 months. For most €1M-€15M stores doing a one-time backfill, BigQuery is worth the 2-3 hours of setup.

Guest checkouts carry no customer_id, so you stitch them on hashed email instead. Roughly half of Shopify orders in beauty and apparel are guest, so this matters. Where the same email appears across multiple guest orders, you treat them as one customer; where it doesn't, you accept that the cohort starts at order 1 with no prior history.

Use user_pseudo_id as a join key only as a last resort, and only within a single session window. It resets on cookie clear and under Safari ITP, so it under-counts repeat behaviour by 20-40% on iOS traffic. Use email-hash or Shopify customer_id as primary keys and user_pseudo_id only to bridge same-session pre-login events.

For WooCommerce and Magento the pattern is the same but the export is messier: they don't expose customer_id as cleanly, and order exports often lack the landing-referrer field Shopify includes. The Woo/Magento vs Shopify comparison covers the specific fields you'll need to reconstruct from server logs or order-meta tables.

Klaviyo can stand in for missing GA4 history, provided it has been running with order events for the full lookback window. Profile properties like first_purchase_date, total_orders, and historic_revenue survive GA4 cookie resets and tie cleanly to email. It's the right fallback when GA4 lookback is capped or BigQuery isn't available.

Pre-April 2021 cohorts have richer channel data than later ones: paid social attribution collapsed when App Tracking Transparency (ATT) shipped with iOS 14.5. You either reconstruct pre-iOS14 channels from Shopify referrer headers (imperfect but better than nothing) or you accept a discontinuity in the cohort chart and label it explicitly. Don't blend the two as if nothing changed.
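
Labelling the discontinuity can be as simple as tagging each cohort month with an era. The cutoff below is an assumption taken from the "pre-April 2021" wording, so April 2021 itself is treated as post-ATT; the names are hypothetical.

```python
ATT_CUTOFF_MONTH = "2021-04"  # assumed boundary, compared as YYYY-MM text


def attribution_era(cohort_month):
    """Classify a YYYY-MM cohort month as before or after the cutoff."""
    return "pre_att" if cohort_month < ATT_CUTOFF_MONTH else "post_att"


def label_cohorts(cohort_months):
    """Return {cohort_month: era} in chronological order."""
    return {month: attribution_era(month) for month in sorted(cohort_months)}


labels = label_cohorts(["2021-05", "2021-03", "2021-04"])
```

Zero-padded YYYY-MM strings sort chronologically, which is why a plain string comparison is enough here.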

For a Shopify store with under 500k orders, the export and join is a half-day of work. The slow part is identity resolution on guest checkouts and validating that the cohort curve matches your existing reporting. Budget 2-3 days end-to-end if you want clean output you'd show a board.

The totals will not match exactly across GA4, Shopify, and Klaviyo, and that's expected. GA4 deduplicates users by pseudo_id, Shopify by customer_id, and Klaviyo by email, so the three rarely agree. Aim for within 5% on revenue and within 10% on user counts; bigger gaps usually mean a join key is broken.
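
Those tolerances are easy to turn into an automated check. A minimal sketch, assuming the gap is measured relative to the larger of the two totals; the function names, dict keys, and example figures are illustrative.

```python
def relative_gap(a, b):
    """Absolute difference as a fraction of the larger value."""
    baseline = max(abs(a), abs(b))
    return 0.0 if baseline == 0 else abs(a - b) / baseline


def reconcile(backfill, reference, revenue_tol=0.05, user_tol=0.10):
    """Compare backfilled totals against an existing report."""
    revenue_gap = relative_gap(backfill["revenue"], reference["revenue"])
    user_gap = relative_gap(backfill["users"], reference["users"])
    return {
        "revenue_gap": revenue_gap,
        "user_gap": user_gap,
        "revenue_ok": revenue_gap <= revenue_tol,
        "users_ok": user_gap <= user_tol,
    }


# Invented example totals.
result = reconcile(
    backfill={"revenue": 970_000, "users": 8_200},
    reference={"revenue": 1_000_000, "users": 9_500},
)
```

In this example revenue is within tolerance but the user count is not, which would point toward an identity join problem rather than missing orders.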

The 18-24 month historical pull is one-time. After that, the new analytics tool ingests live order and session events directly, so the backfill is the bridge between cold-start and steady-state. Plan for it as a migration task, not a recurring pipeline.
