Analytics · 8 min read

Geo Holdout Testing: How to Measure True Marketing Incrementality

Attribution models tell you where credit went. Geo holdout tests tell you whether the spend actually caused the outcome. This guide covers how they work, how to design one, and what the results typically show - including the branded search finding that surprises most teams.


A geo holdout test is a controlled marketing experiment where you divide your market into geographic regions, run your normal advertising in the treatment regions, withhold it entirely in the control (holdout) regions, and measure the difference in outcomes between the two groups.

The result is incremental ROAS (iROAS) - the return on spend that accounts only for outcomes that would not have happened without the advertising. This is distinct from platform-reported ROAS, which counts all conversions that occurred alongside the ad but cannot separate causation from correlation.

Key takeaways

Geo holdout testing is the closest approximation to a randomized controlled trial available in marketing measurement, which makes it one of the most trusted methods for quantifying true ad incrementality.
Across 225 geo-based tests on one incrementality platform between August 2024 and December 2025, the median iROAS was 2.31x - often significantly different from the platform-reported ROAS for the same campaigns.
Branded Google Search campaigns frequently test at iROAS below 1.0x - meaning each dollar spent returns less than a dollar of incremental revenue, because most of the conversions attributed to the campaign would have occurred anyway through organic search.
A geo holdout test requires at least 10-15 matched geographic pairs (test and control) and a minimum 4-week test duration to reach 80% statistical power (an 80% chance of detecting a real effect of the expected size) for most B2B campaigns.
The most common design error is under-matching: selecting test and control regions based on geographic proximity rather than historical conversion similarity, which confounds the result.

Why attribution models are insufficient

Attribution models - whether first-touch, last-touch, data-driven, or linear - assign credit to touchpoints in the conversion path. They answer "which ad was present before the conversion?" They cannot answer "would this conversion have happened without that ad?"

The distinction matters because much of the traffic captured by your retargeting campaigns, brand search campaigns, and late-funnel email sequences consists of people who were already going to convert. The ad was present, but it was not the cause. Platform-reported ROAS is calculated on all conversions attributed to a channel, including those organic conversions that happened to occur after an ad impression or click.

Marketing mix modeling (MMM) uses aggregate statistical models to estimate channel contribution across long time horizons, but it cannot isolate the effect of a single campaign or channel at a specific moment in time. Geo holdout testing fills that gap: it gives you a controlled read on a specific channel's true incrementality, typically within a 4-8 week test window.

How geo holdout tests work

Step 1: Define your test objective. What specific question are you trying to answer? Common objectives: Is our branded search spend driving incremental conversions or capturing organic demand? Does our prospecting display budget produce incremental pipeline, or does pipeline close anyway? Is our Meta retargeting adding lift, or are those buyers already in the purchase decision?

Step 2: Select and match geographic pairs. Identify at least 20-30 geographic markets (DMAs, cities, states, or countries depending on your scale) so you can form 10-15 pairs. Match them into test-control pairs based on historical conversion volume, demographic similarity, and seasonal patterns - NOT geographic proximity. Texas and Nevada are a better matched pair than Texas and Oklahoma if they share similar conversion histories.
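One way to approach the matching in Step 2 is to pair regions whose weekly conversion histories look most alike. The sketch below is a simple greedy heuristic, not an optimal matching algorithm. It assumes you have an equal-length series of pre-period weekly conversions for each region, and it uses root-mean-square difference as the similarity measure (an assumption; demographic and seasonal factors would need to be added separately). The function names, region codes, and counts are hypothetical.

```python
import itertools
import math


def series_distance(a, b):
    """Root-mean-square difference between two equal-length weekly conversion series."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)) / len(a))


def greedy_pair_regions(history):
    """Greedily pair the regions whose conversion histories are closest.

    history maps region name -> list of weekly conversion counts, all the same
    length. Returns a list of (region, region) tuples; if the number of regions
    is odd, the leftover region is simply not paired.
    """
    # Score every possible pair, closest first (ties broken by name).
    candidates = sorted(
        (series_distance(history[a], history[b]), a, b)
        for a, b in itertools.combinations(sorted(history), 2)
    )
    paired, pairs = set(), []
    for _, a, b in candidates:
        if a not in paired and b not in paired:
            pairs.append((a, b))
            paired.update((a, b))
    return pairs


# Hypothetical pre-period weekly conversions.
history = {
    "TX": [120, 130, 125, 140],
    "NV": [118, 128, 127, 138],
    "OK": [40, 42, 39, 45],
    "KS": [41, 40, 40, 44],
}
print(greedy_pair_regions(history))  # [('KS', 'OK'), ('NV', 'TX')]
```

With this toy data, Texas pairs with Nevada rather than its neighbor Oklahoma, because their conversion histories are closer. Because the distance is computed on raw counts, it rewards similar volume as well as a similar week-to-week pattern.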

Step 3: Assign treatment and holdout. Randomly assign one region in each pair to treatment (receives ads as normal) and one to holdout (ads are paused entirely or budgeted to zero). Randomization is critical - the assignment must not be based on which regions are "already performing well".
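Step 3 can be sketched as a fair coin flip inside each matched pair. The function name and seed value are hypothetical; the point is that the assignment comes from a seeded random draw, recorded before launch, rather than from anyone's view of which regions are performing well.

```python
import random


def assign_within_pairs(pairs, seed):
    """Randomly send one region of each matched pair to treatment, the other to holdout.

    A fixed, pre-registered seed makes the draw reproducible and auditable.
    Returns (treatment_regions, holdout_regions).
    """
    rng = random.Random(seed)
    treatment, holdout = [], []
    for a, b in pairs:
        # Fair coin flip per pair; past performance never enters the decision.
        if rng.random() < 0.5:
            treatment.append(a)
            holdout.append(b)
        else:
            treatment.append(b)
            holdout.append(a)
    return treatment, holdout


pairs = [("KS", "OK"), ("NV", "TX")]
treatment, holdout = assign_within_pairs(pairs, seed=2025)
print("treatment:", treatment)
print("holdout:", holdout)
```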

Step 4: Run the test for the required duration. For B2B SaaS with longer consideration cycles, 4-8 weeks is typically the minimum. For ecommerce with immediate conversion feedback, 2-4 weeks may be sufficient. The test must run long enough to capture a full conversion cycle.
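One way to turn "long enough to capture a full conversion cycle" into a number is to look at how many days past converters took from first ad touch to conversion. This sketch assumes you can export those lags from your CRM or analytics tool; the function name and the 90% coverage default are illustrative assumptions, not a standard.

```python
import math
import statistics


def minimum_test_weeks(lag_days, floor_weeks, coverage=0.9):
    """Suggest a test length long enough to capture most of the conversion cycle.

    lag_days: days from first ad touch to conversion for past converters
        (at least two values).
    floor_weeks: the minimum you would run anyway, e.g. 4 for B2B, 2 for ecommerce.
    coverage: share of conversions whose lag the window should cover (0.01-0.99).
    """
    if not 0.01 <= coverage <= 0.99:
        raise ValueError("coverage must be between 0.01 and 0.99")
    # 99 percentile cut points; index k - 1 is the k-th percentile.
    cut_points = statistics.quantiles(lag_days, n=100, method="inclusive")
    lag_at_coverage = cut_points[round(coverage * 100) - 1]
    return max(floor_weeks, math.ceil(lag_at_coverage / 7))


# Hypothetical lags: most buyers convert within days, a tail takes a month.
lags = [3] * 50 + [10] * 40 + [30] * 10
print(minimum_test_weeks(lags, floor_weeks=4))  # 4
```

Here 90% of conversions land within 12 days, so the 4-week floor governs. If the 90th-percentile lag were around 90 days, the same function would suggest 13 weeks.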

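Step 5: Measure the difference. Incremental conversions = test-region conversions during the test - (test-region conversions / holdout-region conversions in the pre-test period) x holdout-region conversions during the test. The pre-test ratio scales the holdout's results into an estimate of what the test regions would have done without the ads. Then calculate iROAS = incremental revenue / ad spend in test regions.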
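A minimal sketch of the Step 5 arithmetic, estimating the no-ads counterfactual for the test regions by scaling the holdout's test-period conversions by the pre-test test/holdout ratio. It assumes aggregated counts per group and a single average revenue per conversion (a simplifying assumption; B2B teams may substitute pipeline value). The function name and all numbers are hypothetical.

```python
def geo_lift(test_pre, holdout_pre, test_during, holdout_during,
             revenue_per_conversion, test_spend):
    """Incremental conversions and iROAS from aggregated geo holdout results.

    The pre-test ratio test_pre / holdout_pre scales the holdout's conversions
    during the test into an estimate of what the test regions would have done
    without ads. revenue_per_conversion is assumed constant for simplicity.
    """
    expected_without_ads = (test_pre / holdout_pre) * holdout_during
    incremental_conversions = test_during - expected_without_ads
    incremental_revenue = incremental_conversions * revenue_per_conversion
    iroas = incremental_revenue / test_spend
    return incremental_conversions, iroas


# Hypothetical figures: pre-test and test-period conversions per group.
incremental, iroas = geo_lift(
    test_pre=1000, holdout_pre=800, test_during=1150, holdout_during=820,
    revenue_per_conversion=200, test_spend=10_000,
)
print(incremental, iroas)  # 125.0 2.5
```

The pre-test ratio is 1.25, so the holdout's 820 conversions imply 1,025 expected conversions in the test regions without ads. The 125 extra conversions at $200 each, against $10,000 of spend, give an iROAS of 2.5x.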


The branded search finding that changes budgets

The problem this creates for marketing teams: branded search campaigns are typically the highest-ROAS line in any paid search account. A $5-10 CPL and a 10-20x reported ROAS make them an easy budget justification. However, the reported ROAS largely reflects an audience that was already going to convert - they typed your brand name into Google because they already knew you.

Geo holdout tests on branded search campaigns consistently show iROAS in the range of 0.50-1.20x. The Stella platform data (225 tests, 2024-2025) found branded Google Search at a median of 0.70x iROAS - meaning approximately 70 cents of incremental value was generated per dollar spent, not the 10-20x that platform attribution shows.

This finding has a direct budget implication: branded search budgets are often 20-40% of total paid search spend. An iROAS below 1.0x means that budget returns less incremental revenue than it costs, making it net-negative in incremental terms. Some teams choose to maintain branded spend for defensive purposes (preventing competitors from capturing their branded terms), but they should do so knowing the incremental cost, rather than justifying it on the attributed ROAS.
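To see why the attributed and incremental views diverge so sharply, here is the arithmetic for a hypothetical $100,000 paid search budget, using figures from the ranges above: a 30% branded share, a 15x reported ROAS, and the 0.70x median iROAS. The function name is illustrative, and both ROAS figures are treated as revenue-based, not margin-based.

```python
def branded_search_economics(paid_search_spend, branded_share, reported_roas, iroas):
    """Contrast attributed and incremental revenue for a branded search budget."""
    branded_spend = paid_search_spend * branded_share
    attributed_revenue = branded_spend * reported_roas  # what the platform credits
    incremental_revenue = branded_spend * iroas         # what a holdout test measured
    return {
        "branded_spend": branded_spend,
        "attributed_revenue": attributed_revenue,
        "incremental_revenue": incremental_revenue,
        "incremental_share": incremental_revenue / attributed_revenue,
        "net_incremental": incremental_revenue - branded_spend,
    }


result = branded_search_economics(
    paid_search_spend=100_000, branded_share=0.30, reported_roas=15, iroas=0.70
)
for key, value in result.items():
    print(f"{key}: {value:,.3f}")
```

On these assumptions the branded line is credited with $450,000 of revenue, but only $21,000 (about 4.7%) is incremental, and the $30,000 budget returns $9,000 less incremental revenue than it costs. On a margin basis the gap would be wider.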

This connects to the share of voice measurement problem: branded impression share from organic search is the signal you are actually defending. The paid branded spend is protection, not demand generation.

What types of campaigns show positive incremental ROAS

Based on aggregate geo holdout data across B2B and DTC categories:

High incrementality (iROAS typically 1.5-3x): Prospecting display and video campaigns targeting net-new audiences. Upper-funnel LinkedIn Sponsored Content reaching job titles not yet in your CRM. Non-branded paid search capturing category-level demand (competitors' branded terms, problem-statement queries).

Moderate incrementality (iROAS typically 0.8-1.5x): Mid-funnel retargeting reaching visitors who showed high engagement but have not converted. Email re-engagement to cold database contacts. Demand gen campaigns with both awareness and performance objectives.

Low or negative incrementality (iROAS typically below 1.0x): Branded paid search (as discussed above). Retargeting campaigns reaching visitors who are already in active sales conversations. Nurture emails sent to prospects the sales team is already actively working.

Practical considerations for B2B teams

Sample size requirements are higher in B2B. If your monthly conversion volume is below 50 per region, you will struggle to achieve statistical significance in a 4-8 week test window. B2B SaaS companies with long sales cycles and low monthly deal counts may need to use pipeline creation as the primary metric rather than closed-won revenue; as an earlier, higher-volume signal, pipeline is usually an acceptable proxy.
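Whether your volume is enough depends on the lift you expect and how well your pairs match, so it can help to simulate the test before running it. The sketch below is a rough, assumption-heavy power estimate, not a standard method from this article: it draws conversion counts with a normal approximation to Poisson noise, adds a hypothetical `pair_noise_sd` term for mismatch that matching did not remove, and applies a one-sided z-test to the mean log ratio across pairs. With only 10-15 pairs a z-test is somewhat looser than a t-test, so treat the output as optimistic.

```python
import math
import random
import statistics


def simulated_power(n_pairs, conversions_per_region, true_lift,
                    pair_noise_sd=0.10, n_sims=2000, z_crit=1.96, seed=0):
    """Rough simulated power for a paired geo holdout test.

    n_pairs: number of matched test-control pairs (at least 2).
    conversions_per_region: expected holdout conversions per region over the
        whole test window.
    true_lift: relative lift the ads add in treatment regions (0.10 = +10%).
    pair_noise_sd: hypothetical log-scale mismatch between paired regions.
    """
    rng = random.Random(seed)
    detected = 0
    for _ in range(n_sims):
        log_ratios = []
        for _ in range(n_pairs):
            # Treatment region: same base volume, shifted by mismatch and by the lift.
            treat_mean = (conversions_per_region
                          * math.exp(rng.gauss(0, pair_noise_sd))
                          * (1 + true_lift))
            hold_mean = conversions_per_region
            # Normal approximation to Poisson counts, floored at 1 so log() is defined.
            treat = max(1.0, rng.gauss(treat_mean, math.sqrt(treat_mean)))
            hold = max(1.0, rng.gauss(hold_mean, math.sqrt(hold_mean)))
            log_ratios.append(math.log(treat / hold))
        mean = statistics.fmean(log_ratios)
        se = statistics.stdev(log_ratios) / math.sqrt(n_pairs)
        # One-sided z-test: count the run as a detection if the lift is "significant".
        if se > 0 and mean / se > z_crit:
            detected += 1
    return detected / n_sims


# About 50 conversions a month per region over a two-month window.
print(simulated_power(n_pairs=12, conversions_per_region=100, true_lift=0.10))
```

Comparing, for example, `conversions_per_region=100` with `conversions_per_region=25` at the same number of pairs shows how quickly power falls as per-region volume drops.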

Test one channel at a time. Running a geo holdout on multiple channels simultaneously makes the result uninterpretable. If you pause both branded search and retargeting in the holdout regions, you cannot separate their contributions.

Holdout regions should be genuinely isolated. If your sales team continues outreach in holdout regions, or if organic and direct traffic differs significantly between regions, the test is contaminated. Digital-only tests work better in B2B than tests that try to separate digital from sales-touch.

Prooflytics surfaces regional performance data from your ad platforms and GA4 in the daily briefing, giving you the geographic performance split that is the foundation for identifying candidate markets for a geo test and monitoring outcomes during the test window.

When geo holdout testing is NOT the right tool

Geo holdout tests require sufficient conversion volume and geographic independence to produce reliable results. They are not appropriate if:

Your conversion volume is concentrated in one or two cities or regions (you cannot isolate them as test and control without cutting off most of your market)
Your sales cycle is longer than 6 months (the test window would need to be so long that market conditions shift during the test)
You already use marketing mix modeling and have enough historical data for MMM to provide accurate channel contribution estimates

For companies below $5M ARR or with fewer than 20 inbound conversions per month across all channels, the demand gen metrics framework provides more actionable guidance than running incrementality tests at insufficient sample sizes.

Bottom line

Geo holdout tests answer "did this spend cause this outcome" - attribution models only answer "was this ad present before the outcome"
Median iROAS across 225 geo holdout tests on one platform in 2024-2025 was 2.31x - typically lower than platform-reported ROAS for the same campaigns because organic demand is excluded
Branded paid search frequently tests at iROAS below 1.0x - many teams are paying to capture conversions that would have happened organically
Valid geo holdout tests require 10-15 matched region pairs, a minimum 4-week run time, and sufficient conversion volume in each region
You can read independent reviews of Prooflytics on G2 and see how teams use it to monitor geographic performance splits as the foundation for incrementality test design

Frequently asked questions

How is geo holdout testing different from a conversion lift study?

A conversion lift study (available natively in Meta, Google, and LinkedIn) runs within a single platform and measures lift for that platform's campaigns. Geo holdout testing is platform-agnostic - you can test any channel by pausing it in specific geographies, regardless of whether that channel has a native lift study tool. Geo testing also removes self-reporting bias: the platform's own lift study has an incentive to show positive results.

How many geographic regions do I need for a valid test?

The minimum is typically 10-15 test-control pairs. Fewer pairs increase the risk that a chance imbalance between the groups produces a misleading result. If you have fewer than 10 available pairs of markets with sufficient conversion volume, the test is under-powered and the results should be treated as directional, not definitive.
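To see how the number of pairs affects confidence in the result, one common approach (swapped in here for illustration; the article does not prescribe a method) is a percentile bootstrap that resamples whole matched pairs and recomputes iROAS each time, scaling holdout conversions by the pre-test test/holdout ratio. The input keys, function names, and example figures are hypothetical, and a single average revenue per conversion is assumed.

```python
import random


def pooled_iroas(pairs, revenue_per_conversion):
    """iROAS for a set of pairs, scaling holdout conversions by the pre-test ratio."""
    test_pre = sum(p["test_pre"] for p in pairs)
    hold_pre = sum(p["hold_pre"] for p in pairs)
    test_during = sum(p["test_during"] for p in pairs)
    hold_during = sum(p["hold_during"] for p in pairs)
    spend = sum(p["spend"] for p in pairs)
    incremental = test_during - (test_pre / hold_pre) * hold_during
    return incremental * revenue_per_conversion / spend


def bootstrap_iroas_ci(pairs, revenue_per_conversion, n_boot=5000, alpha=0.10, seed=0):
    """Point estimate and percentile bootstrap interval, resampling whole pairs.

    pairs: list of dicts with keys test_pre, hold_pre, test_during, hold_during
    and spend (ad spend in that pair's treatment region).
    """
    rng = random.Random(seed)
    estimates = sorted(
        pooled_iroas(rng.choices(pairs, k=len(pairs)), revenue_per_conversion)
        for _ in range(n_boot)
    )
    lower = estimates[round(alpha / 2 * (n_boot - 1))]
    upper = estimates[round((1 - alpha / 2) * (n_boot - 1))]
    return pooled_iroas(pairs, revenue_per_conversion), (lower, upper)


example_pairs = [
    {"test_pre": 100, "hold_pre": 95, "test_during": 118, "hold_during": 97, "spend": 1500},
    {"test_pre": 80, "hold_pre": 84, "test_during": 90, "hold_during": 86, "spend": 1200},
    {"test_pre": 60, "hold_pre": 58, "test_during": 71, "hold_during": 60, "spend": 900},
    {"test_pre": 140, "hold_pre": 150, "test_during": 150, "hold_during": 149, "spend": 2000},
]
estimate, (low, high) = bootstrap_iroas_ci(example_pairs, revenue_per_conversion=150)
print(f"iROAS {estimate:.2f}x, 90% interval {low:.2f}x to {high:.2f}x")
```

With fewer pairs the interval typically widens. If it includes 1.0x, you cannot rule out that the channel returns less than it costs, which is the practical meaning of "directional, not definitive".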

Does geo holdout testing work for B2B SaaS with a global customer base?

It works but requires more careful design. For globally distributed B2B SaaS, country-level geographies are typically the right unit (test France, hold out Spain, etc.) rather than sub-national DMAs. Ensure the holdout countries are not receiving your ads through global campaigns running on continuous delivery - all campaigns must be truly paused in holdout regions.
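Before and during the test, it is worth confirming that holdout countries really show zero delivery. This sketch assumes a daily spend export with one (date, geo, campaign, spend) row per combination; the export format, function name, and campaign names are hypothetical, so adapt it to whatever geographic report your ad platforms provide.

```python
def find_holdout_spend_leaks(spend_rows, holdout_geos, tolerance=0.0):
    """Return spend rows that delivered into holdout geographies.

    spend_rows: iterable of (date, geo, campaign, spend) tuples, an assumed
    export format. Any holdout row with spend above tolerance is a leak.
    """
    holdout = set(holdout_geos)
    return [
        (date, geo, campaign, spend)
        for date, geo, campaign, spend in spend_rows
        if geo in holdout and spend > tolerance
    ]


# Hypothetical daily export: France is a test country, Spain a holdout.
rows = [
    ("2025-03-03", "FR", "brand_global", 120.0),
    ("2025-03-03", "ES", "brand_global", 15.5),
    ("2025-03-04", "ES", "prospecting_eu", 0.0),
]
for leak in find_holdout_spend_leaks(rows, holdout_geos=["ES"]):
    print("Leak:", leak)
```

In this example the global brand campaign spent 15.50 in Spain, which is exactly the kind of continuous-delivery leak that contaminates a holdout.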

What should I do with a below-1.0x branded search iROAS finding?

First, verify the result by checking that organic and direct traffic performed similarly in test and holdout regions (ruling out confounding). If confirmed, the options are: (a) reduce branded search spend significantly and reallocate to upper-funnel channels with measured positive incrementality, (b) maintain a minimal branded spend for defensive purposes against competitors bidding on your terms, or (c) test whether competitor conquest campaigns on your brand terms are actually driving significant conversion loss if you pause branded defense.

How long does a geo holdout test take from design to results?

Typically 7-13 weeks total: 2-3 weeks for region selection and matching, 4-8 weeks of test runtime, and 1-2 weeks for analysis and confidence interval calculation. For B2B SaaS, plan the test at least 3 months before you need the results to inform a budget decision.
