Class 17 Case Study: Estimating Causal Network Effects for Platform Businesses Using Instrumental Variables

Author: Dr Wei Miao

Affiliation: UCL School of Management

Published: November 26, 2025

1 Causal Questions for Platform Businesses

1.1 Class objectives:

  • Understand the importance of causal inference for platform businesses.

  • Learn how to estimate causal effects using instrumental variables with an application to food delivery platforms.

1.2 Causal Questions for Platform Businesses

  • Platform businesses often need to answer critical causal questions to optimise their operations:

    • Measuring network effects: How does increasing supply (restaurants) affect demand (orders)?

    • Pricing: How does surge pricing affect consumer demand and driver supply?

  • When we rely on secondary, non-experimental data, answering these questions often runs into endogeneity problems that require a careful empirical strategy.

2 Case Study

Core case question: How can we estimate the cross network effect (CNE) of restaurant density on total orders using historical data?¹

  • We can run a linear regression on UberEats’ historical data, where the dependent variable is the total number of orders in a postcode and the key explanatory variable is the number of restaurants in that postcode.

\[ \text{TotalOrders} = \beta_0 + \beta_1 \text{NumberRestaurants} + \varepsilon \]

  • Is there endogeneity in this model?²
  1. Omitted variable bias: The number of restaurants is likely correlated with unobserved factors such as local wealth or “foodie culture”. Wealthier areas both attract more restaurants and generate more orders, so OLS overstates \(\beta_1\) (upward bias).
  2. Reverse causality: Higher demand (orders) attracts more restaurants to open in the area. This feedback loop creates simultaneity bias.
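The direction of the omitted variable bias can be made explicit with the standard omitted-variable-bias formula. As a sketch, suppose the true model also contains an unobserved factor \(W\) (a hypothetical symbol, e.g. local wealth) with coefficient \(\beta_2\), which the regression above leaves out. The OLS estimate of \(\beta_1\) then converges to:

\[ \hat{\beta}_1^{OLS} \rightarrow \beta_1 + \beta_2 \frac{Cov(\text{NumberRestaurants}, W)}{Var(\text{NumberRestaurants})} \]

If wealth raises orders (\(\beta_2 > 0\)) and wealthier postcodes have more restaurants (positive covariance), the second term is positive, which is the upward bias described above.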

2.1 Postcode Level Data

  • Aggregated transaction data from London postcodes (e.g., E1, W1).
    • total_orders: Outcome variable (Y)
    • n_restaurants: Endogenous explanatory variable (X)
    • avg_income: Control variable - Average income in the postcode
Code
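# Load the packages used in this case study (pacman installs any that are missing)
pacman::p_load(dplyr, tidyr, broom, fixest, modelsummary)
# Read the postcode-level data from the working directory
data_london <- read.csv("data_london.csv")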
Code
# Preview the first five postcodes
data_london %>%
    slice_head(n = 5)

3 Empirical Strategy

3.1 OLS Linear Regression

\[ TotalOrders_{i} = \beta_0 + \beta_1 NumberRestaurants_{i} + \beta_2 AvgIncome_i + \varepsilon_{i} \]

  • Exercise: Run the linear regression model controlling for income. Discuss the direction of the bias.
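To build intuition for the direction of the bias, here is a minimal sketch on simulated data, not the class data. The data-generating process is an assumption made purely for illustration: `foodie` is a hypothetical unobserved confounder, and the true effect of one extra restaurant is set to 13 orders. Only the column names mirror `data_london`.

```r
# Hypothetical data-generating process (NOT the class data); true effect = 13
set.seed(17)
n <- 5000
foodie <- rnorm(n)                          # unobserved "foodie culture" confounder
avg_income <- rnorm(n, mean = 40, sd = 8)   # observed control
hygiene_inspections <- rpois(n, lambda = 4) # shifts restaurant supply only
n_restaurants <- 60 - 3 * hygiene_inspections + 0.5 * avg_income +
  8 * foodie + rnorm(n, sd = 5)
total_orders <- 100 + 13 * n_restaurants + 4 * avg_income +
  250 * foodie + rnorm(n, sd = 50)
sim <- data.frame(total_orders, n_restaurants, avg_income, hygiene_inspections)

# OLS controlling for income; foodie culture is omitted because it is unobserved
model_ols <- lm(total_orders ~ n_restaurants + avg_income, data = sim)
beta_ols <- unname(coef(model_ols)["n_restaurants"])
print(beta_ols) # well above the true value of 13: upward bias
```

Because `foodie` raises both restaurant counts and orders in this simulation, controlling for income alone does not remove the bias.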

3.2 Instrumental Variables Regression

Can you come up with a valid instrument for the number of restaurants?

  • A valid instrument must satisfy (1) relevance, (2) exogeneity, and (3) the exclusion restriction.

3.3 Answer

  • Instrument (Z): Number of Hygiene Inspections

  • Logic:

    • Random inspections increase costs/risk for restaurants, reducing supply (Relevance).
    • Random inspections are conducted by the local council and are beyond the control of restaurants (Exogeneity).
    • Inspections are bureaucratic processes unrelated to consumer hunger and thus should not directly affect total orders (Exclusion Restriction).

3.4 Two-Stage Least Squares (2SLS) Estimation: First Stage

  • First stage: Regress endogenous variable on the instrument and controls.

\[ NumberRestaurants_{i} = \pi_0 + \pi_1 HygieneInspections_{i} + \pi_2 AvgIncome_i + u_{i} \]

  • Check for instrument relevance: Is \(\pi_1\) significantly different from zero? (Expect negative sign).

3.5 Two-Stage Least Squares (2SLS) Estimation: Second Stage

  • Second stage: Regress outcome on predicted restaurant count and controls.

\[ TotalOrders_{i} = \beta_0 + \beta_1 \widehat{NumberRestaurants}_{i} + \beta_2 AvgIncome_i + \varepsilon_{i} \]

  • If the instrument is valid, the coefficient \(\beta_1\) identifies the causal effect of restaurant supply on orders.

  • Exercise: Run the 2SLS regression model. Compare the coefficient with OLS.
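The two stages can be sketched by hand with `lm()`. The example below again uses simulated data under an assumed data-generating process (true effect of 13, a hypothetical confounder `foodie`), so the numbers say nothing about the real London data. It only illustrates the mechanics: check the first stage, then regress the outcome on the fitted values.

```r
# Hypothetical data-generating process (NOT the class data); true effect = 13
set.seed(17)
n <- 5000
foodie <- rnorm(n)                          # unobserved confounder
avg_income <- rnorm(n, mean = 40, sd = 8)
hygiene_inspections <- rpois(n, lambda = 4) # instrument
n_restaurants <- 60 - 3 * hygiene_inspections + 0.5 * avg_income +
  8 * foodie + rnorm(n, sd = 5)
total_orders <- 100 + 13 * n_restaurants + 4 * avg_income +
  250 * foodie + rnorm(n, sd = 50)
sim <- data.frame(total_orders, n_restaurants, avg_income, hygiene_inspections)

# First stage: endogenous variable on the instrument and the control
first_stage <- lm(n_restaurants ~ hygiene_inspections + avg_income, data = sim)
pi1 <- unname(coef(first_stage)["hygiene_inspections"])
# With a single instrument, the first-stage F statistic is the squared t statistic
t_z <- summary(first_stage)$coefficients["hygiene_inspections", "t value"]
f_first <- t_z^2

# Second stage: outcome on the predicted restaurant count and the control
sim$n_restaurants_hat <- fitted(first_stage)
second_stage <- lm(total_orders ~ n_restaurants_hat + avg_income, data = sim)
beta_iv <- unname(coef(second_stage)["n_restaurants_hat"])

print(c(pi1 = pi1, f_first = f_first, beta_iv = beta_iv))
```

A common rule of thumb treats a first-stage F statistic below 10 as a sign of a weak instrument. The point estimate from the manual second stage matches 2SLS, but its standard errors are not valid because they ignore that the regressor was estimated, which is one reason to prefer a dedicated IV routine.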

3.6 2SLS in One Step

  • We can run the 2SLS regression in one step using the feols function in the fixest package.
Code
# 2SLS: n_restaurants is instrumented by hygiene_inspections; avg_income is an exogenous control
model_iv_2nd <- feols(
    fml = total_orders ~ avg_income |
        n_restaurants ~ hygiene_inspections,
    data = data_london
)
  • The fml syntax is fml = Y ~ Control | X ~ Z, where Y is the outcome variable, Control is the control variable, X is the endogenous variable, and Z is the instrument.

3.7 The Cost of Getting It Wrong (Back-of-the-Envelope)

  • Suppose we want to know if we should spend £2,000 to onboard 10 new restaurants in a postcode. Margin per order is £5.
  • Biased OLS Estimate: \(\hat{\beta}_{OLS} = 44.2\) orders/restaurant.
    • Projected Gain: \(10 \times 44.2 \times £5 = £2,210\).
    • Profit: \(£2,210 - £2,000 = \mathbf{£210}\).
    • Decision: Invest.
  • Causal Estimate (IV): \(\hat{\beta}_{IV} = 13.2\) orders/restaurant.
    • Projected Gain: \(10 \times 13.2 \times £5 = £660\).
    • Profit: \(£660 - £2,000 = \mathbf{-£1,340}\).
    • Decision: Do Not Invest.
  • Consequence: Relying on OLS leads to a bad investment decision. Here OLS overstates the effect, so we would invest in a campaign that actually loses £1,340. If the bias ran downward instead, we would miss a profitable campaign.
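The arithmetic above can be wrapped in a small helper to see how the decision depends on the estimate. This is only a sketch: `campaign_profit` and its argument names are hypothetical, with defaults taken from the figures in this section (10 restaurants, £5 margin, £2,000 cost).

```r
# Profit from onboarding n_new restaurants, given an estimated effect per restaurant
campaign_profit <- function(beta_hat, n_new = 10, margin = 5, cost = 2000) {
  n_new * beta_hat * margin - cost
}

profit_ols <- campaign_profit(44.2) # uses the biased OLS estimate
profit_iv  <- campaign_profit(13.2) # uses the IV estimate

# Break-even effect: orders per restaurant at which the campaign just pays for itself
break_even <- 2000 / (10 * 5)

print(c(profit_ols = profit_ols, profit_iv = profit_iv, break_even = break_even))
```

With these figures the campaign breaks even at 40 orders per restaurant, so the OLS estimate falls just above the threshold and the IV estimate well below it.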

4 Instrumental Variables in A/B Testing (Optional)

4.1 A/B Testing with an Encouragement Design

  • In experimental settings (A/B testing), IVs are used to handle imperfect compliance. This is often called an Encouragement Design.

  • Intended Treatment (Z): Random assignment to a treatment through A/B testing (e.g., a subsidy offer to restaurants).

  • Outcome (Y): Business metric (e.g., orders).

\[ Y = \beta_0 + \beta_1 Z + \varepsilon \]

  • \(\beta_1\) is the intent-to-treat effect of the treatment on the outcome.

4.2 LATE: Local Average Treatment Effect

  • Actual Treatment (X): Actual uptake of the treatment (e.g., joining the platform). However, the following regression suffers from endogeneity: among those who are offered the treatment, those who take it up and join the platform differ systematically from those who do not (self-selection).

\[ Y = \gamma_0 + \gamma_1 X + \varepsilon \]

  • Since we cannot force participation, we use the random assignment (Z) as an instrument for actual participation (X).

  • The IV estimator recovers the Local Average Treatment Effect (LATE): the effect on those induced to participate by the encouragement.
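With a binary assignment \(Z\), a binary uptake \(X\), and no covariates, the IV estimator reduces to the Wald ratio. The expression below is a sketch under the usual LATE assumptions, including monotonicity (nobody does the opposite of what the encouragement suggests), which the text above does not state explicitly:

\[ \hat{\beta}_{LATE} = \frac{E[Y \mid Z = 1] - E[Y \mid Z = 0]}{E[X \mid Z = 1] - E[X \mid Z = 0]} \]

The numerator is the intent-to-treat effect and the denominator is the increase in uptake caused by the assignment. As a purely hypothetical example, if the offer raises average orders by 2 and raises uptake by 20 percentage points, the implied LATE is \(2 / 0.2 = 10\) orders.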

4.3 Example 1: Sign Up Bonus for New Customers

  • Context: UberEats offers a random £10 sign-up bonus to a subset of potential customers to encourage them to download the app.
  • Causal Question: What is the effect of App Download (X) on Customer Lifetime Value (Y)?
  • Endogeneity: Customers who download the app organically are likely more interested in food delivery (higher CLV) than those who don’t.
  • Instrumental Variable:
    • Instrument (Z): Random Assignment to receive the bonus offer (Intent-to-Treat).
    • Treatment (X): Did the customer download the app?
    • Outcome (Y): CLV (orders placed in the next 6 months).
  • IV Logic: The bonus is randomly assigned (exogeneity). It increases the probability of downloading the app (relevance). It affects CLV only through app usage (exclusion).
  • LATE: The effect of the app on CLV for those customers who only downloaded it because of the £10 bonus (the marginal customers).
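The encouragement design in this example can be sketched with a small simulation. Everything below is an assumption for illustration, not UberEats data: `interest` is a hypothetical unobserved taste for food delivery, and the true effect of downloading the app is fixed at 3 orders for everyone, so the LATE equals 3 by construction.

```r
# Hypothetical encouragement design (NOT real data); true effect of the app = 3
set.seed(42)
n <- 20000
interest <- rnorm(n)       # unobserved taste for food delivery
z <- rbinom(n, 1, 0.5)     # random assignment to the bonus offer
# Download decision: the offer raises everyone's propensity (monotonicity holds)
x <- as.integer(interest + z + rnorm(n) > 1)
clv <- 2 + 3 * x + 2 * interest + rnorm(n)

# Naive comparison of downloaders and non-downloaders (selection bias)
naive <- mean(clv[x == 1]) - mean(clv[x == 0])

# Intent-to-treat effect and first stage, both based on random assignment
itt <- mean(clv[z == 1]) - mean(clv[z == 0])
first_stage <- mean(x[z == 1]) - mean(x[z == 0])

# Wald / IV estimate of the LATE
late <- itt / first_stage

print(c(naive = naive, itt = itt, first_stage = first_stage, late = late))
```

In this simulation the naive comparison overstates the effect because interested customers download more often, the intent-to-treat effect is diluted by non-compliance, and the ratio of the two assignment-based differences should land close to the true value of 3.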

4.4 Example 2: Apple’s App Tracking Transparency (After-Class Reading)

  • Context: Apple introduced ATT, requiring apps to ask users for permission to track them across other apps and websites.
  • Causal Question: What is the effect of Tracking Permission (X) on Ad Effectiveness (Y)?
  • Endogeneity: Users who opt in to tracking are fundamentally different from those who opt out (selection bias). We cannot simply compare opt-in vs. opt-out users.
  • Instrumental Variable Strategy:
    • Instrument (Z): The Policy Change itself (Pre-ATT vs. Post-ATT, or random rollout). The policy “encourages” (or forces) users to make a choice, changing the probability of being tracked.
  • IV Logic: The policy change is exogenous (determined by Apple). It strongly affects tracking rates (Relevance). It affects ad effectiveness primarily through the loss of tracking data (Exclusion).
  • LATE: The IV estimate gives us the effect of tracking on ad effectiveness for the “compliers” (users whose tracking status changed due to the policy).

4.5 After-Class Reading

Footnotes

  1. Direct network effect (DNE), by contrast, is defined as the network effect that arises from the number of users on the same side of the platform. For instance, more users on UberEats will lead to more new users joining the platform.

  2. Answers are available in the HTML version.