Estimating Causal Effects of Restaurant Supply on UberEats Orders

MSIN0094 Case Study

Author: Dr. Wei Miao

Affiliation: UCL School of Management

Published: November 26, 2025

1 Industry Background

The food delivery industry has witnessed explosive growth, transforming how urban consumers dine. Platforms like UberEats serve as two-sided marketplaces connecting hungry customers with a variety of local restaurants. A critical driver of platform growth is the cross-side network effect: a wider selection of restaurants attracts more customers, which in turn generates more orders.

For UberEats, understanding the precise causal effect of restaurant density (supply) on total orders (demand) is vital for strategic decision-making. If increasing the number of restaurants significantly boosts total orders (rather than just cannibalising existing ones), the platform should invest heavily in onboarding new merchants.

In this case study, we examine the causal link between the number of restaurants and total orders in London postcodes. We will tackle the endogeneity challenges inherent in this economic relationship using Instrumental Variables (IV).

2 Data Description

We focus on a cross-sectional dataset covering London postcodes (e.g., E1, W1, WC1) in Week 1 of 2024. The data science team has aggregated transaction logs into a postcode-level dataset.

The dataset data_london contains the following key variables:

  • postcode_area: The London postcode district (e.g., “E1”).
  • n_restaurants: Number of active restaurants on the platform.
  • total_orders: Aggregated number of orders in the week.
  • hygiene_inspections: Number of random hygiene inspections conducted by local council authorities.
  • avg_income: Average annual income in the postcode (in thousands GBP).
  • year: Year of observation (2024).
  • week: Week of observation (Week 1).

postcode_id  postcode_area  year  week  n_restaurants  hygiene_inspections  avg_income  total_orders
1            N13            2024  1     52             5                    25.43575    1714
2            WC12           2024  1     42             16                   65.71556    2424
3            WC16           2024  1     21             13                   36.47918    761
4            W7             2024  1     45             11                   40.59615    1892
5            NW2            2024  1     21             19                   30.44267    1228
6            NW1            2024  1     37             11                   33.54817    1408

3 Empirical Analysis

We wish to estimate the regression: \[ TotalOrders_i = \beta_0 + \beta_1 Restaurants_i + \gamma Control_i + \varepsilon_i \]

3.1 Endogeneity and OLS

  1. Run a simple OLS regression of total_orders on n_restaurants, controlling for avg_income.
  2. Discuss the endogeneity problem. Specifically, explain how Omitted Variable Bias (OVB), Reverse Causality, and Measurement Error might bias your estimate of \(\beta_1\).
Code
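# Load the packages used throughout: fixest for feols(), modelsummary for tables
library(fixest)
library(modelsummary)

# OLS regression of weekly orders on restaurant count, controlling for income
model_ols <- feols(
    total_orders ~ n_restaurants + avg_income,
    data = data_london
)
modelsummary(list("OLS" = model_ols), stars = TRUE, gof_map = "nobs")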
               OLS
(Intercept)    -662.230***
               (101.084)
n_restaurants  44.166***
               (1.890)
avg_income     14.946***
               (2.332)
Num.Obs.       200
+ p < 0.1, * p < 0.05, ** p < 0.01, *** p < 0.001

Endogeneity Discussion:

  1. Omitted Variable Bias (OVB): Even after controlling for income, factors such as “foodie culture” are unobserved and therefore sit in the error term \(\varepsilon_i\). Culturally vibrant areas likely have both more restaurants (supply) and higher demand (orders). Because such omitted variables are positively correlated with both the regressor and the outcome, the OLS estimate is likely biased upward.
  2. Reverse Causality: While more restaurants drive orders, high demand also encourages new restaurants to open. This feedback loop creates simultaneity bias, further inflating the OLS estimate.
  3. Measurement Error: The recorded n_restaurants might contain errors (e.g., ghost kitchens not listed correctly, inactive accounts). Classical measurement error in the independent variable attenuates the coefficient towards zero (attenuation bias), partly offsetting the upward bias from OVB and simultaneity.
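
The direction of these biases can be checked in a small simulation. The sketch below is illustrative only: the variable names (foodie, restaurants, orders), the true effect of 15, and every other parameter value are assumptions chosen for the demonstration, not estimates from data_london. It uses base R's lm() so that it runs without extra packages.

```r
# Illustrative simulation (all numbers are assumptions, not case data)
set.seed(42)
n <- 5000
true_beta <- 15

foodie <- rnorm(n)                                  # unobserved confounder
restaurants <- 40 + 5 * foodie + rnorm(n, sd = 5)   # supply rises with foodie culture
orders <- 500 + true_beta * restaurants + 100 * foodie + rnorm(n, sd = 50)

# OLS that omits the confounder: biased upward
beta_ovb <- unname(coef(lm(orders ~ restaurants))["restaurants"])

# OLS that includes the confounder (infeasible in practice): close to true_beta
beta_full <- unname(coef(lm(orders ~ restaurants + foodie))["restaurants"])

# Classical measurement error in the regressor: attenuated towards zero
restaurants_noisy <- restaurants + rnorm(n, sd = 5)
beta_noisy <- unname(
    coef(lm(orders ~ restaurants_noisy + foodie))["restaurants_noisy"]
)

print(c(true = true_beta, omitted = beta_ovb, full = beta_full, noisy = beta_noisy))
```

In this setup the estimate that omits the confounder lands above the true effect, while the estimate based on the noisy regressor lands below it, which is why the net direction of the OLS bias in the case is an empirical question.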

3.2 Instrumental Variables (IV)

To address the endogeneity problem, we propose using hygiene inspections (hygiene_inspections) as an instrument for n_restaurants. These are random audits conducted by local authorities to ensure food safety compliance. In our setting, we treat these inspections as randomly assigned or policy-driven shocks that are unrelated to current consumer demand (unconfoundedness).

  1. Argue whether hygiene_inspections satisfies the Relevance and Exclusion conditions.
  2. Perform a Two-Stage Least Squares (2SLS) regression, controlling for avg_income. Report the first stage and second stage results. Interpret the causal effect.

IV Validity:

  • Relevance: Frequent hygiene inspections impose administrative burdens and operational costs on restaurants, potentially leading to closures or discouraging new openings. We expect a negative correlation between inspections and the number of restaurants.
  • Exclusion: The frequency of government health inspections is a bureaucratic process typically determined by council resources and random scheduling. It should not directly affect consumer demand for UberEats orders, except through the availability of restaurants (the supply channel).
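
Written out, the two stages of 2SLS for this specification take the following form. The notation is a sketch that extends the regression equation in Section 3: \(Inspections_i\) stands for hygiene_inspections and \(Control_i\) for avg_income, while \(\pi_0\), \(\pi_1\), \(\pi_2\) and \(v_i\) are symbols assumed here for the first stage.

\[ Restaurants_i = \pi_0 + \pi_1 Inspections_i + \pi_2 Control_i + v_i \]

\[ TotalOrders_i = \beta_0 + \beta_1 \widehat{Restaurants}_i + \gamma Control_i + (\varepsilon_i + \beta_1 \hat{v}_i) \]

Relevance requires \(\pi_1 \neq 0\), and exclusion requires \(Cov(Inspections_i, \varepsilon_i) = 0\) conditional on the control. The second-stage error contains the first-stage residual \(\hat{v}_i\), which is why a second stage run by hand does not report the correct standard errors.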
Code
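# dplyr provides %>% and mutate()
library(dplyr)

# First stage: regress the endogenous regressor on the instrument and the control
model_iv_1st <- feols(
    n_restaurants ~ hygiene_inspections + avg_income,
    data = data_london
)

# Store the first-stage fitted values
data_london <- data_london %>%
    mutate(
        predicted_n_restaurants = predict(model_iv_1st, newdata = data_london)
    )

# Second stage (manual): replace n_restaurants with its fitted values.
# The point estimate is the 2SLS estimate, but the reported standard errors
# ignore that the regressor was estimated, so they are not valid for inference.
model_iv_2nd <- feols(
    total_orders ~ predicted_n_restaurants + avg_income,
    data = data_london
)

# One-step 2SLS with feols(): outcome ~ controls | endogenous ~ instrument.
# This version reports valid 2SLS standard errors.
model_iv_1step <- feols(
    total_orders ~ avg_income | n_restaurants ~ hygiene_inspections,
    data = data_london
)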

modelsummary(
    list(
        "1st Stage" = model_iv_1st,
        "2SLS (IV)" = model_iv_2nd,
        "2SLS (1-step)" = model_iv_1step
    ),
    stars = TRUE,
    gof_map = c("nobs", "r.squared", "F")
)
                         1st Stage   2SLS (IV)   2SLS (1-step)
(Intercept)              37.653***   -187.148    -187.148
                         (3.503)     (206.247)   (165.310)
hygiene_inspections      -1.870***
                         (0.169)
avg_income               0.505***    30.621***   30.621***
                         (0.063)     (5.040)     (4.040)
predicted_n_restaurants              13.206*
                                     (5.850)
fit_n_restaurants                                13.206**
                                                 (4.689)
Num.Obs.                 200         200         200
R2                       0.488       0.311       0.311
+ p < 0.1, * p < 0.05, ** p < 0.01, *** p < 0.001

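Provided the relevance and exclusion conditions hold, the 2SLS estimate for n_restaurants removes the endogeneity bias. The first stage shows that inspections reduce restaurant supply: each additional inspection is associated with about 1.87 fewer restaurants on the platform, holding income fixed. The second stage uses this exogenous variation to identify the causal effect of restaurant supply on total orders: one additional restaurant raises weekly orders by about 13.2. This is well below the OLS estimate of 44.2, which is consistent with OLS being biased upward.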
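
Two quick checks follow from the reported coefficients. With a single instrument, the first-stage F-statistic on the instrument equals its squared t-statistic, and in a just-identified model the 2SLS coefficient equals the reduced-form coefficient divided by the first-stage coefficient. The arithmetic below is a sketch based on the rounded table values, so the results are approximate. Here \(\hat{\pi}_1\) denotes the first-stage coefficient on hygiene_inspections and \(\hat{\rho}\) the coefficient on hygiene_inspections in a regression of total_orders on hygiene_inspections and avg_income; both symbols are assumptions of this sketch. The threshold of 10 is the conventional rule of thumb, not a result from this case.

\[ F \approx \left( \frac{-1.870}{0.169} \right)^2 \approx 122 \]

\[ \hat{\beta}_1^{IV} = \frac{\hat{\rho}}{\hat{\pi}_1} \quad \Rightarrow \quad \hat{\rho} = 13.206 \times (-1.870) \approx -24.7 \]

An F-statistic this far above 10 suggests the instrument is not weak. The implied reduced form says that each additional inspection is associated with about 24.7 fewer weekly orders, which under the exclusion condition operates entirely through restaurant supply.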

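The third column reproduces the 2SLS estimate in one step with feols(). The coefficients are identical across the two implementations, but the standard errors differ (5.850 versus 4.689). The manual second stage treats the fitted values as if they were observed data and computes residuals from predicted_n_restaurants, so its standard errors are not valid. The one-step version computes residuals from the actual n_restaurants, so its standard errors are the ones to use for inference.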
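
The gap between the two sets of standard errors can be reproduced by hand. The sketch below is a minimal base R simulation, not the case data: the variables z, u, x, y and all coefficients are assumptions. It runs the two steps manually, then rescales the naive standard error using residuals computed from the actual regressor, which is the correction a dedicated 2SLS routine applies.

```r
# Illustrative simulation (all numbers are assumptions, not case data)
set.seed(1)
n <- 2000
z <- rnorm(n)                          # instrument
u <- rnorm(n)                          # unobserved confounder
x <- 1 - 0.8 * z + u + rnorm(n)        # endogenous regressor
y <- 2 + 1.5 * x + 2 * u + rnorm(n)    # outcome; true effect of x is 1.5

# Manual two-step 2SLS
x_hat <- fitted(lm(x ~ z))
second <- lm(y ~ x_hat)
b <- coef(second)
se_naive <- summary(second)$coefficients["x_hat", "Std. Error"]

# Valid 2SLS residuals use the actual x, not x_hat
resid_2sls <- y - b["(Intercept)"] - b["x_hat"] * x
sigma2_2sls <- sum(resid_2sls^2) / (n - 2)
sigma2_naive <- sum(resid(second)^2) / (n - 2)

# Rescale the naive standard error by the ratio of residual standard deviations
se_2sls <- se_naive * sqrt(sigma2_2sls / sigma2_naive)

print(c(estimate = unname(b["x_hat"]), se_naive = se_naive, se_2sls = se_2sls))
```

Because the correction is a single scale factor, every coefficient's standard error changes by the same ratio. The regression table above shows this pattern: the manual standard errors are about 1.25 times the one-step ones for the intercept, avg_income, and the restaurant coefficient alike.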

4 Managerial Implications

Using a back-of-the-envelope calculation, illustrate the financial consequence of using the biased OLS estimate versus the causal IV estimate.

Assume the following:

  • UberEats is considering a marketing campaign to onboard 10 new restaurants in a specific postcode.
  • The cost of this campaign is £2,000.
  • UberEats makes a margin of £5 per order.
  • Use the coefficients from your OLS and 2SLS regressions above.

Should UberEats proceed with the campaign?

Code
# Extract coefficients
coef_ols <- coef(model_ols)["n_restaurants"]
coef_iv <- coef(model_iv_2nd)["predicted_n_restaurants"]

# Campaign Parameters
n_new_restaurants <- 10
margin_per_order <- 5
cost_campaign <- 2000

# Profit calculation - OLS (incremental weekly margin minus campaign cost)
revenue_ols <- n_new_restaurants * coef_ols * margin_per_order
profit_ols <- revenue_ols - cost_campaign

# Profit calculation - IV (same formula with the causal estimate)
revenue_iv <- n_new_restaurants * coef_iv * margin_per_order
profit_iv <- revenue_iv - cost_campaign

Scenario Analysis:

  1. Decision based on Biased OLS:
    • Estimated Gain: \(10 \times 44.166 \text{ orders} \times £5 \approx £2,208\)
    • Profit/Loss: \(£2,208 - £2,000 = \mathbf{£208}\)
    • Decision: Launch. The OLS estimate (44.2) suggests a profit, but it is inflated by endogeneity (e.g., culturally vibrant areas having both more restaurants and more orders), so it overstates the return. We would launch a campaign that appears profitable but is not.
  2. Decision based on Causal IV:
    • Estimated Gain: \(10 \times 13.206 \text{ orders} \times £5 \approx £660\)
    • Profit/Loss: \(£660 - £2,000 = \mathbf{-£1,340}\)
    • Decision: Do not launch. The causal effect (13.2) is much lower, and the campaign would lose money.
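
The decision rule can be condensed into a break-even coefficient \(\beta^{*}\) (a symbol assumed here): the campaign pays off only if each new restaurant generates enough incremental orders to cover its share of the cost. The calculation uses the campaign parameters assumed in this section. The interval is a rough sketch that assumes a normal approximation and takes the standard error (4.689) from the one-step 2SLS column.

\[ \beta^{*} = \frac{£2,000}{10 \times £5} = 40 \text{ orders per restaurant} \]

\[ 13.206 \pm 1.96 \times 4.689 \approx [4.0,\ 22.4] \]

The OLS estimate (44.2) clears the threshold while the IV estimate (13.2) does not. Even the upper end of the approximate 95% interval (22.4) is well below 40, so the recommendation does not hinge on sampling error in the IV estimate.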

Conclusion: Relying on the biased OLS estimate would lead to a bad investment decision, costing the company money. Causal inference saves us from launching unprofitable initiatives.

5 Intent-to-Treat (ITT) and Encouragement Design

In some contexts, instrumental variables are used to analyse randomised experiments where compliance is imperfect. This is often called an Encouragement Design.

For example, suppose UberEats ran an experiment where they randomly offered subsidies to potential restaurant partners in certain postcodes (Treatment) but not others (Control).

  • Instrument (Z): The random assignment to the subsidy offer (Intent-to-Treat).
  • Endogenous Treatment (X): Whether the restaurant actually joined the platform.
  • Outcome (Y): Total orders.

Even if we cannot force restaurants to join, the random assignment (\(Z\)) is a valid instrument for participation (\(X\)) because it is random (exogenous) and, provided the subsidy is attractive enough, shifts participation (relevant). The IV estimate then recovers the Local Average Treatment Effect (LATE): the effect of joining the platform for the compliers, that is, the restaurants that were induced to join because of the subsidy.
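
With a binary instrument and no covariates, the IV estimate reduces to the Wald ratio: the intent-to-treat effect on the outcome divided by the effect of assignment on take-up. The expression below is the standard textbook form, stated under the usual LATE assumptions of random assignment, exclusion, and monotonicity (the offer never makes a restaurant less likely to join). Monotonicity is not discussed in this case and is noted here as an additional assumption.

\[ \hat{\beta}^{IV} = \frac{\bar{Y}_{Z=1} - \bar{Y}_{Z=0}}{\bar{X}_{Z=1} - \bar{X}_{Z=0}} = \frac{\text{ITT effect on } Y}{\text{effect of } Z \text{ on take-up}} \]

As a purely hypothetical illustration, an ITT effect of 20 orders combined with a 25 percentage point increase in take-up would imply a LATE of \(20 / 0.25 = 80\) orders.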

In our current case, we can view the random hygiene inspections similarly: they are an exogenous “discouragement” (negative instrument) randomly assigned by the government, which shifts the supply curve of restaurants, allowing us to trace out the demand curve.