Class 16 Instrumental Variables and Two-Stage Least Squares

Author: Dr Wei Miao

Affiliation: UCL School of Management

Published: November 20, 2024

1 Instrumental Variable

1.1 Class Objectives

  • The requirements of a valid instrumental variable and how to find good instruments

  • The intuition for why instrumental variables solve endogeneity problems

  • How to apply the two-stage least squares (2SLS) method to estimate causal effects using instrumental variables

1.2 Causal Inference from OLS

  • With non-experimental secondary data, it is generally impossible to control for all confounding factors, so OLS estimates usually cannot be interpreted as causal effects.

  • Can we still draw causal inferences from secondary data?
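To see why an uncontrolled confounder undermines a causal reading of OLS, here is a minimal simulation sketch. Everything in it is an assumption made for illustration: the data are synthetic, the confounder `u` is unobserved by construction, and the coefficients (a true effect of 2.0, confounder loadings of 1.0 and 3.0) are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000
true_beta = 2.0  # hypothetical causal effect of x on y

# The unobserved confounder u drives both x and y.
u = rng.normal(size=n)
x = 1.0 * u + rng.normal(size=n)
y = true_beta * x + 3.0 * u + rng.normal(size=n)

# The OLS slope of y on x (with an intercept) equals cov(x, y) / var(x).
ols_slope = np.cov(x, y)[0, 1] / np.var(x, ddof=1)
print(f"true beta = {true_beta}, OLS slope = {ols_slope:.3f}")
```

In this setup the OLS slope converges to \(2 + 3 \cdot cov(x,u)/var(x) = 3.5\) rather than 2, because \(u\) is left in the error term and is correlated with \(x\).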

1.3 What is an Instrumental Variable

1.3.1 Instrumental Variable

An instrumental variable (or a set of instrumental variables) \(Z\) satisfies the following requirements:

  1. \(Z\) is exogenous and uncorrelated with \(\epsilon\); that is, \(cov(Z,\epsilon) = 0\)

  2. \(Z\) affects \(Y\) only through \(X\), not directly

  3. \(Z\) affects \(X\) to some extent; that is, \(cov(Z,X) \neq 0\)

  • Point 1 is called the exogeneity requirement: the instrumental variable should be beyond the individual’s control, so that it is uncorrelated with any of the individual’s unobserved confounding factors.

    • Potential IVs: government policies; natural disasters; randomized experiments; birthdays; etc.
  • Point 2 is called the exclusion restriction: the instrumental variable should affect \(Y\) only through \(X\), not directly.

  • Point 3 is called the relevance requirement: though beyond the individual’s control, the instrumental variable should still affect the individual’s \(X\), causing some exogenous changes in \(X\) that are beyond the individual’s control.

    • If the correlation between \(Z\) and \(X\) is too small, we have a weak IV problem.
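One way to see how the requirements work together is the following sketch. It assumes a single regressor and a single instrument in the linear model \(Y = \beta_0 + \beta_1 X + \epsilon\), which is a simplification made here for illustration. Taking the covariance of both sides with \(Z\) gives

\[ cov(Z,Y) = \beta_1 cov(Z,X) + cov(Z,\epsilon) \]

Exogeneity and the exclusion restriction set \(cov(Z,\epsilon) = 0\), and relevance allows us to divide by \(cov(Z,X)\):

\[ \beta_1 = \frac{cov(Z,Y)}{cov(Z,X)} \]

When \(cov(Z,X)\) is close to zero, this ratio becomes very sensitive to sampling noise and to small violations of exogeneity, which is why a weak IV is a problem.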

1.4 Graphical Illustration of IV

1.5 A Classic Example of Instrumental Variable

Return of Military Service to Lifetime Income [1]

\[ \text{Income} = \beta_0 + \beta_1 \text{MilitaryService} + \epsilon \]

  • OLS suffers from endogeneity problems. What are the potential endogeneity issues?

  • A lottery was used to determine whether soldiers with certain birthdays were drafted to the frontline.

1.6 A Classic Example of Instrumental Variable

  • The date of birth (\(z\)) or zodiac sign can serve as an instrumental variable for military service (\(x\)) in this case.
    • Relevance requirement: \(z\) affects years of military service: \(cov(z,x) \neq 0\)
    • Exogeneity requirement: \(z\) is randomly drawn and thus uncorrelated with any confounders: \(cov(z,\epsilon) = 0\)
    • Exclusion restriction: \(z\) affects \(Y\) only through \(X\), not directly.
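With a binary instrument such as a lottery draw, the IV estimate reduces to a ratio of two differences in means, often called the Wald estimator. The sketch below illustrates this on synthetic data only. The variable names, the confounder `u`, the effect size of -2.0, and the assumption of a constant treatment effect are all hypothetical choices and are not taken from the cited study.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200_000
true_effect = -2.0  # hypothetical effect of service on income, arbitrary units

z = rng.integers(0, 2, size=n)   # 1 if the birthday was drawn in the lottery
u = rng.normal(size=n)           # unobserved confounder
# Service is more likely for drawn birthdays but also depends on u.
served = (1.0 * z + 0.5 * u + rng.normal(size=n) > 0.5).astype(float)
income = 10.0 + true_effect * served + 1.5 * u + rng.normal(size=n)

# Naive comparison of those who served with those who did not (confounded by u).
naive = income[served == 1].mean() - income[served == 0].mean()

# Wald estimator: effect of the lottery on income divided by its effect on service.
reduced_form = income[z == 1].mean() - income[z == 0].mean()
first_stage = served[z == 1].mean() - served[z == 0].mean()
wald = reduced_form / first_stage

print(f"naive = {naive:.2f}, Wald/IV = {wald:.2f}, true = {true_effect}")
```

Because the draw is random, the two lottery groups differ only in how likely they are to serve, so scaling the income gap by the service gap recovers the assumed effect, while the naive comparison does not.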

1.7 More Examples of IVs

Can you come up with IV candidates for the following causation questions?

  • Number of restaurants on Uber Eats => Number of orders on Uber Eats
    • temporary closures of restaurants due to government inspections
  • Retail price => Sales
    • wholesale price
    • costs of raw materials
    • cost of goods sold (COGS)
    • Hausman instruments: the prices of the same product in other markets

2 Two-Stage Least Squares

2.1 Solving Endogeneity Using IV

  • Given an endogenous OLS regression,

\[ y_{i}=X_{i} \beta+\varepsilon_{i}, \quad \operatorname{cov}\left(X_{i}, \varepsilon_{i}\right) \neq 0 \]

  • Find instrumental variables \(Z_i\) that do not (directly) influence \(y_i\), but are correlated with \(X_i\)

2.2 Two-Stage Least Squares: Stage 1

  1. Regress \(X\) on \(Z\) (X ~ Z). The fitted value \(\hat{X}\) depends only on \(Z\), so it should be uncorrelated with the structural error term \(\varepsilon\).
    • \(\hat{X}\) (the part of the variation in \(X\) due to \(Z\)) is exogenous, because \(Z\) is exogenous
    • All endogenous variation in \(X\) is left in the first-stage error term \(\epsilon_{i}\), which is distinct from the structural error \(\varepsilon_{i}\)

\[ X_{i}=Z_{i}\eta+\epsilon_{i} \]
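The first stage can be sketched with plain NumPy as follows. The data are synthetic and the coefficient of 0.8 on the instrument is an arbitrary assumption. The F statistic at the end is an addition not covered above: it is a commonly used diagnostic of instrument strength, related to the weak IV problem.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 20_000

u = rng.normal(size=n)                 # unobserved confounder
z = rng.normal(size=n)                 # instrument, drawn independently of u
x = 0.8 * z + u + rng.normal(size=n)   # endogenous regressor

# First stage: regress x on a constant and z.
Z = np.column_stack([np.ones(n), z])
eta_hat, *_ = np.linalg.lstsq(Z, x, rcond=None)
x_hat = Z @ eta_hat        # part of x explained by z
resid = x - x_hat          # first-stage residual, which absorbs u

# F statistic for H0: the coefficient on z is zero (one restriction).
rss_restricted = np.sum((x - x.mean()) ** 2)
rss_unrestricted = np.sum(resid ** 2)
f_stat = (rss_restricted - rss_unrestricted) / (rss_unrestricted / (n - 2))

print(f"first-stage slope = {eta_hat[1]:.3f}, F = {f_stat:.1f}")
print(f"corr(x_hat, u) = {np.corrcoef(x_hat, u)[0, 1]:.3f}")
```

The fitted values are a linear function of the instrument alone, so their correlation with the confounder is close to zero, while the residual carries the endogenous part of `x`.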

2.3 Two-Stage Least Squares: Stage 2

  2. Regress \(Y\) on \(\hat{X}\) (Y ~ \(\hat{X}\)): \(\hat{X}\) is now uncorrelated with the error term, so the second-stage regression gives a causal estimate of \(\beta\).

\[ y_{i}=\hat{X}_{i} \beta+\varepsilon_{i}, \quad \operatorname{cov}\left(\hat{X}_{i}, \varepsilon_{i}\right) = 0 \]
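Putting the two stages together, a minimal manual 2SLS sketch might look like this. The helper `two_stage_least_squares` is a hypothetical name introduced here, it handles only one endogenous regressor and one instrument, and the data are synthetic with an assumed true slope of 2.0.

```python
import numpy as np

def two_stage_least_squares(y, x, z):
    """Manual 2SLS for one regressor x and one instrument z.

    Returns the second-stage coefficients [intercept, slope].
    """
    n = len(y)
    # Stage 1: regress x on a constant and z, keep the fitted values.
    Z = np.column_stack([np.ones(n), z])
    eta_hat, *_ = np.linalg.lstsq(Z, x, rcond=None)
    x_hat = Z @ eta_hat
    # Stage 2: regress y on a constant and the fitted values.
    X_hat = np.column_stack([np.ones(n), x_hat])
    beta_hat, *_ = np.linalg.lstsq(X_hat, y, rcond=None)
    return beta_hat

rng = np.random.default_rng(3)
n = 50_000
true_beta = 2.0                        # hypothetical causal effect
u = rng.normal(size=n)                 # unobserved confounder
z = rng.normal(size=n)                 # instrument, independent of u
x = 0.8 * z + u + rng.normal(size=n)
y = 1.0 + true_beta * x + 3.0 * u + rng.normal(size=n)

# OLS on the original regressor, for comparison.
X = np.column_stack([np.ones(n), x])
beta_ols, *_ = np.linalg.lstsq(X, y, rcond=None)
beta_2sls = two_stage_least_squares(y, x, z)

print(f"OLS slope = {beta_ols[1]:.3f}, 2SLS slope = {beta_2sls[1]:.3f}")
```

The point estimate from this manual procedure is the 2SLS estimate, but the standard errors reported by an ordinary second-stage regression are not valid, because they are based on residuals computed with \(\hat{X}\) rather than \(X\). In practice a dedicated IV routine is usually preferred for inference.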

2.4 After-Class Readings

Footnotes

  1. Angrist, Joshua D., Stacey H. Chen, and Jae Song. “Long-term consequences of Vietnam-era conscription: New estimates using social security data.” American Economic Review 101, no. 3 (2011): 334-38.