Class 11 Causal Inference & Potential Outcome Framework

Author

Affiliation

Dr Wei Miao

UCL School of Management

Published

November 6, 2024

1 Causal Inference

1.1 Our Journey So Far

Any business activity brings benefits and costs. We’re given the benefit information in all the case studies so far
- Apple (Week 1): influencer marketing increases sales by 2.5%
- 1st assignment: loyalty program increases retention rate to 95%
In reality, this benefit information is often not readily available, and we need to measure them using causal inference tools.

1.2 Causal Inference Road Map

1.3 Learning Objectives

Understand key concepts of causal inference
- Rubin’s potential outcome framework
- Fundamental problem of causal inference
- Average treatment effect (ATE) and randomization
Learn the steps to conduct randomized controlled trials (RCTs)

1.4 Why Causal Inference Matters? Example 1

Tom purchases paid ads on Instagram to advertise his new bubble tea shop. Instagram ads are targeted to individuals who are predicted to have a higher likelihood of being bubble tea lovers. In the end, some Instagram users saw no ads and some saw the ads. The purchase rates for each group are shown below.

Question: Can Tom be confident to conclude the Instagram ads are effective in converting new customers?

1.5 Why Causal Inference Matters? Example 2

Tom bought a marketing survey data from a consulting agency. The survey collected prices and store visits (sales) for different bubble tea shops in Canary Wharf. Tom finds that there seems to be a positive relationship between prices and store visits.

Question: Can Tom conclude that he should also increase the prices for his bubble tea shop to increase the store visits?

1.6 Why Causal Inference Matters? Example 3

This is a fighter plane that just returned from the battlefield. Red dots are bullet holes.

Which part of A, B, and C would you reinforce to increase the pilot’s survival rate?

1.7 Why Causal Inference Matters? Example 4

I have held a secret from you for a long time, it’s time for me to confess ….

1.8 Nobel Prize in Economics (2021)

[…] the other half jointly to Joshua D. Angrist and Guido W. Imbens “for their methodological contributions to the analysis of causal relationships.”

1.9 Causal Inference

Causal inference is the process of estimating the unbiased causal effects of a particular policy/business intervention on the outcome variables of interest.
Correlation != Causation: Machine learning models are good at finding correlations, but not causations. For example, on rainy days, we observe more umbrellas on the street
- correct correlation statement: number of umbrellas is positively correlated with rainfall
- correct causal statement: heavier rain leads to more umbrellas
- incorrect causal statement: more umbrellas lead to heavier rain
Causality becomes more complex in the business world. Managers can easily make mistakes without causal inference training.
- Imagine if the actual causal effect of Instagram ads on incremental profit per customer is £1 for Tom, and Tom pays £1.5 for each click

2 Potential Outcome Framework

2.1 Rubin Causal Model and the Potential Outcome Framework

The Rubin causal model (RCM) or the Potential Outcome Framework is the well accepted framework for thinking about causal effects.
For each customer $i$, we can define the potential outcomes in order to evaluate the causal effect of a treatment on their outcomes:
- $Y^{1}_i$: the outcome if the customer is exposed to the treatment, ceteris paribus
- $Y^{0}_i$: the outcome if the customer is not exposed to the treatment, ceteris paribus

2.2 Definition: Individual Treatment Effects

The causal effect of the treatment on the individual $i$ is the difference between the two potential outcomes. We define the individual treatment effect $\delta_i$ as follows

\[ \delta_i = Y^1_i - Y^0_i \]

Important

These two potential outcomes do not depend on whether or not the individual is exposed to the treatment in reality.

2.3 Example

Tom would like to measure the causal effect of seeing a displayed ad on customer purchase probability.

2.4 Examples of Individual Treatment Effects

Let’s say we have retrieved all infinity stones from Thanos, and have created 2 parallel universes

In Universe 1, with a displayed ad
- Dr Strange has a rate of 70%
In Universe 2, without a displayed ad
- Dr Strange has a rate of 60%

Customer	Y1	Y0	TE
Dr Strange	0.7	0.6	0.1

2.5 A Motivating Example: A Group of Customers

Conceptually, we can collect a sample of customers, and estimate individual treatment effect for each of them

Customer	Y1	Y0	TE
Dr Strange	0.70	0.60	0.10
Iron Man	0.55	0.50	0.05
Thor	0.80	0.72	0.08
Hulk	0.60	0.62	-0.02

2.6 Fundamental Problem of Causal Inference

To measure the individual treatment effect of the ads on a customer’s purchase rate, we need to observe all potential outcomes of the same individual in parallel universes (i.e., with and without being exposed to the treatment).
We use $D_i = 1$ to denote the treated/treatment group and $D_i = 0$ to denote the untreated/control group in reality.
We only observe one potential outcome, the realized outcome, in our reality
- For treated customers, we only observe their realized outcomes of being treated: $Y^1_i|D_i = 1$
- For untreated customers, we only observe their realized outcomes of being untreated $Y^0_i|D_i = 0$
The remaining potential outcomes, i.e., counterfactual outcomes, are never observed in our reality
- For treated customers, we never observe their outcomes when being untreated: $Y^0_i|D_i = 1$
- For untreated customers, we never observe their outcomes when treated: $Y^1_i|D_i = 0$

2.7 Exercise

In the previous table, let’s say, Dr Strange and Hulk are treated (by the ads) in reality, while Iron Man and Thor are not treated.
Based on this information, we can find out the values of the realized outcomes and counterfactual outcomes for each customer.
For example, for Dr Strange:
- Realized outcome: $Y^1_i |D_i = 1 = 0.7$ (Note that we only observe this value in reality)
- Counterfactual outcome: $Y^0_i |D_i = 1 = 0.6$ (Note that we don’t observe this value in reality)

Customer	Y1	Y0	Treated
Dr Strange	0.7	?	Yes
Iron Man	?	0.5	No
Thor	?	0.72	No
Hulk	0.6	?	Yes

2.8 Fundamental Problem of Causal Inference

Since it is impossible to see both potential outcomes at once, one of the potential outcomes is always missing, so we can never quantify the individual treatment effects. This dilemma is called the Fundamental Problem of Causal Inference.

subject	Treated	Y1	Y0	Y1-Y0
Dr Strange	Yes	0.7	?	?
Iron Man	No	?	0.5	?
Thor	No	?	0.72	?
Hulk	Yes	0.6	?	?

3 Average Treatment Effects

3.1 The Average Treatment Effect

Since individual treatment effects are unobservable, we often care more about the average treatment effects (ATE) on the population level. The ATE is defined as the average of individual treatment effects across the population.

\[ ATE = E[Y^1_i - Y^0_i] = \frac{1}{N} \sum_{i=1}^{N} (Y^1_i - Y^0_i) \]

For display ads, can we obtain the ATE by directly calculating the difference in the average outcomes between the treatment group and control group. Why?

\[\begin{align*} & E[Y^1|D_i = 1] - E[Y^0|D_i = 0] \\ & = E[Y^1|D_i = 1] - E[Y^0|D_i = 1] \\ &+ E[Y^0|D_i = 1] - E[Y^0|D_i = 0] \end{align*}\]

3.2 Data Example

Please load data_treatmenteffect in the data folder into your RStudio.

Code

data_treatmenteffect <- read.csv("https://www.dropbox.com/scl/fi/ndvvwr298xkdtsj42y6az/individual-treatment-effects.csv?rlkey=z3mluwbqt9k1gtnb63xlc3l88&dl=1")

Exercise: This data are generated from Instagram’s paid ads, treated customers are those who see Instagram’s ads.
- Compute the difference in the average rates between the treated and untreated customers.
- Compute the ATE based on the individual treatment effects.
- Compare the two results.

3.3 The Average Treatment Effect

To quantify the correct ATE, we must randomize who receives the treatment, instead of targeting or letting the individuals choose the treatment.
After randomization, we can then obtain the ATE by comparing the difference in the average outcomes across the treatment group and control group. Because randomization ensures that
- Selection bias is fully removed¹
- The treatment effects on the treatment group individuals and the control group individuals should be equal. The former is called the average treatment effects on the treated (ATT), and the latter is called the average treatment effects on the untreated (ATU).

Exercise: Let’s go back to the previous data example and compute the ATT and ATU after randomization.

3.4 Gold Standard of Causal Inference

Mathematically, the previous slide can be represented by the Basic Identity of Causal Inference:

\[\begin{align*} & E[Y^1|D_i = 1] - E[Y^0|D_i = 0] \\ & = E[Y^1|D_i = 1] - E[Y^0|D_i = 1] \\ &+ E[Y^0|D_i = 1] - E[Y^0|D_i = 0] \end{align*}\]

Randomized experiments makes ATT equal to ATE, and removes selection bias. Thus, randomized experiments are the gold standard of causal inference.

Footnotes

Selection bias refers to the pre-existing difference between the treatment group and control group even without the treatment↩︎

--- date: "`r (lubridate::ymd('20241002')+lubridate::dweeks(5))`" title: "Class 11 Causal Inference & Potential Outcome Framework" execute: cache: false --- # Causal Inference ## Our Journey So Far - Any business activity brings benefits and costs. We're given the benefit information in all the case studies so far - **Apple (Week 1)**: influencer marketing increases sales by 2.5% - **1st assignment**: loyalty program increases retention rate to 95% - In reality, this benefit information is often not readily available, and we need to measure them using **causal inference tools**. ## Causal Inference Road Map ```{r} #| echo: false #| fig-align: 'center' #| out-width: "10cm" knitr::include_graphics("images/CourseStructure2024.png") ``` ## Learning Objectives - Understand key concepts of **causal inference** - Rubin's potential outcome framework - Fundamental problem of causal inference - Average treatment effect (ATE) and randomization - Learn the steps to conduct **randomized controlled trials** (RCTs) ## Why Causal Inference Matters? Example 1 Tom purchases paid ads on Instagram to advertise his new bubble tea shop. Instagram ads are targeted to individuals who are predicted to have a higher likelihood of being bubble tea lovers. In the end, some Instagram users saw no ads and some saw the ads. The **purchase rates** for each group are shown below. ```{r} #| echo: false #| fig-align: 'center' knitr::include_graphics("images/Week 6/bubble tea ads.png") ``` **Question**: Can Tom be confident to conclude the Instagram ads are effective in converting new customers? ## Why Causal Inference Matters? Example 2 Tom bought a marketing survey data from a consulting agency. The survey collected prices and store visits (sales) for different bubble tea shops in Canary Wharf. Tom finds that there seems to be a positive relationship between prices and store visits. ```{r} #| echo: false #| fig-align: 'center' knitr::include_graphics("images/Week 6/pricesales.png") ``` **Question**: Can Tom conclude that he should also increase the prices for his bubble tea shop to increase the store visits? ## Why Causal Inference Matters? Example 3 This is a fighter plane that just returned from the battlefield. Red dots are bullet holes. Which part of A, B, and C would you reinforce to increase the pilot's survival rate? ```{r} #| echo: false #| fig-align: 'center' knitr::include_graphics("images/Week 6/SurvivalBias.png") ``` ## Why Causal Inference Matters? Example 4 I have held a secret from you for a long time, it's time for me to confess .... ## Nobel Prize in Economics (2021) ```{r} #| echo: false #| fig-align: 'center' knitr::include_graphics("images/Week 6/NobelEcon.png") ``` > \[...\] the other half jointly to Joshua D. Angrist and Guido W. Imbens "for their methodological contributions to the analysis of causal relationships." ## Causal Inference - Causal inference is the process of estimating the unbiased causal effects of a particular policy/business intervention on the outcome variables of interest. - Correlation != Causation: **Machine learning** models are good at finding correlations, but not causations. For example, on rainy days, we observe more umbrellas on the street - correct correlation statement: number of umbrellas **is positively correlated** with rainfall - correct causal statement: heavier rain **leads to** more umbrellas - incorrect causal statement: more umbrellas **lead to** heavier rain - Causality becomes more complex in the business world. Managers can easily make mistakes without causal inference training. - Imagine if the actual causal effect of Instagram ads on incremental profit per customer is £1 for Tom, and Tom pays £1.5 for each click # Potential Outcome Framework ## Rubin Causal Model and the Potential Outcome Framework - The **Rubin causal model (RCM)** or the **Potential Outcome Framework** is the well accepted framework for thinking about causal effects. - For each customer $i$, we can define the **potential outcomes** in order to evaluate the causal effect of a treatment on their outcomes: - $Y^{1}_i$: the outcome if the customer is exposed to the treatment, ceteris paribus - $Y^{0}_i$: the outcome if the customer is not exposed to the treatment, ceteris paribus ## Definition: Individual Treatment Effects - The causal effect of the treatment on the individual $i$ is the difference between the two potential outcomes. We define the **individual treatment effect** $\delta_i$ as follows $$ \delta_i = Y^1_i - Y^0_i $$ ::: {.callout-important} These two potential outcomes do not depend on whether or not the individual is exposed to the treatment in reality. ::: ## Example **Tom would like to measure the causal effect of seeing a displayed ad on customer purchase probability.** ## Examples of Individual Treatment Effects Let's say we have retrieved all infinity stones from Thanos, and have created 2 parallel universes - In Universe 1, with a displayed ad - Dr Strange has a rate of 70% - In Universe 2, without a displayed ad - Dr Strange has a rate of 60% ```{r} #| echo: false pacman::p_load(data.table, knitr, kableExtra) kable( data.frame( Customer = c("Dr Strange"), Y1 = 0.7, Y0 = 0.6, "TE" = 0.1 ), escape = F ) |> kable_styling(font_size = 8, bootstrap_options = c("striped", "hover")) ``` ## A Motivating Example: A Group of Customers - Conceptually, we can collect a sample of customers, and estimate individual treatment effect for each of them ```{r} #| echo: false kable( data.frame( Customer = c("Dr Strange", "Iron Man", "Thor", "Hulk"), Y1 = c(0.7, 0.55, 0.8, 0.6), Y0 = c(0.6, 0.5, 0.72, 0.62), "TE" = c(0.7, 0.55, 0.8, 0.6) - c(0.6, 0.5, 0.72, 0.62) ), escape = F ) %>% kable_styling(font_size = 8, bootstrap_options = c("striped", "hover")) ``` ## Fundamental Problem of Causal Inference - To measure the individual treatment effect of the ads on a customer's purchase rate, we need to observe **all potential outcomes** of the **same individual** in **parallel universes** (i.e., with and without being exposed to the treatment). - We use $D_i = 1$ to denote the treated/treatment group and $D_i = 0$ to denote the untreated/control group in reality. - We only observe one potential outcome, the **realized outcome**, in our reality - For treated customers, we only observe their realized outcomes of being treated: $Y^1_i|D_i = 1$ - For untreated customers, we only observe their realized outcomes of being untreated $Y^0_i|D_i = 0$ - The remaining potential outcomes, i.e., **counterfactual** **outcomes**, are never observed in our reality - For treated customers, we never observe their outcomes when being untreated: $Y^0_i|D_i = 1$ - For untreated customers, we never observe their outcomes when treated: $Y^1_i|D_i = 0$ ## Exercise - In the previous table, let's say, Dr Strange and Hulk are treated (by the ads) in reality, while Iron Man and Thor are not treated. - Based on this information, we can find out the values of the realized outcomes and counterfactual outcomes for each customer. - For example, for Dr Strange: - Realized outcome: $Y^1_i |D_i = 1 = 0.7$ (Note that we only observe this value in reality) - Counterfactual outcome: $Y^0_i |D_i = 1 = 0.6$ (Note that we don't observe this value in reality) ```{r} #| echo: false kable( data.frame( Customer = c("Dr Strange", "Iron Man", "Thor", "Hulk"), Y1 = c(0.7, "?", "?", 0.6), Y0 = c("?", 0.5, 0.72, "?"), "Treated" = c("Yes", "No", "No", "Yes") ), escape = F ) %>% kable_styling(font_size = 8, bootstrap_options = c("striped", "hover")) ``` ## Fundamental Problem of Causal Inference - Since it is impossible to see both potential outcomes at once, one of the potential outcomes is always missing, so we can never quantify the individual treatment effects. This dilemma is called the **Fundamental Problem of Causal Inference**. ```{r} #| echo: false kable( data.frame( subject = c("Dr Strange", "Iron Man", "Thor", "Hulk"), Treated = c("Yes", "No", "No", "Yes"), Y1 = c(0.7, "?", "?", 0.6), Y0 = c("?", 0.5, 0.72, "?"), `Y1-Y0` = c("?", "?", "?", "?"), check.names = F ), escape = F ) %>% kable_styling(font_size = 8, bootstrap_options = c("striped", "hover")) ``` # Average Treatment Effects ## The Average Treatment Effect - Since individual treatment effects are unobservable, we often care more about the **average treatment effects** (ATE) on the population level. The ATE is defined as the average of individual treatment effects across the population. $$ ATE = E[Y^1_i - Y^0_i] = \frac{1}{N} \sum_{i=1}^{N} (Y^1_i - Y^0_i) $$ - For display ads, can we obtain the ATE by directly calculating the difference in the average outcomes between the treatment group and control group. Why? \begin{align*} & E[Y^1|D_i = 1] - E[Y^0|D_i = 0] \\ & = E[Y^1|D_i = 1] - E[Y^0|D_i = 1] \\ &+ E[Y^0|D_i = 1] - E[Y^0|D_i = 0] \end{align*} ## Data Example - Please load `data_treatmenteffect` in the data folder into your RStudio. ```{r} #| echo: true data_treatmenteffect <- read.csv("https://www.dropbox.com/scl/fi/ndvvwr298xkdtsj42y6az/individual-treatment-effects.csv?rlkey=z3mluwbqt9k1gtnb63xlc3l88&dl=1") ``` - **Exercise:** This data are generated from Instagram's paid ads, treated customers are those who see Instagram's ads. - Compute the difference in the average rates between the treated and untreated customers. - Compute the ATE based on the individual treatment effects. - Compare the two results. ## The Average Treatment Effect - To quantify the correct ATE, we must **randomize who receives the treatment,** instead of targeting or letting the individuals choose the treatment. - After randomization, we can then obtain the ATE by comparing the difference in the average outcomes across the treatment group and control group. Because randomization ensures that - Selection bias is fully removed^[Selection bias refers to the pre-existing difference between the treatment group and control group even without the treatment] - The treatment effects on the treatment group individuals and the control group individuals should be equal. The former is called the **average treatment effects on the treated** (ATT), and the latter is called the **average treatment effects on the untreated** (ATU). **Exercise:** Let's go back to the previous data example and compute the ATT and ATU after randomization. ## Gold Standard of Causal Inference - Mathematically, the previous slide can be represented by the **Basic Identity of Causal Inference**: \begin{align*} & E[Y^1|D_i = 1] - E[Y^0|D_i = 0] \\ & = E[Y^1|D_i = 1] - E[Y^0|D_i = 1] \\ &+ E[Y^0|D_i = 1] - E[Y^0|D_i = 0] \end{align*} ```{r} #| echo: false #| fig-align: 'center' knitr::include_graphics("images/Week 6/BasicIdentityofCausalInference.png") ``` - Randomized experiments makes ATT equal to ATE, and removes selection bias. Thus, randomized experiments are the **gold standard of causal inference**.