Class 11 Case Study: Improve User Engagement on Social Media Platforms

Author: Dr Wei Miao
Affiliation: UCL School of Management
Published: November 8, 2023

1 Procedures of A/B Testing

1.1 Motivating Example with Tom’s Loyalty Program

  • Tom is considering whether or not to introduce a loyalty program for his bubble tea business. This decision is essentially a cost-benefit analysis

    • Cost: it takes money and time to develop the loyalty program

    • Benefit: it may increase spending and retention rate, and hence future CLV

  • Cost can be estimated through budgeting, but how can we estimate the benefit of introducing the loyalty program (LP)?

1.2 Step 1: Decide on the Unit of Randomization

We choose the granularity of the randomization unit based on the research question at hand.

  • individual (often chosen)

  • household

  • store

  • others even more granular (e.g., device level) or even less granular (e.g., city level)

1.3 Step 1: Decide on the Unit of Randomization

Proposal 1: Randomize the loyalty program to West London and East London.

  • Do you expect this “randomization” to be true randomization? [1]

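No, because East London and West London are intrinsically different. With only two units, any difference in outcomes is confounded with pre-existing differences between the two areas. Randomization works well only when the number of randomization units is large enough.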

Proposal 2: Randomize the loyalty program among individual customers.

  • Is this true randomization?

Yes. As long as we have a large number of individual customers, the treatment group and the control group should have balanced characteristics after randomization.
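The contrast between the two proposals can be illustrated with a small simulation. The sketch below uses Python with only the standard library (the course itself works in R). The customer ages are synthetic, and the function name `balance_gap` and the age distribution are assumptions made purely for illustration. It randomly splits the units into two groups and reports the absolute gap in mean age between the groups, averaged over repeated randomizations.

```python
import random
import statistics

def balance_gap(n_units, n_reps=200, seed=1):
    """Average absolute gap in mean age between two randomly formed groups."""
    rng = random.Random(seed)
    gaps = []
    for _ in range(n_reps):
        # Synthetic pre-treatment covariate: customer age (assumed distribution).
        ages = [rng.gauss(35, 10) for _ in range(n_units)]
        rng.shuffle(ages)  # random allocation of units to the two groups
        half = n_units // 2
        treat, control = ages[:half], ages[half:]
        gaps.append(abs(statistics.fmean(treat) - statistics.fmean(control)))
    return statistics.fmean(gaps)

gap_small = balance_gap(2)      # only two units, e.g., two regions
gap_large = balance_gap(2000)   # many units, e.g., individual customers
print(f"n=2: {gap_small:.2f} years, n=2000: {gap_large:.2f} years")
```

With two units the groups typically differ by many years of age, while with thousands of units the gap shrinks to a fraction of a year. This is why Proposal 2 can deliver balanced groups and Proposal 1 cannot.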

  • What problems can we still have?
  1. Spillover: Customers may talk to each other, so individual customers in the control group (who are not supposed to see the loyalty program) may also become aware of the loyalty program.
  2. Crossover: the same individual may accidentally receive different treatments.

1.4 Step 1: Pros and Cons of Granularity

Disadvantages of granularity:

  • Costs and logistics

  • Spillovers and crossovers

Advantages of granularity:

  • Increase the chance of successful randomization, thereby mitigating any systematic imbalance of covariates before the experiment.

Exercise:

  • How can we randomize individualized price discounts to customers?

We can send individually personalized coupons to customers, e.g., Co-op weekly offers, Uber’s dynamic pricing.

1.5 Step 2: Mitigate Spillover and Crossover Effects

  • Crossover effects: A crossover occurs when an individual who was supposed to be assigned to one treatment is accidentally exposed to one or more other treatments.
    • e.g., in online A/B testing, a notorious source of crossover is cookie resets: when a browser resets its cookies, the same customer may be treated as a new customer and randomized again.
  • Spillover effects: The behavior of the treatment group can affect the control group as well.
    • e.g., customers may share the promotions with family members and friends.

Question: How should Tom mitigate spillover and crossover effects?

  • Make sure that the same unit receives the same treatment throughout the experiment, e.g., by requiring customers to log in to the website with their user accounts. User IDs should be unique.

  • Randomize at the level of plausibly isolated social networks such as a community, rather than individual level.

  • However, we must acknowledge that it is very challenging to run an A/B test in the field without any crossover or spillover.
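One common way to keep a unit in the same condition throughout an experiment is to derive the assignment deterministically from a stable ID instead of from a cookie or device. The sketch below is a minimal illustration in Python (standard library only), not a description of any particular platform's system. The function name `assign`, the experiment label, and the customer and community IDs are all hypothetical.

```python
import hashlib

def assign(unit_id, experiment="loyalty_program", treatment_share=0.5):
    """Map a stable unit ID to 'treatment' or 'control' deterministically."""
    key = f"{experiment}:{unit_id}".encode("utf-8")
    # Interpret the first 8 hex digits of the hash as a number in [0, 1).
    bucket = int(hashlib.sha256(key).hexdigest()[:8], 16) / 16**8
    return "treatment" if bucket < treatment_share else "control"

# The same account ID gives the same arm, whatever device or cookie is used.
first_visit = assign("customer_001")
second_visit = assign("customer_001")

# Randomizing at community level: every member inherits the community's arm.
members = {"alice": "community_A", "bob": "community_A", "carol": "community_B"}
arms = {name: assign(community) for name, community in members.items()}
print(first_visit, second_visit, arms)
```

Because the assignment depends only on the ID, a logged-in customer cannot cross over between conditions. Passing a community ID instead of an individual ID keeps people who are likely to talk to each other in the same condition, which limits spillover.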

1.6 Step 3: Decide on Randomization Allocation Scheme

  • Individuals (or the relevant unit of randomization) are allocated at random into a treatment condition based on some decision rules.

  • Due to the high costs and potential risks of A/B testing, we often assign only a small percentage of customers to the treatment condition, while the remaining customers continue with “business as usual”.

1.7 Step 4: Collect Data

  • Any field experiment needs a sufficiently large sample size, i.e., sufficient statistical power.
    • The larger the sample size, the higher the statistical power of the experiment; however, a larger sample also brings higher costs and risks.
    • Run a power calculation in R to find the required sample size.
  • Collect data on both the outcome variables of interest and consumer characteristics.
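To make the trade-off between sample size and power concrete, here is a minimal sketch of a power calculation for comparing two group means. It is written in Python with the standard library only and uses the textbook normal approximation, so its answers may differ slightly from those of a dedicated R power package. The function name `n_per_group` and the effect sizes are assumptions for illustration.

```python
import math
from statistics import NormalDist

def n_per_group(effect_size, alpha=0.05, power=0.80):
    """Per-group sample size for a two-sided, two-sample comparison of means.

    effect_size is Cohen's d: (difference in means) / (standard deviation).
    Uses the normal approximation, so it is slightly optimistic for small n.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_power = NormalDist().inv_cdf(power)
    return math.ceil(2 * (z_alpha + z_power) ** 2 / effect_size ** 2)

n_small_effect = n_per_group(0.2)    # small standardized effect
n_medium_effect = n_per_group(0.5)   # medium standardized effect
print(n_small_effect, n_medium_effect)
```

Under these assumptions, detecting a small effect (d = 0.2) requires roughly 400 customers per group, several times more than a medium effect (d = 0.5). The smaller the expected benefit of the loyalty program, the larger and costlier the experiment.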

Proposal: We need to collect customers’ retention data and link it to each customer’s treatment assignment.

1.8 Step 5: Interpreting Results from a Field Experiment

Step 5.1: Randomization check

  • We need to check whether the treatment group and the control group are well balanced in terms of their pre-treatment characteristics, especially the pre-treatment values of the outcome variables.
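A randomization check can be as simple as comparing group means of a pre-treatment variable. The sketch below (Python, standard library only) computes a standardized mean difference. The spending figures are made up for illustration, and the threshold of 0.1 in absolute value is a common rule of thumb, not a requirement stated in these notes.

```python
import math
import statistics

def standardized_mean_difference(treated, control):
    """Difference in means divided by the pooled standard deviation."""
    diff = statistics.fmean(treated) - statistics.fmean(control)
    pooled_sd = math.sqrt(
        (statistics.variance(treated) + statistics.variance(control)) / 2
    )
    return diff / pooled_sd

# Hypothetical pre-treatment monthly spending (in pounds) for each group.
pre_spend_treated = [21.0, 18.5, 25.0, 30.0, 22.5, 19.0, 27.5, 24.0]
pre_spend_control = [20.0, 19.5, 26.0, 29.0, 23.5, 18.0, 28.0, 23.0]

smd = standardized_mean_difference(pre_spend_treated, pre_spend_control)
print(f"SMD = {smd:.3f}")
```

A value close to zero suggests that the two groups spent similar amounts before the experiment, so a later difference in spending can more plausibly be attributed to the treatment.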

Step 5.2: Analyze the data and estimate the ATE

  • Use a t-test to examine the difference in the average outcome between the treatment group and the control group. In R, we can use t.test().
  • Use regression analysis for A/B/n tests or multivariate experiments.
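The arithmetic behind a two-sample t-test can be sketched in a few lines. The Python example below (standard library only) computes the ATE estimate and a Welch-style t statistic, which does not assume equal variances. The p-value uses a normal approximation as a simplification, so it is only indicative for samples as small as this one. The spending data and the function name `welch_test` are hypothetical.

```python
import math
import statistics
from statistics import NormalDist

def welch_test(treated, control):
    """Return (ATE estimate, t statistic, approximate two-sided p-value)."""
    ate = statistics.fmean(treated) - statistics.fmean(control)
    se = math.sqrt(
        statistics.variance(treated) / len(treated)
        + statistics.variance(control) / len(control)
    )
    t_stat = ate / se
    # Normal approximation to the t distribution; reasonable for large samples.
    p_value = 2 * (1 - NormalDist().cdf(abs(t_stat)))
    return ate, t_stat, p_value

# Hypothetical post-treatment weekly spending (in pounds).
spend_treated = [12.0, 15.5, 14.0, 18.0, 16.5, 13.0, 17.5, 15.0, 19.0, 14.5]
spend_control = [11.0, 12.5, 13.0, 14.0, 12.0, 10.5, 15.0, 13.5, 12.5, 11.5]

ate, t_stat, p_value = welch_test(spend_treated, spend_control)
print(f"ATE = {ate:.2f}, t = {t_stat:.2f}, p ~ {p_value:.4f}")
```

The two groups are independent samples of different customers, so the comparison is unpaired. In practice, a statistics package would supply the exact t distribution and degrees of freedom.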

2 Case Background

2.1 Situation Analysis

  • Business model of Twitter, and other social media platforms in general?

A platform business model, in which network effects are the key to success.

  • How does Twitter make money?
    • Ads
    • Freemium strategy: though general everyday use is free, for premium features, customers have to pay
  • Who are Twitter’s customers?
    • Users
    • Advertisers
  • Who are the collaborators of Twitter?
    • Business Partners: Companies that integrate Twitter content into their services, such as news organizations or broadcasters.
    • Advertisers and Marketers: Agencies that develop campaigns for the platform.
    • Content Creators: Influencers and celebrities who attract and engage large audiences.
  • Who are Twitter’s direct and indirect competitors?
    • Direct: Other social media platforms like Facebook, Instagram, etc.
    • Indirect: News websites, discussion forums like Reddit, and alternative communication platforms that offer different ways for people to obtain information and interact online.
  • Which PESTLE factors should we focus on?
    • Legal and Regulatory Issues: Changes in regulations related to data privacy, online speech, and censorship can significantly impact operations.

2.2 Business Objective

[…] we are going to draw upon social comparison theory and gamification to help Twitter further improve its user engagement in its newly introduced feature called “Communities” on the platform. “Communities” is a Twitter feature that aims to enrich user engagement by catering to specific interests and subjects. These Communities offer users a dedicated space to convene around shared topics of interest, spanning domains such as celebrity fandoms, movie enthusiasts, and various hobbies.

2.3 Theoretical Motivation

When proposing business ideas, we should base our proposals on scientific, well-established theories from different disciplines such as Psychology and Behavioral Economics. In this case, we draw on social comparison theory and gamification.

2.4 Business Proposal

  • Drawing on Gamification Theory and Social Comparison Theory, we propose to implement a leaderboard that ranks communities by points.

3 A/B Testing for Twitter

3.1 Step 1: Decide on the Unit of Randomization

  • What would be the best unit of randomization?

The best level would be the user level. The device level would be too granular and can easily cause crossover effects, because the same user may use multiple devices.

3.2 Step 2: Mitigate Spillover and Crossover Effects

  • What are the potential problems for spillover and crossover?
    • A user may use multiple devices, causing crossover effects.
    • A user may talk to family members/friends, causing spillover effects.

3.3 Step 3: Decide on Randomization Allocation Scheme

  • How should we determine the randomization scheme?
    • Since A/B testing can be costly and risky, we would normally not expose all users to the treatment.
      • On the first day of A/B testing, we can randomize a small percentage of arriving users (e.g., 10%) into the treatment condition.
      • The remaining arriving users will be in the control group.
    • Once a user has been assigned, the treatment should remain the same for that user throughout the experiment.
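The allocation rule described above (assign once on arrival, then never change) can be sketched as a lookup table that is filled on a user's first visit. This is a minimal illustration in Python (standard library only). The class name `ArrivalAssigner`, the user IDs, and the in-memory dictionary are assumptions; a real system would persist assignments in a database.

```python
import random

class ArrivalAssigner:
    """Assign each arriving user once and remember the result."""

    def __init__(self, treatment_share=0.10, seed=2023):
        self.treatment_share = treatment_share
        self.rng = random.Random(seed)
        self.assignments = {}  # user_id -> "treatment" or "control"

    def arm_for(self, user_id):
        # Draw only on the first arrival; later visits reuse the stored arm.
        if user_id not in self.assignments:
            draw = self.rng.random()
            self.assignments[user_id] = (
                "treatment" if draw < self.treatment_share else "control"
            )
        return self.assignments[user_id]

assigner = ArrivalAssigner()
first = [assigner.arm_for(f"user_{i}") for i in range(5000)]
repeat = [assigner.arm_for(f"user_{i}") for i in range(5000)]
share = first.count("treatment") / len(first)
print(f"treatment share: {share:.3f}")
```

Keying the table on the user ID rather than on the device means that a returning user sees the same version of the platform on every visit.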

3.4 Step 4: Collect Data

  • What is the sample size we need?

We can do a power analysis using the pwr package in R, or use an online sample size calculator.
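If engagement is measured as a proportion (for example, the share of users who post in a Community in a given week), the required sample size can be approximated with the standard two-proportion formula. The sketch below uses Python with the standard library only. The baseline rate of 10% and the target of 12% are hypothetical numbers chosen for illustration, as is the function name `n_per_group_proportions`.

```python
import math
from statistics import NormalDist

def n_per_group_proportions(p_control, p_treatment, alpha=0.05, power=0.80):
    """Per-group sample size for a two-sided test of two proportions."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_power = NormalDist().inv_cdf(power)
    p_bar = (p_control + p_treatment) / 2
    numerator = (
        z_alpha * math.sqrt(2 * p_bar * (1 - p_bar))
        + z_power * math.sqrt(
            p_control * (1 - p_control) + p_treatment * (1 - p_treatment)
        )
    ) ** 2
    return math.ceil(numerator / (p_treatment - p_control) ** 2)

n_users = n_per_group_proportions(0.10, 0.12)
print(n_users)
```

Under these assumed rates the calculation calls for just under 4,000 users per group. A larger expected lift would reduce the number considerably.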

  • What data should we collect?
    • Demographic data, including registration date, age, gender, etc.
    • Behavioral data, including logs of tweeting, retweeting, likes, and comments

The above data serve two purposes: (1) the randomization check and (2) the estimation of treatment effects.

3.5 Step 5: Data analytics

  • Randomization checks

We should do this on Day 1, right after randomization is done, to verify that the randomization worked well.

  • How to estimate the treatment effects?

(1) The ATE is the difference in mean outcomes between the treatment group and the control group.

(2) To check whether the difference is statistically significant, we conduct a hypothesis test using a two-sample (independent-samples) t-test. A paired t-test is not appropriate here, because the treatment group and the control group consist of different users.

(3) Next week, we will see that we can also run a linear regression to obtain the average treatment effect.
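As a preview of why regression recovers the ATE: when the only regressor is a 0/1 treatment indicator, the ordinary least squares slope equals the difference in group means, and the intercept equals the control group mean. The sketch below checks this numerically in Python (standard library only). The data and the helper name `ols_slope_intercept` are made up for illustration.

```python
def ols_slope_intercept(x, y):
    """Ordinary least squares for y = intercept + slope * x (one regressor)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

# Hypothetical data: treated = 1 if the user saw the leaderboard, else 0.
treated = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
tweets = [9.0, 7.0, 8.0, 10.0, 6.0, 5.0, 7.0, 6.0, 4.0, 8.0]

slope, intercept = ols_slope_intercept(treated, tweets)
mean_t = sum(y for d, y in zip(treated, tweets) if d == 1) / treated.count(1)
mean_c = sum(y for d, y in zip(treated, tweets) if d == 0) / treated.count(0)
print(slope, mean_t - mean_c)
```

The regression slope and the simple difference in means coincide, which is why the two approaches give the same ATE estimate in a basic A/B test.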

Footnotes

  1. All answers to questions in the slides are on the webpage version of the lecture notes.