Improve User Engagement for Instagram Using A/B/N Testing

MSIN0094 Case Study

Author
Affiliation

Dr Wei Miao

UCL School of Management

Published

November 12, 2025

1 Case background

Instagram, one of the world’s leading social media platforms, has achieved significant success by providing users with a visually driven, interactive space for self-expression, connection, and content sharing. With over a billion active users, it has become a major platform for individuals, influencers, and businesses to connect with broader audiences.

[5C’s for Instagram] At its core, Instagram operates on a platform business model, where the network effect is key to its success. Its revenue is primarily generated through advertising (sponsored posts, stories, and videos) and e-commerce features that allow direct-to-consumer sales. The platform serves three main customer groups: Users who share content and connect with others, Advertisers who promote their products, and Content Creators who produce engaging material. Instagram’s collaborators include business partners who integrate its content and the influencers who attract large audiences. The competitive landscape includes direct rivals like TikTok and Facebook, and indirect competitors such as news websites and discussion forums like Reddit. Furthermore, Instagram must navigate a complex regulatory environment, particularly concerning data privacy, online speech, and censorship.

Despite its popularity, Instagram faces the challenge of maintaining high levels of user engagement and growth, especially as competition from other platforms intensifies.

In this consulting project, you are hired as a data science consultant to help Instagram improve user engagement. Your task is to propose innovative business ideas based on established consumer behaviour theories and then design an A/B/N testing plan to rigorously evaluate their effectiveness. The goal is to provide data-driven recommendations that can demonstrably increase user activity on the platform.

Figure: Screenshot of A Nobody's Instagram

2 Gamification strategies for Instagram

Gamification offers a promising approach to deepen engagement and retain users. By creating a more interactive and rewarding experience, it aligns with Instagram’s focus on community building and personal expression. In recent years, gamification has emerged as a powerful tool in marketing, aimed at enhancing user engagement and loyalty. It involves integrating game-like elements—such as points, badges, and rewards—into non-game contexts to make experiences more enjoyable (Seaborn and Fels 2015). By tapping into fundamental human motivations like achievement and competition, gamification encourages users to interact more frequently and meaningfully with a platform. From loyalty programmes to interactive challenges, these strategies are designed to increase user activity and foster a deeper emotional connection to the brand.

Social media platforms, in particular, are leveraging gamification to boost user engagement and retention. With features like badges, leaderboards, and interactive challenges, these platforms aim to make user interactions feel rewarding and enjoyable. In this case study, we will explore how gamification can be effectively applied to Instagram, using psychological and behavioural economic theories to design features that drive user engagement.

2.1 Social comparison theory

Social comparison theory, first articulated by Leon Festinger in 1954, proposes that people possess a fundamental drive to evaluate their opinions, abilities, and performance. When objective standards (e.g. a test score benchmark) are unavailable or ambiguous, individuals turn to others as social yardsticks. The closer and more similar the comparison target, the more diagnostic the information tends to be for self-evaluation.

There are three broad directions of comparison:

  • Upward comparison (looking to someone slightly “better”) can inspire improvement, goal setting, and learning. However, if the gap feels unattainable it may trigger discouragement or reduced self-esteem.
  • Downward comparison (to someone “worse”) can protect self-image and generate reassurance, but may also reduce motivation if it creates complacency.
  • Lateral (or horizontal) comparison (to someone similar) often yields the most accurate calibration of one’s own performance and norms of appropriate behaviour.

Motivations for comparison span accuracy (“Am I performing well?”), self-improvement (“How can I get better?”), and self-enhancement (“Can I feel good about my standing?”). Emotional outcomes therefore range from admiration and inspiration to envy or relief, depending on direction and perceived attainability. Platform design can influence which of these dominate by curating what metrics are visible and which peers are surfaced (e.g. suggesting similarly active creators rather than only mega-influencers).

On social media, algorithmic feeds, follower counts, engagement tallies (likes, saves, shares), and badges create a dense comparison environment. This can accelerate learning and content quality improvement (users emulate high-performing formats or posting cadence), but also risks negative affect (envy, anxiety, performance pressure). Thoughtful implementation therefore balances motivational gains with safeguards—limiting extreme upward gaps, providing constructive feedback (e.g. trend vs peers of similar size), and highlighting progress (streaks or percentile movement) rather than raw absolute rank alone.

Key design implications for Instagram:

  1. Relevance: Show users comparison cohorts matched on niche, follower band, or recent activity to maintain attainability.
  2. Directional framing: Pair upward exemplars with actionable tips (“Accounts gaining similar engagement increased Stories by 20% this week”).
  3. Progress orientation: Display movement (“You moved up 3 places among fitness creators”) rather than a static rank to sustain momentum.
  4. Well-being guardrails: Provide optional visibility controls or private analytics dashboards to reduce public pressure, and integrate reminders about healthy usage if large downward swings occur.
  5. Multidimensional metrics: Rotate different engagement dimensions (saves, meaningful comments, shares) to avoid over-fixation on a single vanity metric.

Applying these principles, a leaderboard or comparative insights panel can harness beneficial upward and lateral comparisons while mitigating unhealthy extremes.

By creating a leaderboard, users could see their standings based on metrics like follower engagement, post interactions, or content quality. Ranking users against one another could motivate them to post more frequently, improve content quality, or engage more with other users’ posts. The leaderboard would provide real-time feedback, fostering both upward and downward comparisons. Users higher up the leaderboard would feel motivated to maintain their status, while those lower might be motivated to increase their activity to climb the ranks.

Figure: Social Comparison Theory

Fitness apps often incorporate social comparison to keep users motivated. For example, a fitness app might display a leaderboard that ranks users by steps taken, calories burned, or workout streaks. Users can compare their progress with friends or a larger community. This often motivates users to stay active and improve their standing. By allowing users to see both how they compare to others and celebrate their achievements, such apps encourage regular engagement and goal completion. Companies like Strava, which adds social aspects to running and cycling, have successfully used this approach to drive sustained engagement.

Think about how Instagram could leverage Social Comparison Theory to increase user engagement. Provide a specific example of a feature or mechanism that Instagram could implement to encourage social comparison among users.

Instagram could introduce a gamified leaderboard system that ranks users based on their daily activity levels, such as the number of posts, likes, comments, or stories shared. Users could see their rankings relative to other users, encouraging competition and social comparison.

For example, Instagram could display a daily leaderboard that shows the top users based on their activity metrics, motivating others to increase their engagement to climb the rankings. This feature would leverage social comparison theory by creating a sense of competition and achievement among users, driving them to engage more actively with the platform.
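
To make the mechanics concrete, below is a minimal sketch of such a daily leaderboard in R. It assumes a hypothetical table data_activity with columns user_id, posts, likes, comments, and stories; the weights used to combine the metrics into a single activity score are arbitrary and purely illustrative.

Code
pacman::p_load(dplyr)

# Hypothetical daily activity log: one row per user per day
data_activity <- data.frame(
    user_id  = 1:5,
    posts    = c(2, 0, 1, 4, 1),
    likes    = c(30, 12, 45, 8, 20),
    comments = c(5, 1, 9, 2, 3),
    stories  = c(1, 0, 3, 2, 0)
)

# Combine the metrics into one activity score (illustrative weights) and
# rank users from most to least active for the daily leaderboard
leaderboard <- data_activity %>%
    mutate(activity_score = 5 * posts + likes + 2 * comments + 3 * stories) %>%
    arrange(desc(activity_score)) %>%
    mutate(rank = row_number())

leaderboard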

3 Testing proposals using A/B/N testing

We have proposed various strategies to boost user activity on Instagram. Now, we need to design an A/B/N testing plan to evaluate the effectiveness of these strategies.

3.1 Step 1: Decide on the Unit of Randomisation

  • What would be the best unit of randomisation?
  • What are the potential problems of spillover and crossover?
  • The ideal unit of randomisation would be the user level.

  • Device-level randomisation would be too granular and could easily cause crossover effects: a user may use multiple devices, so the same user could be exposed to different treatments on their phone, laptop, and tablet (see the diagnostic sketch after this list).

    • This can be mitigated by requiring users to log in with the same account on all devices, which explains why websites and apps always ask users to log in before using the service.
  • Spillover effects may occur when a user talks to family members or friends about the treatment they received, potentially influencing those people's behaviour as well. Even if the user does not directly talk to others about the treatment, they may still influence others' behaviour through their actions on the platform because of the network effect.

    • This can be mitigated by keeping the treatment confidential, so that users are not aware of which treatment they received.
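
As a concrete illustration of detecting crossover, the snippet below is a minimal sketch assuming a hypothetical session-level table data_sessions with columns user_id, device_id, and treated; it flags users whose devices were exposed to different treatments.

Code
pacman::p_load(dplyr)

# Hypothetical session-level log: one row per (user, device) pair,
# recording which treatment that device was exposed to
data_sessions <- data.frame(
    user_id   = c(1, 1, 2, 2, 3),
    device_id = c("phone", "laptop", "phone", "tablet", "phone"),
    treated   = c(1, 0, 1, 1, 0)
)

# A user has crossed over if their devices saw different treatments
crossover_users <- data_sessions %>%
    group_by(user_id) %>%
    summarise(n_treatments = n_distinct(treated)) %>%
    filter(n_treatments > 1)

# Share of users affected by crossover (user 1 in this toy example)
nrow(crossover_users) / n_distinct(data_sessions$user_id)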

3.2 Step 2: Decide on Randomisation Allocation Scheme

  • How should we determine the randomisation scheme?
  • Since A/B/N testing can be costly and risky, normally we would not use all the users.

    • Method 1: On the testing launch date, we can assign users to treatment groups deterministically based on their user ID. For instance, to assign 10% of users to the treatment group and 90% to the control group, we can take the last digit of each user ID and place a user in the treatment group if the last digit is 0, and in the control group if it is 1-9.

    • Method 2: We can also randomly assign users to different treatment groups based on random sampling. See the code below.

  • Once randomisation is assigned, the treatment should remain the same for each user during the experiment period.

Code
# Method 1: Randomisation using the last digit of the user ID
# ID %% 10 returns the last digit; digit 0 => treatment (10% of users)

data_user <- data_user %>%
    mutate(treated = ifelse(ID %% 10 == 0, 1, 0))
Code
# Method 2: Randomisation using random sampling in R

# how to randomise the treatment if there is one control group and one treatment group

# fix the random seed so the assignment is reproducible
set.seed(888)

# assign 10% of users to the treatment group
treatment_probability <- 0.1

# sample user IDs directly; sampling row indices and then comparing them
# with ID would only be correct if IDs happened to equal row numbers
treated_ids <- sample(data_user$ID,
    size = round(nrow(data_user) * treatment_probability),
    replace = FALSE
)

data_user <- data_user %>%
    mutate(treated = ifelse(ID %in% treated_ids, 1, 0))

3.3 Step 3: Decide on Sample Selection and Treatment Duration

  • What is the sample size we need?
  • We can conduct a power analysis using the pwr package in R, or use an online sample-size calculator.

  • Suppose we want to detect a small effect size (Cohen’s d = 0.2) with a significance level of 0.05 and power of 0.8 for a two-tailed t-test. The required sample size per group can be calculated as follows:

Code
pacman::p_load(pwr)

# Define parameters
effect_size <- 0.2 # Small effect size (Cohen's d)
alpha <- 0.05 # Significance level
power <- 0.8 # Desired power

# Calculate the required sample size per group
# (leaving n unspecified tells pwr.t.test to solve for it)
sample_size <- pwr.t.test(
    d = effect_size,
    sig.level = alpha,
    power = power,
    type = "two.sample",
    alternative = "two.sided"
)

# display the result
sample_size
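
With these parameters, pwr.t.test solves for roughly 394 users per group (n ≈ 393.4). For an A/B/N test with two treatment groups and one control group, this implies a total sample of around 1,200 users.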

3.4 Step 4: Collect Data

  • What data should we collect?
  • We need to collect the following two types of data, which serve two purposes: (1) the randomisation check and (2) the estimation of treatment effects.

    • Demographic data — to conduct the randomisation check.

    • Behavioural data — to estimate the treatment effects.

3.5 Step 5: Interpreting Results from a Field Experiment

  • Once data are collected, how can we test our hypothesis?
  • First, we need to conduct a randomisation check to verify that the treatment and control groups have similar characteristics. If any characteristic differs significantly across groups, we should control for it in a regression model when estimating the treatment effects.
Code
pacman::p_load(dplyr)
data_instagram <- read.csv("https://www.dropbox.com/scl/fi/wf7al7k8go8tg9rkf033r/instagram_ab_test.csv?rlkey=5u43dxj705iepx1bir5h7snkg&dl=1")


# examine whether there are any differences between the treatment and control groups
t.test(age ~ treatment,
    data = data_instagram %>%
    filter(treatment %in% c('control','A'))
)

    Welch Two Sample t-test

data:  age by treatment
t = -0.90268, df = 636.35, p-value = 0.367
alternative hypothesis: true difference in means between group A and group control is not equal to 0
95 percent confidence interval:
 -1.1134248  0.4121476
sample estimates:
      mean in group A mean in group control 
             24.91722              25.26786 
Code
t.test(age ~ treatment,
    data = data_instagram %>%
    filter(treatment %in% c('control','B'))
)

    Welch Two Sample t-test

data:  age by treatment
t = -0.61826, df = 668.76, p-value = 0.5366
alternative hypothesis: true difference in means between group B and group control is not equal to 0
95 percent confidence interval:
 -0.9547753  0.4974924
sample estimates:
      mean in group B mean in group control 
             25.03922              25.26786 
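
Rather than checking one covariate and one pair of groups at a time, we can also test balance across all three groups jointly with a one-way ANOVA (equivalently, regressing the covariate on the treatment indicator); a large p-value on the F-test suggests age is balanced across control, A, and B.

Code
# Joint randomisation check: does mean age differ across the three groups?
summary(aov(age ~ treatment, data = data_instagram))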
  • Next, we can analyse the treatment effects by comparing the key activity metrics between the treatment and control groups. We can use pairwise t-tests if it is A/B testing, or linear regression models if it is A/B/N testing.
Code
# average post activity by treatment group
# (rows are sorted alphabetically, so 1 = A, 2 = B, 3 = control)
data_instagram_avg <- data_instagram %>%
    group_by(treatment) %>%
    summarise(avg_post_total_activity = mean(post_total_activity)) %>%
    ungroup()

# compare the treatment effects for proposal A versus control
data_instagram_avg$avg_post_total_activity[1] - data_instagram_avg$avg_post_total_activity[3]
[1] 13.3913
Code
# compare the treatment effects for proposal B versus control
data_instagram_avg$avg_post_total_activity[2] - data_instagram_avg$avg_post_total_activity[3]
[1] 33.70592
Code
# is the difference statistically significant?
t.test(post_total_activity ~ treatment,
    data = data_instagram %>%
    filter(treatment %in% c('control','A'))
)

    Welch Two Sample t-test

data:  post_total_activity by treatment
t = 24.481, df = 586.29, p-value < 2.2e-16
alternative hypothesis: true difference in means between group A and group control is not equal to 0
95 percent confidence interval:
 12.31696 14.46564
sample estimates:
      mean in group A mean in group control 
             56.18212              42.79082 
Code
t.test(post_total_activity ~ treatment,
    data = data_instagram %>%
    filter(treatment %in% c('control','B'))
)

    Welch Two Sample t-test

data:  post_total_activity by treatment
t = 57.328, df = 550.65, p-value < 2.2e-16
alternative hypothesis: true difference in means between group B and group control is not equal to 0
95 percent confidence interval:
 32.55102 34.86081
sample estimates:
      mean in group B mean in group control 
             76.49673              42.79082 
  • We can also run a linear regression to estimate the average treatment effects for A/B/N testing.
Code
pacman::p_load(modelsummary, fixest)
# run a linear regression model to estimate the treatment effects

# convert treatment into a factor with "control" as the reference level,
# so each coefficient is the treatment effect relative to the control group
data_instagram <- data_instagram %>%
    mutate(treatment_factor = as.factor(treatment)) %>%
    mutate(treatment_factor = relevel(treatment_factor, ref = "control"))

feols(fml = post_total_activity ~ treatment_factor,
    data = data_instagram) %>%
    modelsummary(stars = TRUE)
                     (1)
(Intercept)          42.791***
                     (0.379)
treatment_factorA    13.391***
                     (0.575)
treatment_factorB    33.706***
                     (0.573)
Num.Obs.             1000
R2                   0.777
R2 Adj.              0.776
AIC                  6872.4
BIC                  6887.1
RMSE                 7.50
Std.Errors           IID
+ p < 0.1, * p < 0.05, ** p < 0.01, *** p < 0.001
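
If the randomisation check had flagged an imbalance in a covariate such as age, we could control for it directly in the regression; even with balanced groups, adding a pre-treatment covariate can tighten the standard errors. Below is a minimal sketch using the same data_instagram and treatment_factor as above.

Code
# Same specification as above, now controlling for age; the treatment
# coefficients are still interpreted as average treatment effects
# relative to the control group
feols(post_total_activity ~ treatment_factor + age,
    data = data_instagram) %>%
    modelsummary(stars = TRUE)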

Based on the analyses, both proposal A and proposal B have a significant positive impact on user engagement, with proposal B delivering the larger lift (roughly 33.7 versus 13.4 additional activities per user relative to control). However, we also need to consider the costs and feasibility of implementing these features on Instagram. By conducting A/B/N testing, we can evaluate the effectiveness of different gamification strategies and make data-driven decisions to optimise user engagement on the platform.

References

Seaborn, Katie, and Deborah I Fels. 2015. “Gamification in Theory and Action: A Survey.” International Journal of Human-Computer Studies 74: 14–31.