Class 6 Customer Segmentation using Unsupervised Machine Learning

Author
Affiliation

Dr Wei Miao

UCL School of Management

Published

October 18, 2023

1 Overview of Predictive Analytics

1.1 Roadmap of Predictive Analytics

  • The core of any business decision is break-even analysis (BEQ, NPV, and CLV; Weeks 1 and 2)

  • One effective way to increase firm profitability is to reduce marketing costs

  • In Weeks 3 and 4, we will learn how to utilize predictive analytics (i.e., machine learning models) to reduce marketing costs and improve marketing efficiency

1.2 Types of Predictive Analytics

In Term 2, you will learn predictive analytics models systematically. By then, think about how those techniques can be applied back to these case studies.

1.4 Learning Objectives

  • Understand the concept of unsupervised learning and how to apply clustering analyses for customer segmentation

2 Segmentation with Unsupervised Learning

2.1 Customer Segmentation

Segmentation is the first step in the segmentation, targeting, and positioning (STP) strategy of marketing. It is the process of dividing customers into meaningful groups based on characteristics relevant to the design and execution of your marketing strategy.

It assumes that different customer groups offer different levels of value to the company and/or require different marketing programs to succeed (e.g., because the groups have different goals and needs).

2.2 Conventional Segmentation

  • Customer value segmentation is for targeting decisions based on customers’ potential long-term financial and strategic value to your company.

  • Benefit segmentation is for positioning and marketing mix design based on customers’ goals or usage; their needs, wants, and problems; and the trade-offs they are willing to make across benefits (e.g., price vs. quality).

  • Psychographic segmentation is for positioning and marketing mix design based on the psychology of the customer and consumer, including attitudes, identity, lifestyle, personality, etc.

  • Demographic segmentation uses variables such as age, gender, income, family life cycle, educational qualification, socio-economic status, religion, company size, etc. These serve as proxies for goals, preferences or psychographics, as well as to characterize segments for marketing mix decisions.

Conventional segmentation methods require lots of subjective judgments. A more objective way is to “let the data speak” by using data analytics tools.

2.3 K-Means Clustering

  • K-means clustering is one of the most commonly used unsupervised machine learning algorithms for partitioning a given data set into a set of k groups (i.e. k clusters), where k represents the number of groups pre-specified by the analyst.

  • It can classify customers into multiple segments (i.e., clusters), such that customers within the same cluster are as similar as possible, whereas customers from different clusters are as dissimilar as possible. 

  • Input: (1) customer characteristics; (2) the number of clusters
  • Output: cluster membership of each customer

2.4 K-Means Clustering: Step 1

  • Raw data points; each dot is a customer

  • X and Y axis are customer characteristics

  • Visually, there appear to be 2 segments

  • Let’s see how K-means uses a data-driven way to classify customers into 2 segments

2.5 K-Means Clustering: Step 2

  • We specify 2 segments

  • K-means initializes the process by randomly selecting 2 centroids1

2.6 K-Means Clustering: Step 3

  • K-means computes the distance of each customer to the red and blue centroids

  • K-means assigns each customer to red segment or blue segment based on which centroid is closer

2.7 K-Means Clustering: Step 4

  • K-means updates the new centroids of each segment

  • The red cross and blue cross in the picture are the new centroids

  • We still see some “outliers”, so the algorithm needs to continue

2.8 K-Means Clustering: Step 5

  • K-means computes the distance of each customer to the red and blue centroids

  • K-means reassigns each customer to the red or blue segment based on which centroid is closer

  • Now the outliers are correctly assigned to their segments

2.9 K-Means Clustering: Step 6

  • K-means updates the centroids based on the previous segmentation

  • K-means computes the distance of each customer to the new centroids

  • K-means finds that all customers are already assigned to their nearest centroids, so there is no need to continue

  • The algorithm has converged, so it stops
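The six steps above can be sketched directly in R. The toy data, variable names, and convergence check below are illustrative assumptions, not part of the case study; in practice you would call the built-in kmeans() instead.

```r
# A minimal, illustrative implementation of the K-means steps above.
# Toy 2D data: each row is a "customer", columns are two characteristics.
set.seed(888)
toy <- rbind(matrix(rnorm(40, mean = 0), ncol = 2),
             matrix(rnorm(40, mean = 4), ncol = 2))

k <- 2
# Step 2: initialize by randomly picking k customers as centroids
centroids <- toy[sample(nrow(toy), k), , drop = FALSE]

for (iter in 1:100) {
  # Step 3: assign each customer to the nearest centroid (Euclidean distance)
  d <- as.matrix(dist(rbind(centroids, toy)))[-(1:k), 1:k]
  assignment <- max.col(-d)  # column index of the smallest distance

  # Step 4: recompute each centroid as the mean of its assigned customers
  new_centroids <- do.call(rbind, lapply(1:k, function(j) {
    if (any(assignment == j)) colMeans(toy[assignment == j, , drop = FALSE])
    else centroids[j, ]  # keep an empty cluster's centroid unchanged
  }))

  # Steps 5-6: stop once the centroids no longer move (convergence)
  if (isTRUE(all.equal(new_centroids, centroids, check.attributes = FALSE))) break
  centroids <- new_centroids
}
table(assignment)  # cluster sizes
```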

3 Customer Segmentation for Tesco

3.1 Syntax of kmeans()

  1. We decide to segment customers based on total spending and income. The function kmeans() takes the following arguments:

  • x: data with selected variables to apply K-means
  • centers: number of clusters
  • iter.max: the maximum number of iterations allowed
  • nstart: how many random sets should be chosen
  • algorithm: which algorithm to choose; default often works
  • trace: do you want to trace intermediate steps?
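Put together, a call might look like the sketch below. The data frame customer_data and its columns are made-up placeholders for illustration, not the case-study data.

```r
# Illustrative kmeans() call; `customer_data` is a hypothetical placeholder.
customer_data <- data.frame(income   = c(30, 45, 80, 95, 120),
                            spending = c(200, 350, 900, 1100, 1500))

set.seed(888)  # results depend on random initialization, so fix the seed
fit <- kmeans(x = scale(customer_data),  # x: (re-scaled) segmentation variables
              centers  = 2,              # centers: number of clusters
              iter.max = 10,             # iter.max: maximum iterations (the default)
              nstart   = 10)             # nstart: number of random starts to try
fit$cluster  # cluster membership of each customer
```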

3.2 Data collection and cleaning

  • Need to re-scale the two variables using scale(), because the two variables are of very different scales

    • This is extremely important!
    • set.seed() is to allow replication of results.
    • Refer to the DataCamp tutorial on K-means clustering in R for more details.
data_kmeans <- data_full %>%
  select(Income, total_spending) %>%
  mutate(Income = scale(Income),
         total_spending = scale(total_spending))
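scale() standardizes each variable to mean 0 and standard deviation 1, so that a variable measured in large units (like Income) does not dominate the Euclidean distances. A quick check on toy numbers (illustrative only, not the case-study data):

```r
# Toy numbers to show what scale() does.
income <- c(20000, 40000, 60000, 80000)
z      <- scale(income)          # (income - mean(income)) / sd(income)
c(mean = mean(z), sd = sd(z))    # standardized: mean 0, sd 1
```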

3.3 Conduct K-means clustering

set.seed(888)
result_kmeans <- kmeans(data_kmeans,
                        centers = 2,
                        nstart = 10)

3.4 Examine the returned object, result_kmeans

str(result_kmeans)
List of 9
 $ cluster     : int [1:2000] 1 2 1 2 2 1 2 2 2 2 ...
 $ centers     : num [1:2, 1:2] 0.95 -0.663 1.022 -0.713
  ..- attr(*, "dimnames")=List of 2
  .. ..$ : chr [1:2] "1" "2"
  .. ..$ : chr [1:2] "Income" "total_spending"
 $ totss       : num 3998
 $ withinss    : num [1:2] 726 553
 $ tot.withinss: num 1280
 $ betweenss   : num 2718
 $ size        : int [1:2] 822 1178
 $ iter        : int 1
 $ ifault      : int 0
 - attr(*, "class")= chr "kmeans"
  • cluster: A vector of integers (from 1:k) indicating the cluster to which each point is allocated.

  • centers: A matrix of cluster centers.

  • totss: The total sum of squares.

  • withinss: Vector of within-cluster sum of squares, one component per cluster.

  • tot.withinss: Total within-cluster sum of squares, i.e. sum(withinss).

  • betweenss: The between-cluster sum of squares, i.e. totss - tot.withinss.

  • size: The number of points in each cluster.
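These components satisfy a few arithmetic identities worth knowing, e.g., betweenss = totss - tot.withinss. A quick check on a reproducible run with toy data (not the Tesco data):

```r
set.seed(888)
toy <- matrix(rnorm(200), ncol = 2)          # 100 toy "customers"
fit <- kmeans(toy, centers = 2, nstart = 10)

# tot.withinss is the sum of the per-cluster withinss
isTRUE(all.equal(fit$tot.withinss, sum(fit$withinss)))       # TRUE
# betweenss is the total sum of squares minus the within part
isTRUE(all.equal(fit$betweenss, fit$totss - fit$tot.withinss))  # TRUE
# every customer is counted in exactly one cluster
sum(fit$size) == nrow(toy)                                   # TRUE
```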

3.5 Visualize the clusters

  • We need two packages: cluster and factoextra
  • Use function fviz_cluster() to generate visualizations
pacman::p_load(cluster,factoextra)
set.seed(888)
fviz_cluster(result_kmeans,
             data = data_kmeans)

3.6 Determine the optimal number of clusters: Gap Statistic Method

set.seed(888)
gap_stat <- clusGap(data_kmeans, 
                    FUN = kmeans,
                    K.max = 10,
                    B = 50)
fviz_gap_stat(gap_stat)

3.7 Determine the optimal number of clusters: Silhouette Method

set.seed(888)
fviz_nbclust(data_kmeans, kmeans, method = "silhouette")
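Under the hood, the silhouette method computes the average silhouette width for each candidate k and picks the k that maximizes it. The same quantity can be computed directly with cluster::silhouette; the toy data below is illustrative, not the Tesco data.

```r
library(cluster)

# Toy data with two well-separated groups, for illustration only.
set.seed(888)
toy <- rbind(matrix(rnorm(60, mean = 0), ncol = 2),
             matrix(rnorm(60, mean = 4), ncol = 2))

avg_sil <- sapply(2:6, function(k) {
  fit <- kmeans(toy, centers = k, nstart = 10)
  sil <- silhouette(fit$cluster, dist(toy))
  mean(sil[, "sil_width"])   # average silhouette width for this k
})
best_k <- (2:6)[which.max(avg_sil)]
best_k  # the k with the highest average silhouette width
```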

3.8 Business Implications

  • Compare the CLV in the two segments, and decide which segment to serve.
    • This is the general idea of segmentation and targeting with unsupervised learning
    • Finish this exercise after class
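A sketch of the exercise: attach the cluster membership back to the customer data and profile each segment. The data frame below, including the clv column, is simulated for illustration; in the case study you would use data_full and result_kmeans$cluster instead.

```r
library(dplyr)

# Simulated stand-in for data_full; `clv` is a hypothetical CLV column.
set.seed(888)
customers <- data.frame(
  Income         = c(rnorm(50, 30000, 3000), rnorm(50, 80000, 5000)),
  total_spending = c(rnorm(50, 300, 50),     rnorm(50, 1500, 200)),
  clv            = c(rnorm(50, 100, 20),     rnorm(50, 900, 100))
)

fit <- kmeans(scale(customers[, c("Income", "total_spending")]),
              centers = 2, nstart = 10)

# Profile each segment: average CLV and segment size guide the targeting decision
segment_profile <- customers %>%
  mutate(segment = fit$cluster) %>%
  group_by(segment) %>%
  summarise(avg_clv = mean(clv), size = n())
segment_profile
```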

3.9 After-Class Readings

Footnotes

  1. Due to this randomness, different starting points may give different results. We re-initialize the process repeatedly (multiple random starts) to ensure the robustness of the results.↩︎