Class 8: Customer Segmentation Using Unsupervised Learning for M&S
1 (Case Study) Customer Segmentation for M&S
1.1 Customer Segmentation
Segmentation is a key step in the marketing strategy (STP) process, where customers are divided into meaningful groups based on characteristics relevant to designing and executing your marketing strategy.
It assumes that different customer groups provide varying levels of value to the company and/or require distinct marketing programs to succeed (e.g., based on differing goals and needs).
1.2 Conventional Segmentation
Customer value segmentation supports targeting decisions by grouping customers according to their potential long-term financial and strategic value to your company.
Demographic segmentation uses variables such as age, gender, income, family life cycle, educational qualification, socio-economic status, religion, company size and income, etc. These serve as proxies for goals, preferences or psychographics, as well as to characterize segments for marketing mix decisions.
Psychographic segmentation supports positioning and marketing mix design by grouping customers according to their psychology, including attitudes, identity, lifestyle, personality, etc.
Conventional segmentation methods often require subjective judgments. A more objective approach is to ‘let the data speak’ by utilizing data analytics tools.
1.3 Syntax of kmeans()
- x: data with selected variables to apply K-means
- centers: number of clusters
- iter.max: the maximum number of iterations allowed
- nstart: how many random sets should be chosen
- algorithm: which algorithm to choose; default often works
- trace: do you want to trace intermediate steps?
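To make the argument list concrete, here is a minimal sketch of a kmeans() call that names each argument explicitly. The data are synthetic and only for illustration (they are not the M&S data), and the object names x and fit are assumptions.

```r
# Synthetic data for illustration only: two numeric columns, two loose groups.
set.seed(123)
x <- data.frame(
  a = c(rnorm(50, mean = 0), rnorm(50, mean = 5)),
  b = c(rnorm(50, mean = 0), rnorm(50, mean = 5))
)

fit <- kmeans(
  x         = x,               # data with the selected clustering variables
  centers   = 2,               # number of clusters
  iter.max  = 10,              # maximum number of iterations allowed
  nstart    = 25,              # number of random starting sets to try
  algorithm = "Hartigan-Wong", # the default algorithm
  trace     = FALSE            # do not print intermediate steps
)

fit$size  # number of points assigned to each cluster
```

In most cases only x, centers, and nstart need to be set; the remaining arguments can usually stay at their defaults.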
2 Data Pre-processing
2.1 Data Loading
Let’s first try customer segmentation based on total spending and Income.
Exercise: load data_full, create total_spending, and select total_spending and Income as the clustering variables into a new data frame data_kmeans.
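One possible shape of this step is sketched below in base R so that it runs without extra packages. The small data frame stands in for data_full, and the spending column names (spend_food, spend_clothing) are hypothetical; the real data set will have its own spending columns.

```r
# Stand-in for data_full. The spending column names are hypothetical.
data_full <- data.frame(
  spend_food     = c(120, 300, 45, 800),
  spend_clothing = c(60, 150, 10, 400),
  Income         = c(30000, 52000, NA, 91000)
)

# Total spending = row-wise sum of the individual spending columns.
spend_cols <- c("spend_food", "spend_clothing")
data_full$total_spending <- rowSums(data_full[, spend_cols])

# Keep only the two clustering variables in a new data frame.
data_kmeans <- data_full[, c("total_spending", "Income")]
data_kmeans
```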
2.2 Data Pre-processing
To perform a cluster analysis in R, generally, the data should be prepared as follows:
Rows are observations (individuals) and columns are variables of interest for clustering.
Any missing value in the data must be removed or imputed.
The data must be standardized (i.e., scaled) to make variables comparable. Standardization consists of transforming the variables such that they have mean zero and standard deviation one (see footnote 1).
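As a small worked example of standardization, the sketch below computes z-scores by hand and compares them with the output of scale(). The vector x is made up for illustration.

```r
# Made-up values for illustration.
x <- c(10, 20, 30, 40, 50)

# Standardize by hand: subtract the mean, divide by the standard deviation.
z_manual <- (x - mean(x)) / sd(x)

# scale() does the same thing; as.numeric() drops the matrix wrapper.
z_scale <- as.numeric(scale(x))

all.equal(z_manual, z_scale)  # both approaches agree
mean(z_manual)                # approximately 0
sd(z_manual)                  # 1
```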
2.3 Data Pre-processing: Missing Values
Check if there are any missing values in the data.
Use mean imputation to fill in missing values.
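A minimal base R sketch of both steps follows. The small data_kmeans data frame is made up for illustration; with the real data, only the check and the imputation lines would be needed.

```r
# Made-up stand-in for data_kmeans with one missing Income value.
data_kmeans <- data.frame(
  total_spending = c(180, 450, 55, 1200),
  Income         = c(30000, 52000, NA, 91000)
)

# Step 1: count missing values in each column.
na_counts <- colSums(is.na(data_kmeans))
na_counts

# Step 2: mean imputation, replacing each NA with the mean of the observed values.
income_mean <- mean(data_kmeans$Income, na.rm = TRUE)
data_kmeans$Income[is.na(data_kmeans$Income)] <- income_mean

anyNA(data_kmeans)  # FALSE once the gaps are filled
```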
2.4 Data Pre-processing: Standardization
- Need to re-scale the clustering variables using scale(), because the variables can be of very different scales. This is extremely important!
- Exercise: Scale the variables and create a new data frame data_kmeans_scaled.
Code
library(dplyr)

# Method 1: scale each variable explicitly.
# scale() returns a one-column matrix, so as.numeric() keeps plain numeric columns.
data_kmeans_scaled <- data_kmeans %>%
  select(total_spending, Income) %>%
  mutate(
    total_spending = as.numeric(scale(total_spending)),
    Income = as.numeric(scale(Income))
  )

# Method 2: use across() when many variables need the same transformation.
data_kmeans_scaled <- data_kmeans %>%
  select(total_spending, Income) %>%
  mutate(across(everything(), ~ as.numeric(scale(.x))))
2.5 Visualization of the Data
Let’s visualize the data to see if there are any natural clusters.
Exercise: Create a scatter plot of total_spending and Income using ggplot2.
Refer to the ggplot2 cheat sheet for more information on data visualization in R.
3 Apply K-Means
3.1 Apply K-Means Clustering with 2 Clusters
set.seed() fixes the random seed so that results can be replicated.
kmeans() is the function that performs K-means clustering.
centers is the number of clusters to form.
nstart is the number of random sets of starting centers to try; the best solution is kept.
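The pieces above fit together roughly as follows. This is a sketch on synthetic data, not the M&S data: the seed value, the simulated spending and income figures, and nstart = 25 are all assumptions chosen for illustration.

```r
# Synthetic stand-in for the scaled clustering data (two loose customer groups).
set.seed(888)
raw <- data.frame(
  total_spending = c(rnorm(100, 200, 50), rnorm(100, 1500, 300)),
  Income         = c(rnorm(100, 30000, 5000), rnorm(100, 80000, 10000))
)
data_kmeans_scaled <- as.data.frame(scale(raw))

# Fix the seed so the random starting centers, and hence the result, are reproducible.
set.seed(888)
result_kmeans <- kmeans(
  data_kmeans_scaled,
  centers = 2,   # form 2 clusters
  nstart  = 25   # try 25 random starts and keep the best one
)

result_kmeans$size
```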
3.2 Examine the returned object, result_kmeans
- size: The number of points in each cluster.
- cluster: A vector of integers (from 1:k) indicating the cluster to which each point is allocated.
- withinss: Vector of within-cluster sum of squares, one component per cluster.
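The sketch below shows how these components can be inspected. It refits a 2-cluster model on synthetic data so that it runs on its own; the data and seed are assumptions for illustration.

```r
# Synthetic data and a 2-cluster fit, for illustration only.
set.seed(888)
raw <- data.frame(
  total_spending = c(rnorm(100, 200, 50), rnorm(100, 1500, 300)),
  Income         = c(rnorm(100, 30000, 5000), rnorm(100, 80000, 10000))
)
data_kmeans_scaled <- as.data.frame(scale(raw))
result_kmeans <- kmeans(data_kmeans_scaled, centers = 2, nstart = 25)

result_kmeans$size            # number of points in each cluster
head(result_kmeans$cluster)   # cluster label (1 or 2) for the first few points
result_kmeans$withinss        # within-cluster sum of squares, one per cluster

# The cluster labels and the sizes tell the same story.
table(result_kmeans$cluster)

# tot.withinss is the sum of the per-cluster values.
sum(result_kmeans$withinss)
result_kmeans$tot.withinss
```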
3.3 Visualize the clusters
We need 2 packages: cluster and factoextra.
Use the function fviz_cluster() to generate visualizations.
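A hedged sketch of the plotting step is shown below, again on synthetic data. It calls fviz_cluster() when factoextra is installed; the base R fallback is an addition of this sketch (not part of the original material) so that the example still runs without the package.

```r
# Synthetic data and a 2-cluster fit, for illustration only.
set.seed(888)
raw <- data.frame(
  total_spending = c(rnorm(100, 200, 50), rnorm(100, 1500, 300)),
  Income         = c(rnorm(100, 30000, 5000), rnorm(100, 80000, 10000))
)
data_kmeans_scaled <- as.data.frame(scale(raw))
result_kmeans <- kmeans(data_kmeans_scaled, centers = 2, nstart = 25)

if (requireNamespace("factoextra", quietly = TRUE)) {
  # fviz_cluster() needs the fitted object and the data it was fitted on.
  p <- factoextra::fviz_cluster(result_kmeans, data = data_kmeans_scaled, geom = "point")
  print(p)
} else {
  # Fallback (assumption of this sketch): colour points by cluster with base graphics.
  plot(data_kmeans_scaled, col = result_kmeans$cluster, pch = 19)
  points(result_kmeans$centers, pch = 4, cex = 2)  # mark the cluster centers
}
```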
4 Determine the K Value
4.1 Determine the optimal number of clusters: GAP Method
- The gap statistic compares the total intra-cluster variation for different values of k with its expected value under a null reference distribution of the data. The estimated optimal number of clusters is the value of k that maximizes the gap statistic.
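The gap statistic can be computed with clusGap() from the cluster package, as sketched below on synthetic data with three simulated groups. The values K.max = 6, B = 50 (number of reference data sets), and nstart = 25 are assumptions chosen to keep the example quick.

```r
library(cluster)

# Synthetic data with three loose groups, scaled; for illustration only.
set.seed(42)
x <- scale(rbind(
  matrix(rnorm(100, mean = 0), ncol = 2),
  matrix(rnorm(100, mean = 4), ncol = 2),
  matrix(rnorm(100, mean = 8), ncol = 2)
))

# Compute the gap statistic for k = 1, ..., 6; nstart is passed on to kmeans().
gap_stat <- clusGap(x, FUNcluster = kmeans, K.max = 6, B = 50,
                    nstart = 25, verbose = FALSE)

# One row per k, with columns logW, E.logW, gap, and SE.sim.
gap_stat$Tab

# Pick the k with the largest gap value, as described above.
k_best <- which.max(gap_stat$Tab[, "gap"])
k_best
```

Other selection rules exist as well, for example rules that account for the simulation standard error (see maxSE() in the same package), and they may choose a smaller k than the plain maximum.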
4.2 Determine the optimal number of clusters: Silhouette Method
- The silhouette value measures how similar an object is to its own cluster (cohesion) compared to other clusters (separation). It ranges from -1 to 1, where a high value indicates that the object is well matched to its own cluster and poorly matched to neighboring clusters. The estimated optimal number of clusters is the k that maximizes the average silhouette value across all objects.
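As a sketch of the silhouette method, the code below computes the average silhouette width for several values of k using silhouette() from the cluster package. The synthetic data, the range k = 2 to 6, and nstart = 25 are assumptions for illustration.

```r
library(cluster)

# Synthetic data with three loose groups, scaled; for illustration only.
set.seed(42)
x <- scale(rbind(
  matrix(rnorm(100, mean = 0), ncol = 2),
  matrix(rnorm(100, mean = 4), ncol = 2),
  matrix(rnorm(100, mean = 8), ncol = 2)
))
d <- dist(x)  # pairwise distances, needed by silhouette()

# Average silhouette width for each candidate k (not defined for k = 1).
ks <- 2:6
avg_sil <- sapply(ks, function(k) {
  km  <- kmeans(x, centers = k, nstart = 25)
  sil <- silhouette(km$cluster, d)
  mean(sil[, "sil_width"])
})

data.frame(k = ks, avg_silhouette = avg_sil)

# Choose the k with the highest average silhouette width.
k_best <- ks[which.max(avg_sil)]
k_best
```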
4.3 Next Steps After Segmentation
Compare the CLV in different segments, and decide which segments to serve.
Develop marketing strategies for each segment. For example, for the high-value segment, you may want to increase the frequency of purchase by offering discounts or promotions.
Develop a customer journey map for each segment.
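For the first step, comparing CLV across segments can be as simple as a grouped mean. The sketch below uses a made-up customers data frame; the clv column and its values are hypothetical, and in practice the segment column would come from the cluster labels of the fitted K-means model.

```r
# Made-up data: segment labels plus a hypothetical CLV value per customer.
customers <- data.frame(
  segment = c(1, 1, 2, 2, 2),
  clv     = c(100, 140, 900, 1100, 1000)
)

# Average CLV per segment.
clv_by_segment <- aggregate(clv ~ segment, data = customers, FUN = mean)
clv_by_segment
```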
4.4 Term 3 Project Scopes
Smartphones contain sensors, from which we can apply machine learning models to understand the context of the user, whether it be relaxing on the sofa, jogging in a park, or working indoors in an office. The task is to consume this real life data and produce visualisations, and to produce an anomaly detection engine. The project may be extended to clustering users according to their behavioural patterns in an unsupervised fashion.
The project will explore fraud detection approaches using unsupervised ML including models such as isolation forests. The candidate will develop an understanding of the business problem and our data, formulating hypotheses and testing them. They will build, evaluate, and interpret their ML models.
4.5 After-Class Readings
- After-class Exercise: Try total spending, Frequency, and Recency as clustering covariates. Why these three variables? Then, find the optimal number of clusters. Visualize the clusters.
- Because they are the most important variables for customer segmentation, i.e., RFM (Recency, Frequency, Monetary) analysis.
- Useful source: K-means Cluster Analysis
Footnotes
1. Another common method is to normalize the data, which consists of transforming the variables such that they have a minimum of zero and a maximum of one.