Class 8: Customer Segmentation Using Unsupervised Learning for M&S
1 Customer Segmentation
1.1 Customer Segmentation
Segmentation is a key step in the marketing strategy (STP) process, where customers are divided into meaningful groups based on characteristics relevant to designing and executing your marketing strategy.
It assumes that different customer groups provide varying levels of value to the company and/or require distinct marketing programmes to succeed (e.g., based on differing goals and needs).
1.2 Conventional Segmentation
Customer value segmentation supports targeting decisions based on customers’ potential long-term financial and strategic value to your company.
Demographic segmentation uses variables such as age, gender, income, family life cycle, educational qualification, socio-economic status, religion, company size and income, etc. These serve as proxies for goals, preferences or psychographics, as well as to characterize segments for marketing mix decisions.
Psychographic segmentation is for positioning and marketing mix design based on the psychology of the customer and consumer, including attitudes, identity, lifestyle, personality, etc.
Conventional segmentation methods often require subjective judgements. A more objective approach is to ‘let the data speak’ by utilising data analytics tools.
2 K-Means in R
2.1 Syntax of kmeans()
- `x`: data with selected variables to apply K-means.
- `centers`: an integer k = number of clusters.
- `iter.max` (integer, default = 10): maximum number of iterations for a single run. Increase if you see non-convergence or very slow improvement.
- `nstart` (integer, default = 1): number of random initialisations when `centers` is an integer. The best solution (smallest total within-cluster sum of squares) is returned. Use 10–50+ for stability in practice.
- `algorithm` (character): one of “Hartigan-Wong” [default], “Lloyd”, “Forgy”, or “MacQueen”. Hartigan-Wong is typically fast and accurate; Lloyd/MacQueen can be preferable if you encounter empty clusters.
- `trace` (logical or integer, default = FALSE): prints the algorithm’s progress if `TRUE` or a positive integer, which can help debugging but produces verbose output.
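To see these arguments together, here is a minimal sketch of a `kmeans()` call. It uses the built-in `iris` data purely for illustration (the M&S data is introduced later).

```r
# Minimal kmeans() sketch on built-in data (iris), standardised first
x <- scale(iris[, c("Sepal.Length", "Petal.Length")])

set.seed(123)                 # make the random initialisation reproducible
fit <- kmeans(x,
              centers  = 3,   # k = 3 clusters
              iter.max = 25,  # allow more iterations than the default 10
              nstart   = 25)  # 25 random starts; the best solution is kept

fit$size          # number of points in each cluster
fit$tot.withinss  # total within-cluster sum of squares
```

Note that `nstart = 25` runs the algorithm 25 times from different random starting centres and keeps the run with the smallest total within-cluster sum of squares.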
3 Data Cleaning
3.1 Data Loading
Let’s first try customer segmentation based on total spending and Income.
Exercise: load `data_full`, create `total_spending`, and select `total_spending` and `Income` as the clustering variables into a new data frame `data_kmeans`.
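One possible solution sketch follows. The real column names of `data_full` are not shown in these notes, so the spending columns `MntWines` and `MntMeatProducts` below (and the file name in the commented load step) are illustrative assumptions; a tiny toy data frame stands in for the real data so the sketch runs on its own.

```r
# Hedged sketch: column names MntWines / MntMeatProducts are assumptions,
# standing in for whatever per-category spending columns data_full contains.
# data_full <- read.csv("data_full.csv")   # actual load step; file name assumed

# Self-contained toy stand-in for data_full:
data_full <- data.frame(
  MntWines        = c(200, 50, 10, 300),
  MntMeatProducts = c(120, 30, 5, 180),
  Income          = c(60000, 35000, 20000, 75000)
)

# Create total_spending and keep only the two clustering variables
data_full$total_spending <- data_full$MntWines + data_full$MntMeatProducts
data_kmeans <- data_full[, c("total_spending", "Income")]
```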
3.2 Data Pre-processing
To perform a cluster analysis in R, generally, the data should be prepared as follows:
Rows are observations (individuals) and columns are variables of interest for clustering.
Any missing value in the data must be removed or imputed.
The data must be standardised (i.e., scaled) to make variables comparable. Standardisation consists of transforming the variables such that they have mean zero and standard deviation one.1
3.3 Data Pre-processing: Missing Values
Check whether there are any missing values in the data.
Use mean imputation to fill in missing values.
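The two steps above can be sketched as follows; a toy data frame with deliberately inserted `NA` values stands in for the real data.

```r
# Toy data with missing values for illustration
data_kmeans <- data.frame(
  total_spending = c(330, 85, NA, 40),
  Income         = c(60000, NA, 35000, 20000)
)

colSums(is.na(data_kmeans))  # count missing values per column

# Mean imputation: replace each NA with the column mean of observed values
for (col in names(data_kmeans)) {
  data_kmeans[[col]][is.na(data_kmeans[[col]])] <-
    mean(data_kmeans[[col]], na.rm = TRUE)
}

anyNA(data_kmeans)  # FALSE after imputation
```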
3.4 Data Pre-processing: Standardisation
- We need to re-scale the clustering variables using `scale()`, because the variables can be on very different scales.
- Exercise: Scale the variables and create a new data frame `data_kmeans_scaled`.
- This is extremely important!
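A solution sketch for the scaling exercise, using a toy version of `data_kmeans`:

```r
# Toy stand-in for data_kmeans
data_kmeans <- data.frame(
  total_spending = c(330, 85, 150, 40),
  Income         = c(60000, 42000, 35000, 20000)
)

# scale() standardises each column to mean 0 and standard deviation 1
data_kmeans_scaled <- as.data.frame(scale(data_kmeans))

round(colMeans(data_kmeans_scaled), 10)  # each column now has mean 0
apply(data_kmeans_scaled, 2, sd)         # and standard deviation 1
```

Without this step, `Income` (in the tens of thousands) would dominate the Euclidean distances and `total_spending` would barely matter.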
3.5 Visualisation of the Data
Let’s visualise the data to see whether there are any natural clusters.
Exercise: Create a scatter plot of `total_spending` and `Income` using `ggplot2`.

Refer to the ggplot2 cheat sheet for more information on data visualisation in R.
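A solution sketch for the scatter plot, again on a toy stand-in for `data_kmeans`:

```r
library(ggplot2)

# Toy stand-in for data_kmeans
data_kmeans <- data.frame(
  total_spending = c(330, 85, 150, 40, 500, 60),
  Income         = c(60000, 42000, 35000, 20000, 80000, 25000)
)

p <- ggplot(data_kmeans, aes(x = Income, y = total_spending)) +
  geom_point() +
  labs(title = "Total Spending vs Income")

# print(p)  # render the plot to look for natural clusters
```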
4 Apply K-Means to M&S Case Study
4.1 Apply K-Means Clustering with 2 Clusters
- `set.seed()` is to allow replication of results.
- `kmeans()` is the function to perform K-means clustering.
- `centers` is the number of clusters to form.
- `nstart` is the number of random starting configurations to try; the best one is kept.
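Putting these together, a sketch of the 2-cluster fit on the (toy) standardised data:

```r
# K-means with 2 clusters on standardised toy data
set.seed(888)  # reproducibility of the random initialisation

data_kmeans_scaled <- scale(data.frame(
  total_spending = c(330, 85, 150, 40, 500, 60),
  Income         = c(60000, 42000, 35000, 20000, 80000, 25000)
))

result_kmeans <- kmeans(data_kmeans_scaled, centers = 2, nstart = 25)
result_kmeans$size  # number of customers in each cluster
```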
4.2 More About Seed and Random Number in R
In R, random number generation is controlled via a “seed”. The random numbers generated are not truly random but pseudo-random, meaning they are generated by a deterministic algorithm that produces a sequence of numbers that appear random. Setting the seed ensures that you get the same sequence of pseudo-random numbers each time you run your code, making your results reproducible.
Use the `set.seed()` function to set the seed before generating random numbers. The argument to `set.seed()` is an integer value that initialises the random number generator.

- For example, `set.seed(888)` sets the seed to 888. You can choose any integer value as the seed.
For any models that involve random processes (e.g., K-means clustering; random forest), setting the seed is important for reproducibility, especially when your analysis involves random sampling or random processes.
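A quick demonstration of why this matters: re-setting the same seed reproduces exactly the same pseudo-random sequence.

```r
# Same seed, same pseudo-random numbers
set.seed(888)
a <- rnorm(3)

set.seed(888)
b <- rnorm(3)

identical(a, b)  # TRUE: the sequences match exactly
```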
4.3 Examine the returned object, result_kmeans
- `size`: the number of points in each cluster.
- `cluster`: a vector of integers (from 1:k) indicating the cluster to which each point is allocated.
- `withinss`: vector of within-cluster sum of squares, one component per cluster.
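These components can be inspected directly from the fitted object; a self-contained sketch on toy data:

```r
# Inspect the components of a kmeans() result
set.seed(888)
x <- scale(data.frame(
  total_spending = c(330, 85, 150, 40),
  Income         = c(60000, 42000, 35000, 20000)
))
result_kmeans <- kmeans(x, centers = 2, nstart = 10)

result_kmeans$size      # points per cluster
result_kmeans$cluster   # cluster assignment of each observation
result_kmeans$withinss  # within-cluster sum of squares, per cluster
```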
4.4 Visualise the clusters
We can certainly use `ggplot2` for visualisation, but the `cluster` and `factoextra` packages already have built-in functions for visualising clusters.

Use the function `fviz_cluster()` to generate the visualisation.
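A minimal sketch of `fviz_cluster()`, assuming the `factoextra` package is installed; toy data stands in for the scaled M&S variables.

```r
library(factoextra)  # provides fviz_cluster()

set.seed(888)
x <- scale(data.frame(
  total_spending = c(330, 85, 150, 40, 500, 60),
  Income         = c(60000, 42000, 35000, 20000, 80000, 25000)
))
result_kmeans <- kmeans(x, centers = 2, nstart = 25)

# fviz_cluster() plots the clusters in the space of the two variables,
# colouring points by cluster and drawing a hull around each cluster
p <- fviz_cluster(result_kmeans, data = x)
# print(p)
```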
5 Determine Optimal K
5.1 Determine the optimal number of clusters: Elbow Method
- The elbow method consists of plotting the explained variation as a function of the number of clusters, and picking the elbow of the curve as the number of clusters to use.
- There are alternative methods such as the silhouette method and the gap statistic method, but the elbow method is the most commonly used one.
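The elbow method can be sketched in base R by fitting K-means for a range of k and plotting the total within-cluster sum of squares (toy data used here; `factoextra::fviz_nbclust()` offers a one-line alternative):

```r
# Elbow method: total within-cluster SS for k = 1..9 on toy data
set.seed(888)
x <- scale(data.frame(
  total_spending = c(330, 85, 150, 40, 500, 60, 220, 90, 400, 30),
  Income = c(60000, 42000, 35000, 20000, 80000, 25000, 55000, 30000, 70000, 18000)
))

wss <- sapply(1:9, function(k) {
  kmeans(x, centers = k, nstart = 25)$tot.withinss
})

plot(1:9, wss, type = "b",
     xlab = "Number of clusters k",
     ylab = "Total within-cluster sum of squares")
# Pick k at the 'elbow', where the curve's decrease flattens out
```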
5.2 Next Steps After Segmentation
Compare the CLV in different segments, and decide which segments to serve.
Develop marketing strategies for each segment. For example, for the high-value segment, you may want to increase the frequency of purchase by offering discounts or promotions.
Develop a customer journey map for each segment.
5.3 After-Class Readings
Useful source: K-means Cluster Analysis
K-means is the most commonly used clustering algorithm, but there are many other clustering algorithms available, such as hierarchical clustering, DBSCAN, Gaussian mixture models, etc. You can refer to this link to explore these algorithms for more advanced clustering tasks.
Footnotes
Another common method is to normalise the data, which consists of transforming the variables such that they have a minimum of zero and a maximum of one.↩︎