The amount of data created worldwide has been increasing exponentially over the past decade with some estimates placing the total at over 60 zettabytes as of 2030 (Source: Statista). Data without analytics, however, is of little value to business decision-makers aiming to improve performance and increase growth. It is therefore no surprise that top-tier consulting companies, analytics firms and business schools have been promoting the positive returns to greater usage of analytics technology. It also explains why an increasing number of data analytics enthusiasts (like Tom and you!) are willing to pay up to £40k tuition fee (that is 10,000 bubble teas!) to join the world’s best MSc Business Analytics program at the UCL School of Management (hmm, it’s now already week 3, too late to ask David for a refund!).
By identifying patterns and trends in massive amounts of data, business analytics enables organizations to make better decisions and improve performance. Descriptive analytics is the simplest and most widely used type of analytics; it is used to generate key performance indicators (KPIs) and metrics for business reports and dashboards. Research shows that, even with the adoption of very simple descriptive analytics, businesses can improve their performance by a large extent — Berman and Israeli (2022) analyze the staggered adoption of a retail analytics dashboard by more than 1,500 e-commerce websites,1 and find an increase of 4%–10% in average weekly revenues post-adoption. The increase in revenue is not explained by price changes or advertising optimization. Instead, it is consistent with the improvement on customer relationship management, personalization, and prospecting technologies to retailer websites. The adoption and usage of descriptive analytics also increases the diversity of products sold, the number of transactions, the numbers of website visitors and unique customers, and the revenue from repeat customers. These findings are consistent with a complementary effect of descriptive analytics that serve as a monitoring device that helps business analysts control additional martech tools and amplify their value. Without using the descriptive dashboard, companies are unable to reap the benefits associated with these technologies.
In the remaining of this case, we will explore (1) how to conduct data wrangling and data cleaning using R and (2) how to conduct descriptive analytics for M&S to achieve marketing efficiency. You might ask, where is Tom? Well, M&S rewarded Tom with a few days off for his successful calculation of CLV last week, and he is now considering setting up his bubble tea business in Canary Wharf. You are the only one who can help M&S now until Tom’s return!
2 Database Marketing at M&S
We learned in Week 2, how to compute the customer lifetime value for M&S. However, when computing the CLV, we used an “average” approach, which did not consider customer heterogeneity. That is, when considering each component in the CLV formula, such as customer spending in each period, their retention rate, etc., we took the average across all customers, and assumed customers are homogeneous. As a result, every customer would have the same CLV. Nevertheless, this is a strong assumption in practice — every customer is unique and should be treated differently.
The key to successful customer relationship management is to maintain a customer database that tracks detailed customer information, including their demograhpic information and past purchase history. This information would empower marketing analytics team to compute individual CLV for each customer, and conduct individualized targeted marketing.
2.1 Demographic Information
Knowing your consumer is a vital concept of running any business. Is the business selling fertilizer to farmers, apparel to teenage girls, or vacations to senior citizens? The distinctions are readily apparent in this comparison.
Demographics define the qualities of clients. To be successful, business owners must understand the demographics of their clients and the trends or changes that are occurring within those specific traits.
The following demographic information is usually of interest to business managers:
Age: Consumer behavior is strongly influenced by age. Younger consumers are more affluent and willing to spend more on entertainment, fashion, and movies. Seniors spend less on these items; they are less active, spend more time indoors, and require more medical treatment. Additionally, market segments can be defined by age groups. For instance, digital devices such as iPhones are targeted more towards millennials than at seniors. While older adults are increasingly utilizing technology, they remain less digitally savvy than millennials and purchase fewer digital products.
Gender: Gender also matters. Males and females have vastly diverse demands and tastes, which influence their purchasing decisions. As a result, some products are created with a specific gender in mind. Macy’s, Nordstrom, and The Gap all have departments dedicated to teenage girls’ clothes, while Seiko has a specific line of diver watches for men only.
Income: Income has a substantial influence on consumer behavior and product purchases. Middle-income customers make purchases with due regard for the utility of money. They do not have unlimited money to spend, and hence the money spent on one item may be used on something else. On the other hand, consumers with higher incomes tend to be less price sensitive and have a higher willingness to pay.
Education: Consumers’ level of education has an effect on their impressions of the world around them and on the amount of research they conduct prior to making a purchase. Individuals with a higher level of education will spend more time educating themselves before investing their money. Education has an impact on fashion, film, and television programming. Consumers with a higher level of education can be more distrustful of commercials and the facts offered.
M&S has collected rich customer demographic information through its loyalty program, M&S Sparks membership. The data scientist team has the following demographic variables:
ID: Customer’s unique identifier
Year_Birth: Customer’s birth year
Education: Customer’s education level
Marital_Status: Customer’s marital status
Income: Customer’s yearly household income
Kidhome: Number of children in customer’s household
Teenhome: Number of teenagers in customer’s household
Dt_Customer: Date of customer’s enrollment with the company
2.2 Purchase History
“History doesn’t repeat itself, but it often rhymes.”
This popular aphorism, frequently (and perhaps incorrectly) attributed to Mark Twain, is frequently invoked to demonstrate that, while past events do not always provide a clear indication of future events, they do provide valuable context. This sentiment is especially true for marketing managers, where a consumer’s purchase history provides invaluable insight into their future purchasing habits.
M&S’s data engineering team has assembled a cross-sectional customer purchase history data today, with variables including2:
MntWines: Amount spent on wine in last 2 years
MntFruits: Amount spent on fruits in last 2 years
MntMeatProducts: Amount spent on meat in last 2 years
MntFishProducts: Amount spent on fish in last 2 years
MntSweetProducts: Amount spent on sweets in last 2 years
MntGoldProds: Amount spent on gold and jewelry in last 2 years
NumDealsPurchases: Number of purchases made with a discount
NumWebPurchases: Number of purchases made through Ocado’s web site
NumCatalogPurchases: Number of purchases made using a catalogue
NumStorePurchases: Number of purchases made directly in stores
NumWebVisitsMonth: Number of visits to company’s web site in the last month
Complain: 1 if customer complained in the last 2 years, 0 otherwise
Response: 1 if customer accepted the offer in the last campaign, 0 otherwise
Recency: Number of days since customer’s last purchase
3 Data Wrangling
3.1 Data Loading
To work on the datasets, we first need to load the raw data into R. In R, we can use read.csv(filepath) to load the data into R environment.
For your convenience, I have stored the data file on my Dropbox and the download link will be provided. We can directly feed the url links to read.csv() function in R to download and create the dataset in R’s environment.
The data is also provided to you in .csv format, which you can also load from your local disk.
Load the dataset into your R environment. Name the dataset as data_full.
Code
## use read.csv() to download and load the data, assign it to an R data object## header = T argument is to tell read.csv() to keep the dataset header (first row)pacman::p_load(dplyr, modelsummary)data_full <-read.csv("https://www.dropbox.com/scl/fi/2q7ppqtyca0pd3j486osl/data_full.csv?rlkey=gsyk51q27vd1skek4qpn5ikgm&dl=1")
After running the above code blocks, you should see the dataset named data_full in your RStudio environment.
Now, click into the dataset, take a look, and get a sense of how it looks like.
Data cleaning is the process of detecting and correcting (or removing) corrupt or inaccurate records from a dataset. We often need to deal with missing values and outliers. In this case, we will focus on missing values.
Check if any variables in data_full have missing values. If so, how many missing values are there in each variable? Use the variable average to replace missing values in the dataset.
Code
datasummary_skim(data_full, fmt =3)
Warning: These variables were omitted because they include more than 50 levels:
Dt_Customer.
Unique
Missing Pct.
Mean
SD
Min
Median
Max
Histogram
ID
2000
0
5599.172
3242.019
0.000
5492.000
11191.000
MntWines
738
0
306.058
338.276
0.000
176.500
1493.000
MntFruits
157
0
26.358
39.885
0.000
8.000
199.000
MntMeatProducts
532
0
167.885
225.292
0.000
68.000
1725.000
MntFishProducts
179
0
37.614
54.556
0.000
12.000
259.000
MntSweetProducts
175
0
27.495
41.756
0.000
8.000
263.000
MntGoldProds
207
0
43.786
51.736
0.000
24.000
362.000
NumDealsPurchases
15
0
2.321
1.958
0.000
2.000
15.000
NumWebPurchases
15
0
4.082
2.773
0.000
4.000
27.000
NumCatalogPurchases
14
0
2.691
2.956
0.000
2.000
28.000
NumStorePurchases
14
0
5.819
3.256
0.000
5.000
13.000
NumWebVisitsMonth
15
0
5.308
2.452
0.000
6.000
20.000
Complain
2
0
0.010
0.100
0.000
0.000
1.000
Response
2
0
0.150
0.358
0.000
0.000
1.000
Year_Birth
59
0
1968.843
12.020
1893.000
1970.000
1996.000
Income
1783
1
52139.697
21492.372
1730.000
51518.000
162397.000
Kidhome
3
0
0.444
0.540
0.000
0.000
2.000
Teenhome
3
0
0.501
0.546
0.000
0.000
2.000
Recency
100
0
49.165
28.951
0.000
50.000
99.000
N
%
Education
2n Cycle
185
9.250
Basic
43
2.150
Graduation
992
49.600
Master
327
16.350
PhD
453
22.650
Marital_Status
Alone
3
0.150
Divorced
206
10.300
Married
767
38.350
Single
436
21.800
Together
521
26.050
Widow
67
3.350
We find that income has missing values for a few customers. We can replace the missing values with the average income of all customers.
Tip: use mutate() and ifelse() functions in dplyr to replace missing values
Next, once the final dataset is ready, we can proceed to use descriptive analytics to conduct preliminary customer analysis using dplyr.
Descriptive analytics is concerned with summarizing and highlighting patterns in current and historical data in order to assist businesses in comprehending what has occurred thus far. However, it makes no attempt to explain why something occurred or to forecast what may occur in the future. “why something occurred” belongs to the scope of causal inference; “forecast what may occur in the future” falls in the scope of predictive analytics. To answer those questions, analysts must combine descriptive analytics with other types of analytics, which we will learn in later weeks.
Your task is to ‘get to know’ the data_full by conducting some descriptive analytics.
Mutate a new variable called total_spending, which means the sum of spending on all products by customers. Mute a new variable called Frequency, which means the sum of the number of purchases made through different channels. Provide a correlation table between total_spending and other continuous variables, including Frequency, Complain, Response, Income, Kidhome, Teenhome, Recency.
RFM analysis is a customer segmentation technique that uses past purchase behavior to divide customers into groups. RFM stands for Recency, Frequency, and Monetary Value. RFM helps divide customers into various categories or clusters to identify customers who are more likely to respond to promotions, as well as those who are more likely to churn. We will learn more about RFM analysis in the subsequent weeks when we talk about segmentation and targeting.
Compare the average total spending across demographic groups. Which group has the highest average spending?
marital status
education
Code
data_full %>%group_by(Marital_Status) %>%# group bysummarise(avg_spending =mean(total_spending)) %>%# compute the mean of spending for each groupungroup()
Which education group accounts for the largest percentage of customers?
tip: in dplyr, there is a function called n(), which counts the number of rows in the group after group_by()
Code
data_full %>%# Think about why data_demo? what if we use data_full?group_by(Education) %>%# group by Educationsummarise(n_customers_in_group =n()) %>%# compute the number of customers in each education groupungroup() %>%# important to do the ungroup after each group_bymutate(percent_customer = n_customers_in_group /sum(n_customers_in_group)) %>%# compute the percentage of customers in each grouparrange(-percent_customer) # descending order
What is the average Total £ spent on wine and total spending by customers with and without kids?
tip: first mutate a variable called has_kid, which equals 1 if the number of kids in the household is larger than 0, and otherwise 0; then group by this has_kid variable
Group customers by Marital Status and compare the average spending in each group. What insights can you draw? - Tips: use group_by() + summarise()
Code
data_full %>%mutate(total_spending = MntWines + MntFruits + MntMeatProducts + MntFishProducts + MntSweetProducts + MntGoldProds) %>%# mutate total spendinggroup_by(Marital_Status) %>%summarise(avg_spending =mean(total_spending, na.rm = T)) %>%# compute the mean of spending for each groupungroup() %>%arrange(-avg_spending)
References
Berman, Ron, and Ayelet Israeli. 2022. “The Value of Descriptive Analytics: Evidence from Online Retailers.”Marketing Science, March, mksc.2022.1352. https://doi.org/10.1287/mksc.2022.1352.
Footnotes
We will cover the difference-in-differences technique to establish causal inference later in the module.↩︎
Cross-sectional data is data collected at a single point in time. It is a snapshot of customers at a particular moment. In contrast, longitudinal data is collected over time, allowing analysts to track changes in customer behavior over time.↩︎
---title: "Descriptive Analytics for M&S"author: "Wei Miao"thanks: "This case was prepared by Wei Miao, UCL School of Management, University College London for MSIN0094 Marketing Analytics module. This case was developed to provide material for class discussion rather than to illustrate either effective or ineffective handling of a business situation. Names and data may have been disguised or fabricated. Please do not circulate without permission. All copyrights reserved."date: "`r (lubridate::ymd('20241002')+lubridate::dweeks(2))`"subtitle: "MSIN0094 Case Study"institute: UCL School of Managementdate-format: longformat: html: default pdf: number-sections: true toc: false number-depth: 2 fontsize: 9pt linestretch: 1.25 knitr: opts_chunk: echo: false warning: false message: false include: falsebibliography: references.bibexecute: freeze: auto cache: trueeditor_options: chunk_output_type: inline---# The Power of Descriptive AnalyticsThe amount of data created worldwide has been increasing exponentially over the past decade with some estimates placing the total at over 60 zettabytes as of 2030 (Source: Statista). Data without analytics, however, is of little value to business decision-makers aiming to improve performance and increase growth. It is therefore no surprise that top-tier consulting companies, analytics firms and business schools have been promoting the positive returns to greater usage of analytics technology. It also explains why an increasing number of data analytics enthusiasts (like Tom and you!) are willing to pay up to £40k tuition fee (that is 10,000 bubble teas!) to join the world's best MSc Business Analytics program at the UCL School of Management (hmm, it's now already week 3, too late to ask David for a refund!).By identifying patterns and trends in massive amounts of data, business analytics enables organizations to make better decisions and improve performance. Descriptive analytics is the simplest and most widely used type of analytics; it is used to generate key performance indicators (KPIs) and metrics for business reports and dashboards. Research shows that, even with the adoption of very simple descriptive analytics, businesses can improve their performance by a large extent --- @bermanValueDescriptiveAnalytics2022 analyze the staggered adoption of a retail analytics dashboard by more than 1,500 e-commerce websites,[^1] and find an increase of 4%--10% in average weekly revenues post-adoption. The increase in revenue is not explained by price changes or advertising optimization. Instead, it is consistent with the improvement on customer relationship management, personalization, and prospecting technologies to retailer websites. The adoption and usage of descriptive analytics also increases the diversity of products sold, the number of transactions, the numbers of website visitors and unique customers, and the revenue from repeat customers. These findings are consistent with a complementary effect of descriptive analytics that serve as a monitoring device that helps business analysts control additional martech tools and amplify their value. Without using the descriptive dashboard, companies are unable to reap the benefits associated with these technologies.[^1]: We will cover the difference-in-differences technique to establish causal inference later in the module.In the remaining of this case, we will explore (1) how to conduct data wrangling and data cleaning using R and (2) how to conduct descriptive analytics for M&S to achieve marketing efficiency. You might ask, where is Tom? Well, M&S rewarded Tom with a few days off for his successful calculation of CLV last week, and he is now considering setting up his bubble tea business in Canary Wharf. You are the only one who can help M&S now until Tom's return!# Database Marketing at M&SWe learned in Week 2, how to compute the customer lifetime value for M&S. However, when computing the CLV, we used an "average" approach, which did not consider customer heterogeneity. That is, when considering each component in the CLV formula, such as customer spending in each period, their retention rate, etc., we took the average across all customers, and assumed customers are homogeneous. As a result, every customer would have the same CLV. Nevertheless, this is a strong assumption in practice --- every customer is unique and should be treated differently.The key to successful customer relationship management is to maintain a customer database that tracks detailed customer information, including their demograhpic information and past purchase history. This information would empower marketing analytics team to compute individual CLV for each customer, and conduct individualized targeted marketing.## Demographic InformationKnowing your consumer is a vital concept of running any business. Is the business selling fertilizer to farmers, apparel to teenage girls, or vacations to senior citizens? The distinctions are readily apparent in this comparison.Demographics define the qualities of clients. To be successful, business owners must understand the demographics of their clients and the trends or changes that are occurring within those specific traits.The following demographic information is usually of interest to business managers:*Age*: Consumer behavior is strongly influenced by age. Younger consumers are more affluent and willing to spend more on entertainment, fashion, and movies. Seniors spend less on these items; they are less active, spend more time indoors, and require more medical treatment. Additionally, market segments can be defined by age groups. For instance, digital devices such as iPhones are targeted more towards millennials than at seniors. While older adults are increasingly utilizing technology, they remain less digitally savvy than millennials and purchase fewer digital products.*Gender*: Gender also matters. Males and females have vastly diverse demands and tastes, which influence their purchasing decisions. As a result, some products are created with a specific gender in mind. Macy's, Nordstrom, and The Gap all have departments dedicated to teenage girls' clothes, while Seiko has a specific line of diver watches for men only.*Income*: Income has a substantial influence on consumer behavior and product purchases. Middle-income customers make purchases with due regard for the utility of money. They do not have unlimited money to spend, and hence the money spent on one item may be used on something else. On the other hand, consumers with higher incomes tend to be less price sensitive and have a higher willingness to pay.*Education*: Consumers' level of education has an effect on their impressions of the world around them and on the amount of research they conduct prior to making a purchase. Individuals with a higher level of education will spend more time educating themselves before investing their money. Education has an impact on fashion, film, and television programming. Consumers with a higher level of education can be more distrustful of commercials and the facts offered.M&S has collected rich customer demographic information through its loyalty program, M&S Sparks membership. The data scientist team has the following demographic variables:- ID: Customer's unique identifier- Year_Birth: Customer's birth year- Education: Customer's education level- Marital_Status: Customer's marital status- Income: Customer's yearly household income- Kidhome: Number of children in customer's household- Teenhome: Number of teenagers in customer's household- Dt_Customer: Date of customer's enrollment with the company## Purchase History"History doesn't repeat itself, but it often rhymes."This popular aphorism, frequently (and perhaps incorrectly) attributed to Mark Twain, is frequently invoked to demonstrate that, while past events do not always provide a clear indication of future events, they do provide valuable context. This sentiment is especially true for marketing managers, where a consumer's purchase history provides invaluable insight into their future purchasing habits.M&S's data engineering team has assembled a **cross-sectional** customer purchase history data today, with variables including^[Cross-sectional data is data collected at a single point in time. It is a snapshot of customers at a particular moment. In contrast, **longitudinal** data is collected over time, allowing analysts to track changes in customer behavior over time.]:- MntWines: Amount spent on wine in last 2 years- MntFruits: Amount spent on fruits in last 2 years- MntMeatProducts: Amount spent on meat in last 2 years- MntFishProducts: Amount spent on fish in last 2 years- MntSweetProducts: Amount spent on sweets in last 2 years- MntGoldProds: Amount spent on gold and jewelry in last 2 years- NumDealsPurchases: Number of purchases made with a discount- NumWebPurchases: Number of purchases made through Ocado's web site- NumCatalogPurchases: Number of purchases made using a catalogue- NumStorePurchases: Number of purchases made directly in stores- NumWebVisitsMonth: Number of visits to company's web site in the last month- Complain: 1 if customer complained in the last 2 years, 0 otherwise- Response: 1 if customer accepted the offer in the last campaign, 0 otherwise- Recency: Number of days since customer's last purchase# Data Wrangling## Data LoadingTo work on the datasets, we first need to load the raw data into R. In R, we can use `read.csv(filepath)` to load the data into R environment.For your convenience, I have stored the data file on my Dropbox and the download link will be provided. We can directly feed the url links to `read.csv()` function in R to download and create the dataset in R's environment.The data is also provided to you in .csv format, which you can also load from your local disk.::: {.content-visible when-format='pdf'}::: {.callout}**Question `r question_index<-question_index+1; question_index-1`**Load the dataset into your R environment. Name the dataset as `data_full`.::::::::: {.content-visible when-format='html'}::: {.panel-tabset}### **Question `r question_index-1`**Load the dataset into your R environment. Name the dataset as `data_full`.### Answer```{r}## use read.csv() to download and load the data, assign it to an R data object## header = T argument is to tell read.csv() to keep the dataset header (first row)pacman::p_load(dplyr, modelsummary)data_full <-read.csv("https://www.dropbox.com/scl/fi/2q7ppqtyca0pd3j486osl/data_full.csv?rlkey=gsyk51q27vd1skek4qpn5ikgm&dl=1")```::::::After running the above code blocks, you should see the dataset named `data_full` in your RStudio environment.Now, click into the dataset, take a look, and get a sense of how it looks like.## Data Types::: {.content-visible when-format='pdf'}::: {.callout}**Question `r question_index<-question_index+1; question_index-1`**Check all data types in `data_full` are correct and as expected::::::::: {.content-visible when-format='html'}::: {.panel-tabset}### **Question `r question_index-1`**Check all data types in `data_full` are correct and as expected### Answer```{r}# can use str() to get the structure of data# Lina covered this in the induction weekstr(data_full)```::::::## Data CleaningData cleaning is the process of detecting and correcting (or removing) corrupt or inaccurate records from a dataset. We often need to deal with missing values and outliers. In this case, we will focus on missing values.::: {.content-visible when-format='pdf'}::: {.callout}**Question `r question_index<-question_index+1; question_index-1`**Check if any variables in `data_full` have missing values. If so, how many missing values are there in each variable? Use the variable average to replace missing values in the dataset.::::::::: {.content-visible when-format='html'}::: {.panel-tabset}### **Question `r question_index-1`**Check if any variables in `data_full` have missing values. If so, how many missing values are there in each variable? Use the variable average to replace missing values in the dataset.### Answer```{r}datasummary_skim(data_full, fmt =3)```- We find that income has missing values for a few customers. We can replace the missing values with the average income of all customers. - Tip: use `mutate()` and `ifelse()` functions in `dplyr` to replace missing values```{r}data_full <- data_full %>%mutate(Income =ifelse(is.na(Income), mean(Income, na.rm = T), Income))```::::::# Preliminary Customer AnalysisNext, once the final dataset is ready, we can proceed to use descriptive analytics to conduct preliminary customer analysis using `dplyr`.Descriptive analytics is concerned with summarizing and highlighting patterns in current and historical data in order to assist businesses in comprehending what has occurred thus far. However, it makes no attempt to explain why something occurred or to forecast what may occur in the future. "why something occurred" belongs to the scope of causal inference; "forecast what may occur in the future" falls in the scope of predictive analytics. To answer those questions, analysts must combine descriptive analytics with other types of analytics, which we will learn in later weeks.Your task is to 'get to know' the `data_full` by conducting some descriptive analytics.::: {.content-visible when-format='pdf'}::: {.callout}**Question `r question_index<-question_index+1; question_index-1`**- Report the summary statistics of the cleaned data, discuss your business insights (open question; from central tendency and dispersion perspectives)::::::::: {.content-visible when-format='html'}::: {.panel-tabset}### **Question `r question_index-1`**- Report the summary statistics of the cleaned data, discuss your business insights (open question; from central tendency and dispersion perspectives)### Answer```{r}# remember to load package modelsummary before calling this functiondatasummary_skim(data_full)```::::::::: {.content-visible when-format='pdf'}::: {.callout}**Question `r question_index<-question_index+1; question_index-1`**- Mutate a new variable called `total_spending`, which means the sum of spending on all products by customers. - total_spending = MntWines + MntFruits + MntMeatProducts + MntFishProducts + MntSweetProducts + MntGoldProds- Mute a new variable called `Frequency`, which means the sum of the number of purchases made through different channels. - Frequency = NumStorePurchases + NumWebPurchases + NumCatalogPurchases- Provide a correlation table between `total_spending` and other continuous variables, including Frequency, Complain, Response, Income, Kidhome, Teenhome, Recency. ::::::::: {.content-visible when-format='html'}::: {.panel-tabset}### **Question `r question_index-1`**- Mutate a new variable called `total_spending`, which means the sum of spending on all products by customers. Mute a new variable called `Frequency`, which means the sum of the number of purchases made through different channels. Provide a correlation table between `total_spending` and other continuous variables, including Frequency, Complain, Response, Income, Kidhome, Teenhome, Recency.### Answer```{r}data_full <- data_full %>%mutate(total_spending = MntWines + MntFruits + MntMeatProducts + MntFishProducts + MntSweetProducts + MntGoldProds) %>%mutate(Frequency = NumStorePurchases + NumWebPurchases + NumCatalogPurchases)``````{r}datasummary_correlation(data_full %>%select(total_spending, Complain, Response, Income, Kidhome, Teenhome, Recency))```::::::::: {.callout-note}### RFM AnalysisRFM analysis is a customer segmentation technique that uses past purchase behavior to divide customers into groups. RFM stands for Recency, Frequency, and Monetary Value. RFM helps divide customers into various categories or clusters to identify customers who are more likely to respond to promotions, as well as those who are more likely to churn. We will learn more about RFM analysis in the subsequent weeks when we talk about segmentation and targeting.:::::: {.content-visible when-format='pdf'}::: {.callout}**Question `r question_index<-question_index+1; question_index-1`**- Compare the average total spending across demographic groups. Which group has the highest average spending? - marital status - education::::::::: {.content-visible when-format='html'}::: {.panel-tabset}### **Question `r question_index-1`**- Compare the average total spending across demographic groups. Which group has the highest average spending? - marital status - education### Answer```{r}data_full %>%group_by(Marital_Status) %>%# group bysummarise(avg_spending =mean(total_spending)) %>%# compute the mean of spending for each groupungroup()```::::::::: {.content-visible when-format='pdf'}::: {.callout}**Question `r question_index<-question_index+1; question_index-1`**- Which education group accounts for the largest percentage of customers? - tip: in dplyr, there is a function called `n()`, which counts the number of rows in the group after `group_by()`::::::::: {.content-visible when-format='html'}::: {.panel-tabset}### **Question `r question_index-1`**- Which education group accounts for the largest percentage of customers? - tip: in dplyr, there is a function called `n()`, which counts the number of rows in the group after `group_by()`### Answer```{r}data_full %>%# Think about why data_demo? what if we use data_full?group_by(Education) %>%# group by Educationsummarise(n_customers_in_group =n()) %>%# compute the number of customers in each education groupungroup() %>%# important to do the ungroup after each group_bymutate(percent_customer = n_customers_in_group /sum(n_customers_in_group)) %>%# compute the percentage of customers in each grouparrange(-percent_customer) # descending order```::::::::: {.content-visible when-format='pdf'}::: {.callout}**Question `r question_index<-question_index+1; question_index-1`**- What is the average Total £ spent on wine and total spending by customers with and without kids? - tip: first mutate a variable called `has_kid`, which equals 1 if the number of kids in the household is larger than 0, and otherwise 0; then group by this `has_kid` variable::::::::: {.content-visible when-format='html'}::: {.panel-tabset}### **Question `r question_index-1`**- What is the average Total £ spent on wine and total spending by customers with and without kids? - tip: first mutate a variable called `has_kid`, which equals 1 if the number of kids in the household is larger than 0, and otherwise 0; then group by this `has_kid` variable### Answer```{r}data_full %>%mutate(has_kid =ifelse(Kidhome >0, 1, 0)) %>%group_by(has_kid) %>%summarise(avg_spending_wine =mean(MntWines, na.rm = T),avg_total_spending =mean(total_spending, na.rm = T) ) %>%ungroup()```::::::# After-class ExerciseAfter the class, you can further explore the dataset and conduct more descriptive analytics.::: {.content-visible when-format='pdf'}::: {.callout}**Question `r question_index<-question_index+1; question_index-1`**For complainers and non-complainers who have Master's degrees, find the total number and also the percent of customers who responded to the offer.::::::::: {.content-visible when-format='html'}::: {.panel-tabset}### **Question `r question_index-1`**For complainers and non-complainers who have Master's degrees, find the total number and also the percent of customers who responded to the offer.### Answer```{r}data_full %>%filter(Education =="Master") %>%group_by(Complain) %>%summarise(n_responder =sum(Response, na.rm = T),n_customer =n() ) %>%ungroup() %>%mutate(percent_of_responder = n_responder / n_customer)```::::::::: {.content-visible when-format='pdf'}::: {.callout}**Question `r question_index<-question_index+1; question_index-1`**Group customers by Marital Status and compare the average spending in each group. What insights can you draw? - Tips: use `group_by()` + `summarise()`::::::::: {.content-visible when-format='html'}::: {.panel-tabset}### **Question `r question_index-1`**Group customers by Marital Status and compare the average spending in each group. What insights can you draw? - Tips: use `group_by()` + `summarise()`### Answer```{r}data_full %>%mutate(total_spending = MntWines + MntFruits + MntMeatProducts + MntFishProducts + MntSweetProducts + MntGoldProds) %>%# mutate total spendinggroup_by(Marital_Status) %>%summarise(avg_spending =mean(total_spending, na.rm = T)) %>%# compute the mean of spending for each groupungroup() %>%arrange(-avg_spending)```::::::# References {.unnumbered}