Customer Churn Prediction

RaMesh RaWal (Neomenon)
5 min read · Nov 14, 2019


Customer Churn

Customer churn refers to when a customer (player, subscriber, user, etc.) ceases his or her relationship with a company. Online businesses typically treat a customer as churned once a particular amount of time has elapsed since the customer’s last interaction with the site or service. The full cost of customer churn includes both lost revenue and the marketing costs involved with replacing those customers with new ones. Reducing customer churn is a key business goal of every online business.
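As a minimal sketch of the inactivity rule described above, the snippet below labels a customer as churned once a chosen number of days has passed since their last interaction. The 30-day window, the function name, and the sample dates are illustrative assumptions, not values from this project.

```python
from datetime import date

# Assumed inactivity window; the right value depends on the business.
CHURN_AFTER_DAYS = 30


def is_churned(last_interaction: date, as_of: date,
               window_days: int = CHURN_AFTER_DAYS) -> bool:
    """Return True if the customer has been inactive for longer than the window."""
    return (as_of - last_interaction).days > window_days


# Hypothetical customers and the date of their last interaction.
last_seen = {
    "customer_a": date(2019, 1, 5),
    "customer_b": date(2019, 1, 28),
}
snapshot = date(2019, 2, 15)

labels = {cid: is_churned(seen, snapshot) for cid, seen in last_seen.items()}
print(labels)
```

Here customer_a has been inactive for 41 days and is labeled as churned, while customer_b, inactive for 18 days, is not.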

The Importance of Predicting Customer Churn

The ability to predict that a particular customer is at high risk of churning, while there is still time to do something about it, represents a large potential revenue source for an online business. Besides the direct loss of revenue when a customer abandons the business, the cost of acquiring that customer may not yet have been covered by the customer’s spending to date. (In other words, acquiring that customer may have actually been a losing investment.) Furthermore, it is generally more difficult and expensive to acquire a new customer than to retain a current paying customer.

Steps of Churn Prediction

  • Data processing
  • Feature selection & engineering
  • Modeling
  • Insights and Actions

Data Processing

The data come from 2 years of transaction records across every store. Each record holds the details of a customer’s in-store transaction at a specific time, including the bill date and other transaction details. The train and validation sets were taken from the overall customer data as of 2019-02-01, and the test set was taken from the overall customer data as of 2019-03-01.

Figure a
Figure b

‘Figure a’ shows the number of churned and non-churned customers in February 2019. ‘Figure b’ splits the churned customers into two groups: ‘Feb’ is the number of churned customers who were also new customers in February 2019, and ‘Past’ is the number of churned customers who were not new customers in February.

The figure above visualizes the dataset and its features in 3D space.

Feature Selection & Engineering

Extracting features was the more complicated part of working with this dataset. The extracted features describe the past transaction behavior of each customer.

Features 1:

  • Attendance of the customer over 7 months
  • Origin (value of the customer’s registration time)
  • Visit (average visits per month)

Features 2:

  • Average Basket
  • Visit
  • Weekday
  • Weekend
  • Five Hundred
  • Thousand
  • Fifteen hundred
  • Three Thousand
  • Five Thousand
  • Ten Thousand
  • Above
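The second feature set can be sketched as a per-customer aggregation over transactions. The interpretation here is an assumption, since the list above gives only the feature names: "Average Basket" is taken as the mean bill amount, "Visit" as the number of bills, "Weekday" and "Weekend" as counts of visits (with Saturday and Sunday assumed to be the weekend), and the named amounts as counts of bills falling under each upper bound. The function name and sample data are hypothetical.

```python
from datetime import date
from statistics import mean

# Assumed upper bounds for the spend buckets, named after the features above.
BUCKETS = [
    ("five_hundred", 500),
    ("thousand", 1000),
    ("fifteen_hundred", 1500),
    ("three_thousand", 3000),
    ("five_thousand", 5000),
    ("ten_thousand", 10000),
]


def customer_features(transactions):
    """Build one feature row from a list of (bill_date, amount) tuples."""
    amounts = [amount for _, amount in transactions]
    features = {
        "average_basket": mean(amounts),
        "visit": len(transactions),
        # date.weekday() returns 5 for Saturday and 6 for Sunday.
        "weekday": sum(1 for d, _ in transactions if d.weekday() < 5),
        "weekend": sum(1 for d, _ in transactions if d.weekday() >= 5),
        "above": 0,
    }
    for name, _ in BUCKETS:
        features[name] = 0
    # Count each bill in the first bucket whose upper bound covers it.
    for amount in amounts:
        for name, upper in BUCKETS:
            if amount <= upper:
                features[name] += 1
                break
        else:
            features["above"] += 1
    return features


# Hypothetical transactions for a single customer.
sample = [
    (date(2019, 1, 5), 450.0),     # Saturday
    (date(2019, 1, 9), 2200.0),    # Wednesday
    (date(2019, 1, 20), 12500.0),  # Sunday
]
row = customer_features(sample)
print(row)
```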

Correlation matrix:
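A correlation matrix like this one can be computed with NumPy. The sketch below uses a small made-up feature table; the column names and values are placeholders, not the project's data.

```python
import numpy as np

# Hypothetical feature table: rows are customers, columns are features.
feature_names = ["average_basket", "visit", "weekend"]
X = np.array([
    [500.0, 2, 1],
    [1200.0, 4, 2],
    [800.0, 3, 0],
    [2500.0, 8, 3],
    [300.0, 1, 1],
])

# rowvar=False tells NumPy that each column (not each row) is a variable.
corr = np.corrcoef(X, rowvar=False)

for name, values in zip(feature_names, corr):
    print(name, np.round(values, 2))
```

Pairs of features with a correlation close to 1 or -1 carry largely the same information, which is one reason to inspect the matrix before modeling.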

Feature Importance:

Feature importance using random forest
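A ranking of this kind can be obtained from scikit-learn's RandomForestClassifier through its feature_importances_ attribute. The sketch below runs on synthetic data in which only the first column drives the label, so the data, feature names, and hyperparameters are all assumptions made for illustration.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
feature_names = ["visit", "average_basket", "weekend"]

# Synthetic data: the label depends only on the first column.
X = rng.normal(size=(500, 3))
y = (X[:, 0] > 0).astype(int)

forest = RandomForestClassifier(n_estimators=50, random_state=0)
forest.fit(X, y)

# Impurity-based importances; they sum to 1 across features.
ranked = sorted(
    zip(feature_names, forest.feature_importances_),
    key=lambda pair: pair[1],
    reverse=True,
)
for name, score in ranked:
    print(f"{name}: {score:.3f}")
```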

Modeling

Train Set:

Customers who were active in February were used as the train set. New customers who also churned were excluded from the training data, because they have very little data and such records can distort the predictive model. Among the remaining customers, 2,500 churned and 3,000 non-churned customers were taken as the training set for the predictive model.
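The selection described above can be sketched as a filter followed by sampling from each class. The record layout (the is_new and churned keys), the function name, and the use of random sampling are assumptions; the article does not say how the 2,500 and 3,000 customers were picked.

```python
import random


def build_training_set(customers, n_churned, n_active, seed=0):
    """Select a training set from dicts with 'is_new' and 'churned' keys."""
    # Exclude new customers who also churned: they have too little history.
    eligible = [c for c in customers if not (c["is_new"] and c["churned"])]
    churned = [c for c in eligible if c["churned"]]
    active = [c for c in eligible if not c["churned"]]

    rng = random.Random(seed)
    return (
        rng.sample(churned, min(n_churned, len(churned)))
        + rng.sample(active, min(n_active, len(active)))
    )


# Hypothetical customer records for a toy run.
customers = [
    {"id": i, "is_new": i % 4 == 0, "churned": i % 2 == 0}
    for i in range(20)
]
train = build_training_set(customers, n_churned=3, n_active=4)
print(len(train), sum(c["churned"] for c in train))
```

With the real data the call would use n_churned=2500 and n_active=3000.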

Test Set:

Customers who were active in March were used as the test set, again excluding new customers who also churned. Among the remaining customers, about 43,000 were taken as the test set, of which about 3,300 were churned and about 39,000 were non-churned.

Model Building
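The article describes the model as a neural network (Sequential model). A minimal Keras sketch of such a binary classifier might look like the following. The layer sizes, activations, optimizer, and the 11 input features (one per entry in the second feature list) are assumptions, not the author's actual configuration.

```python
from tensorflow import keras


def build_model(n_features: int) -> keras.Model:
    """Small feed-forward binary classifier; the architecture is assumed."""
    model = keras.Sequential([
        keras.Input(shape=(n_features,)),
        keras.layers.Dense(16, activation="relu"),
        keras.layers.Dense(8, activation="relu"),
        # A single sigmoid unit outputs the predicted probability of churn.
        keras.layers.Dense(1, activation="sigmoid"),
    ])
    model.compile(
        optimizer="adam",
        loss="binary_crossentropy",
        metrics=["accuracy"],
    )
    return model


model = build_model(n_features=11)
model.summary()
```

Training would then be a call such as model.fit(X_train, y_train, validation_data=(X_val, y_val)), where the arrays are placeholders for the February train and validation data.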

Result

The test accuracy of the predictive model was around 90%. Of the churned customers, 2,672 were correctly predicted as churned and 648 were wrongly predicted as non-churned. Of the non-churned customers, 39,225 were correctly predicted as non-churned and 3,877 were wrongly predicted as churned.
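The four counts reported above form a confusion matrix, from which accuracy, recall, and precision for the churned class follow directly. The snippet below only redoes that arithmetic; the variable names are illustrative.

```python
# Counts as reported in the result above.
tp = 2672    # churned, predicted as churned
fn = 648     # churned, predicted as non-churned
fp = 3877    # non-churned, predicted as churned
tn = 39225   # non-churned, predicted as non-churned

total = tp + fn + fp + tn
accuracy = (tp + tn) / total
recall = tp / (tp + fn)       # share of churned customers that were caught
precision = tp / (tp + fp)    # share of churn predictions that were right

print(f"accuracy={accuracy:.3f} recall={recall:.3f} precision={precision:.3f}")
```

This gives an accuracy of about 0.90, a recall of about 0.80, and a precision of about 0.41. Because non-churned customers far outnumber churned ones, recall and precision say more about the churned class than accuracy alone.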

Conclusion

Churn prediction was done using a neural network (Sequential model). Dataset creation and feature engineering were the complicated parts of this project. High accuracy is not easy to achieve on this type of dataset. By training on the previous month’s dataset, we can now predict customers’ churn status for the next month.

Future Work

  • Analysis of new customers who are likely to churn in the next month.
  • Finding other factors (features) that can help the model achieve better accuracy.

References

https://towardsdatascience.com/explaining-feature-importance-by-example-of-a-random-forest-d9166011959e

https://www.optimove.com/resources/learning-center/customer-churn-prediction-and-prevention

https://plot.ly/python/3d-scatter-plots/
