Hidden Technical Debts in Churn Prediction Systems

Montreal AI Symposium, 2018

Co-Authors: Nikhil Saldanha, Raman Shrivastava

Abstract

Traditionally, businesses have defined churn simply as the loss of customers of a product or service that they provide. This definition is vague since it fails to specify who a customer is and the event that defines churn. The use of a lagging indicator of churn, such as cancellation of a subscription, renders the prediction inactionable since it is too late to prevent the churn. Further, if the business provides a non-contractual service, identification of an event that intersects well with business and users intent becomes tricky.

We have improved on this traditional definition by creating a framework for defining churn across various businesses. We start by defining the active users, these are the users who have performed a key revenue-generating event for the business in the past. The next step is to select the critical event for churn, we formulate churn as the absence of this event over a period of time. However, this definition also has its drawbacks since the constant time interval across users may not be very accurate. In order to solve this, we can formulate the problem as a ranking problem where users are ranked by the time taken to perform the critical event. Users who take longer to perform the critical event are likely to be “more churned” than others who take lesser time.

We propose to model churn prediction as a supervised machine learning problem where we predict the churn risk(probability that a user is likely to churn) given a time series data of the user’s behavior and the model parameters. Features relevant to user behaviour from the platform are fed into a RNN. In order to feed the data in a RNN, the features are aggregated over smaller time windows based on frequency of user activity. Absence of the critical event in a period following the prediction is labeled as churn and presence is labeled as not churn.

The churn prediction itself is not very useful for a business since ultimately, the company’s goal is to retain these users by taking a set of actions. To effectively take actions, the right context must come from the predictions themselves. In addition to this, the cost of retention must be considered due to a limited budget. All this means that churn predictions need to be trustworthy and correlate well with the features that are used to make the prediction in order for them to be actionable. In the past, LIME has been proposed as a way to attribute predictions to specific features in simple Feedforward and CNNs and build trust in the predictions. We propose variations to LIME and DeepLIFT to better suit multivariate time series data for RNNs.

Conventionally, models are evaluated using metrics like accuracy, ROC AUC, F1 Score for predictions on a hold out set. While these metrics give a good overview of model performance, in practice, they tend to be misleading due to business constraints. Since churn predictions are a means to take preventive actions, their success depends on the success of the actions. The model may predict very accurately for users who are the hardest to retain but is ultimately limited by the effectiveness of the actions. Very often the model performs well on low-value users but fails to perform well on high-value users, resulting in a net loss of revenue. Models with good performance on average may have hidden failure models that are especially insidious when used in production which may introduce long-term biases in the model.

Slicing metrics allows us to analyze the performance of a model on a more granular level. Usually, metrics are sliced by a particular feature value, which highlights the performance of the model on data having only that subset of feature values. We propose that along with slicing metrics by feature value, slicing metrics by customer segment becomes crucial to the evaluation of churn prediction models due to the variation in data distribution that can be observed across these segments. This manner of slicing allows us to develop specific features for specific customer segments and reduce negative bias towards outlying or minority segments which are usually the highest value users of any product.

Contributions

Converting churn from classification to ranking problems which outputs a distribution from which churn probability can be sampled based on different windows.
Using time-series user events to predict user behaviour by LSTM models.
Adding multi-variate time-series data support to LIME for explanation of feature correlations to churn.
Model performance evaluation on different segments of customer which reduces bias and provides business value by performing better on high-value customers.