Stochastic Gradient Boosting Approach to Daily Attrition Scoring
Stochastic Gradient Boosting Approach to Daily Attrition Scoring Based on High-dimensional RFM Features
Dr. Gerald Fahner, Senior Director, Analytic Science, FICO
© 2015 Fair Isaac Corporation. Confidential. This presentation is provided for the recipient only and cannot be reproduced or shared without Fair Isaac Corporation’s express consent.

Agenda
• Ultra-dynamic Attrition Scoring
• Case Study: Credit Card Attrition
• Category Attrition

Ultra-Dynamic (Daily) Attrition Scoring Approach
• Customer uses card
• Daily attrition risk score
• Prolonged inactivity signals higher risk and drives up the attrition risk score
• Re-engage customer when attrition risk exceeds some threshold
[Figure: day-by-day timeline (… Tu We Th Fr Sa Su Mo …) showing card use and the daily attrition risk score]

Transaction Dynamics Hold Key Information
• Given information at time of scoring, who is more likely to attrite?
  ─ Which measures are most informative?
• How to combine Recency and Frequency into predicting attrition risk?
[Figure: card-use timelines for two customers, Spence and Attila, over the observation period up to the time of scoring, annotated with Recency and Frequency; each is marked "Attrite?"]
Recency: Days since last card use
Frequency: Fraction of days card used during the observation period

How Machine Learning Complements Domain Expertise
Domain Expertise
• Good at intuiting key predictors
• Doesn’t scale to many variables
• Poor at combining multiple predictors
• Poor at quantifying uncertainty
• Need story behind the numbers
Machine Learning
• Lacks intuition
• Excels at combining many features into accurate probabilistic predictions
• Diagnose and visualize models to gain insight into effects
[Figure: steps numbered 1 to 4 mark the recommended path across the two columns]
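The Recency and Frequency definitions on the Transaction Dynamics slide can be made concrete with a small sketch. It is not the presenter's implementation: the function name `recency_frequency`, the inclusive observation window, and the example dates are assumptions made for illustration. The two toy customers are constructed so that both have the same recency but very different frequencies.

```python
from datetime import date, timedelta


def recency_frequency(use_dates, obs_start, scoring_date):
    """Return (recency_in_days, frequency) for one account.

    Recency   = days since last card use, measured at the scoring date.
    Frequency = fraction of days in the observation period with card use.
    Assumption: the observation period runs from obs_start through
    scoring_date, both inclusive.
    """
    obs_days = (scoring_date - obs_start).days + 1
    # Keep distinct use days that fall inside the observation period.
    used = {d for d in use_dates if obs_start <= d <= scoring_date}
    if not used:
        return None, 0.0  # no card use observed: recency is undefined
    recency = (scoring_date - max(used)).days
    frequency = len(used) / obs_days
    return recency, frequency


# Hypothetical 100-day observation period.
obs_start = date(2015, 1, 1)
scoring_date = obs_start + timedelta(days=99)

# Sparse user: 5 use days, the last one 20 days before scoring.
sparse_user = [obs_start + timedelta(days=i) for i in (0, 20, 40, 60, 79)]
# Frequent user: 55 consecutive use days, ending 20 days before scoring.
frequent_user = [obs_start + timedelta(days=i) for i in range(25, 80)]

r_sparse, f_sparse = recency_frequency(sparse_user, obs_start, scoring_date)
r_frequent, f_frequent = recency_frequency(frequent_user, obs_start, scoring_date)
print(r_sparse, f_sparse)      # 20 0.05
print(r_frequent, f_frequent)  # 20 0.55
```

Both accounts show a 20-day lapse, but for the frequent user that lapse is far more unusual. That is the kind of Recency and Frequency combination the question on the slide points to.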
Key Elements of Approach
• Featurization of transaction events
  ─ Based on Recencies, Frequencies, Monetary values
  ─ High-dimensional feature space of complex events
• Machine learning / classification tools
  ─ Stochastic Gradient Boosting
  ─ Partial dependence visualization
• Performance evaluation
  ─ Lift related to portfolio profit gain
  ─ Out-of-sample / Out-of-time evaluation

Stochastic Gradient Boosting [1]
• Combines predictions from 100s or 1000s of shallow CARTs
• Inexplicable model by direct inspection
[Figure: training data (predictors and outcomes) is used to fit CART 1, CART 2, …, CART M; the prediction function is a weighted average of the trees; a new case with its predictors is scored to produce a score]

Agenda
• Ultra-dynamic Attrition Scoring
• Case Study: Credit Card Attrition
  ─ Machine Learning for Higher Profit
• Category Attrition

Credit Card Case Study: Data and Project Design
• ~5 million accounts. More than 1 billion transactions over 3 years
• Transaction information: Date, Merchant Code, Amount, Authorized Flag
[Figure: project timeline. Model development: observation period (2 years), time of scoring, performance period (6 months). Out-of-time validation: observation period followed by performance period.]
• Attrition performance definition: Binary indicator of card activity during the performance period
• Scoring exclusions: Inactive

Statistical Measures of Model Performance: Lift and Precision
• Target the top α% of high scores with a retention offer
[Figure: customers ranked from high scores to low scores, split into would-be attriters and non-attriters]
Lift at the α% operating point:
\[ \lambda = \frac{\text{Fraction of attriters among targeted}}{\text{Base attrition rate}} = \frac{\text{Precision}}{\text{Base attrition rate}} = \frac{\#\,\text{Attriters among targeted} \,/\, \#\,\text{Targeted}}{\#\,\text{Attriters total} \,/\, \#\,\text{Total}} \]

Profit from a Retention Campaign
Actual behavior of targeted customer, with profit contribution per customer and fraction of targeted customers with this behavior:
• Would-be attriter we persuade to stay
  ─ Profit contribution: (CLV Gain - Contact Cost - Incentive Cost)
  ─ Fraction of targeted customers: Precision * Persuasion Rate
• Unpersuadable attriter
  ─ Profit contribution: (No CLV Gain - Contact Cost)
  ─ Fraction of targeted customers: Precision * (1 - Persuasion Rate)
• Non-attriter, erroneously targeted
  ─ Profit contribution: (No CLV Gain - Contact Cost - Incentive Cost)
  ─ Fraction of targeted customers: 1 - Precision

Profit Gain From Attrition Model Improvement [2]
\[ \text{Gain} = (\lambda_B - \lambda_A)\, N \alpha \beta_0 \left( \gamma\, \text{CLV} + \delta (1 - \gamma) \right) \]
is the portfolio profit gain from improving to model B over model A, where:
Will benchmark alternative models
• $\lambda_A$: Lift from model A
• $\lambda_B$: Lift from model B
Portfolio-specific assumptions
• $\alpha$: Targeting Fraction, 5%
• $\beta_0$: Base Attrition Rate, 8%
• $N$: Portfolio Size, 5 million
• CLV: Customer Lifetime Value, $1,000
• $\delta$: Incentive Cost, $100
• $\gamma$: Persuasion Rate, 20%

Benchmarking Predictive Models of Increasing Complexity
• How much can we gain by making models more complex?
• Are complex models robust over time?
Models, in order of increasing dimensionality of feature space (R: Recency, F: Frequency):
• Model 1: Additive model in R and F of card use
• Model 2: Interaction model in R and F of card use
• Model 3: Interaction model in R and F of complex events
Complex event examples:
• Recent restaurant visit and frequent hotels
• More than $1,000 spent on travel last week
• Recent car deal and frequently at the pump
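The lift and precision measures defined on the model performance slide can be computed directly from scored accounts. The sketch below is illustrative only: the function name `precision_and_lift`, the rounding of the target size, and the toy scores and outcomes are assumptions, not data from the case study.

```python
def precision_and_lift(scores, attrited, alpha):
    """Precision and lift when targeting the top `alpha` fraction by score.

    scores:   attrition risk scores, higher means riskier.
    attrited: 1 if the account attrited in the performance period, else 0.
    alpha:    targeting fraction, e.g. 0.05 for the top 5%.
    """
    n_total = len(scores)
    n_targeted = max(1, round(alpha * n_total))
    # Rank accounts from highest to lowest score and take the top slice.
    ranked = sorted(zip(scores, attrited), key=lambda pair: pair[0], reverse=True)
    attriters_targeted = sum(flag for _, flag in ranked[:n_targeted])
    precision = attriters_targeted / n_targeted
    base_rate = sum(attrited) / n_total
    return precision, precision / base_rate


# Toy data: 20 accounts with strictly decreasing scores, 4 attriters in total.
toy_scores = [1.0 - 0.05 * i for i in range(20)]
toy_attrited = [1, 0, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0]

# Targeting the top 10% (2 accounts) captures 1 attriter:
# precision = 1/2, base attrition rate = 4/20, so lift is about 2.5.
precision, lift = precision_and_lift(toy_scores, toy_attrited, alpha=0.10)
print(precision, round(lift, 2))  # 0.5 2.5
```

A lift of 1 means the targeted group attrites at the base rate, so the score adds nothing over random targeting at that operating point.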
Interaction Detection Experiment: Should Capture (Recency × Frequency) Interactions
• Predictors: Recency and Frequency of card use
  ─ Model 1: Additive, nonlinear in R and F
  ─ Model 2: Captures interaction between R and F
• Out-of-sample / out-of-time validation: $\lambda_1 = 6.03$, $\lambda_2 = 6.54$
• ⇒ Gain = $2.86 MM, subject to portfolio assumptions
• Interaction effect in agreement with research by Fader and Hardie [3]

Interaction Visualization Tells Story
Two-dimensional Partial Dependence Function [4]
[Figure: probability to use card during next 6 months = 1 - Pr(Attrition), plotted over Recency (days since last card use) and Frequency (fraction of days card used), with points marked for Spence (R=20, F=0.05) and Attila (R=20, F=0.55)]
• Attila is at higher risk of attrition because his card use has lapsed for an unusually long time interval

Featurization Experiment: Should Capture Complex Events in Your Models
• Define R and F features for complex events
• Model 3: Candidate predictors include:
  ─ Card use events
  ─ Hundreds of merchant category events
  ─ Monetary events defined by spending bands
  ─ No-authorization events
• Out-of-sample / out-of-time validation: $\lambda_3 = 7.52$ (recall: $\lambda_1 = 6.03$, $\lambda_2 = 6.54$)
• ⇒ Gain over Model 1 (simple, additive) = $8.34 MM, subject to portfolio assumptions

Learning Curves Experiment: Should Exploit Larger Samples to Develop More Complex Models
[Figure: lift (out-of-sample / out-of-time, axis from 5 to 7) versus number of training samples (1,000 to 100,000) for Model 2 (card R and F only) and Model 3 (high-dim complex events)]

Agenda
• Ultra-dynamic Attrition Scoring
• Case Study: Credit Card Attrition
• Category Attrition
  ─ Detecting Subtle Forms of Attrition
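The dollar gains quoted for the interaction and featurization experiments can be checked against the profit gain formula and the portfolio-specific assumptions stated with it. The sketch below is a plain arithmetic check. The function and variable names (`profit_gain`, `ASSUMPTIONS`) are illustrative, while the numeric inputs are the ones given on the slides.

```python
def profit_gain(lift_a, lift_b, n, alpha, beta0, clv, delta, gamma):
    """Gain = (lift_b - lift_a) * N * alpha * beta0 * (gamma*CLV + delta*(1 - gamma))."""
    return (lift_b - lift_a) * n * alpha * beta0 * (gamma * clv + delta * (1 - gamma))


# Portfolio-specific assumptions as stated on the profit gain slide.
ASSUMPTIONS = dict(
    n=5_000_000,   # portfolio size N
    alpha=0.05,    # targeting fraction
    beta0=0.08,    # base attrition rate
    clv=1_000,     # customer lifetime value, dollars
    delta=100,     # incentive cost, dollars
    gamma=0.20,    # persuasion rate
)

# Reported out-of-sample / out-of-time lifts for Models 1, 2 and 3.
LIFT_1, LIFT_2, LIFT_3 = 6.03, 6.54, 7.52

gain_2_vs_1 = profit_gain(LIFT_1, LIFT_2, **ASSUMPTIONS)
gain_3_vs_1 = profit_gain(LIFT_1, LIFT_3, **ASSUMPTIONS)

print(f"Model 2 over Model 1: ${gain_2_vs_1 / 1e6:.2f} MM")  # $2.86 MM
print(f"Model 3 over Model 1: ${gain_3_vs_1 / 1e6:.2f} MM")  # $8.34 MM
```

Under these assumptions each full point of lift is worth 5,000,000 × 0.05 × 0.08 × (0.20 × 1,000 + 100 × 0.80) = $5.6 MM, which is why modest lift improvements translate into multi-million-dollar gains.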
Merchant Category (MC) Attrition
• Hundreds of credit card MCs
• Performance definition for a specific MC:
  ─ Stop buying from this MC while continuing card use for other MCs
• May signal competitive influence or early belt-tightening, before total attrition occurs. Quick detection informs rapid intervention
[Figure: card-level model → overall customer status; grocery model → grocery status; travel model → travel status; gas station model → gas station status]
• Possible interventions: Offer incentives at service stations, or start customer dialogue

Summary
• Daily attrition scoring quickly detects emergent attrition, signaled by an unusually long time lapse since the last transaction
• With large transaction volumes, more complex models are more profitable
• Machine learning helps with insight, automation, scale

References
[1] Jerome Friedman, "Greedy Function Approximation: A Gradient Boosting Machine," The Annals of Statistics, 29(5), 2001, 1189-1232.
[2] Scott Neslin et al., "Defection Detection: Measuring and Understanding the Predictive Accuracy of Customer Churn Models," Journal of Marketing Research, 43(2), 2006, 204-211.
[3] Peter Fader, Bruce Hardie, and Ka Lok Lee, "RFM and CLV: Using Iso-Value Curves for Customer Base Analysis," Journal of Marketing Research, 42(4), 2005, 415-430.
[4] Jerome Friedman et al., "Predictive Learning via Rule Ensembles," The Annals of Applied Statistics, 2(3), 2008, 916-954.

Thank You
Dr. Gerald Fahner
+1 512 5323621
geraldfahner@fico.com