If you hold a credit card right now, it is based on the work our team did: more than 1,200 iterations over 6 months, day and night, over and over!
Terabytes of data on behavioural and financial indicators of spend. How does one estimate where money will be spent? How does one predict beforehand what that spend is going to be? And how can organizations leverage this data? While working with American Express on millions of customers, we worked day and night, taking the project to senior leadership in New York.

The task sounds super simple: predict how much customers and small businesses will spend on their credit cards next month. The team had already spent over 2 years on it when I came in to provide a fresh perspective. The creative approach I came up with was to recursively partition spenders by their spending behaviour.

Because the project is proprietary, it cannot be displayed here, but the essence of the process is this: divide customers on behavioural spend parameters into smaller and smaller subsets until you arrive at a subset that closely resembles a dataset that has been seen before (similarity is found by clustering). Once that dataset is found, the prediction comes from a model built on the training dataset that was similar to the cluster.

The problem was thus solved, and if you hold a credit card right now, it is based on the work the team did. The time it took? A daily grind of more than 10 hours, for 6 months. The effort it took? More than 1,200 iterations, trying different strategies or tweaking the gradient boosting algorithms developed by Tianqi Chen. We tried replacing the second-order derivatives (the double differentials) with first-order ones when pruning the trees. The solution was creatively found over a weekend of roundtable discussion with the team! It generates over a million in pure profit for the firm every year, and no credit card is sold without our footprints on it.

We started with gradient boosted algorithms such as XGBoost, CatBoost, and LightGBM to predict the spend. We also explored native KNNs, SVMs, and deep local neural networks.
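The divide, cluster, and predict idea described above can be sketched in a few lines of Python. This is only an illustration under loud assumptions, not the proprietary implementation: the function names (two_means, build, predict), the toy customers, and the min_size threshold are all hypothetical, a simple 2-means split stands in for the clustering step, and the mean next-month spend of each final segment stands in for the gradient boosted model that would be trained on it.

```python
from statistics import mean

def dist2(a, b):
    """Squared Euclidean distance between two feature tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def centroid(rows):
    """Average feature vector of a list of (features, next_month_spend) rows."""
    return tuple(mean(col) for col in zip(*(f for f, _ in rows)))

def two_means(rows, iters=10):
    """Split rows into two groups with a tiny, deterministic 2-means."""
    feats = sorted(f for f, _ in rows)
    centers = [feats[0], feats[-1]]  # assumption: start from the two extremes
    for _ in range(iters):
        groups = ([], [])
        for row in rows:
            nearer = 0 if dist2(row[0], centers[0]) <= dist2(row[0], centers[1]) else 1
            groups[nearer].append(row)
        if not groups[0] or not groups[1]:
            break  # nothing left to separate
        centers = [centroid(g) for g in groups]
    return groups

def build(rows, min_size=4):
    """Recursively divide customers until a segment is too small to split."""
    left, right = two_means(rows) if len(rows) >= 2 * min_size else (rows, [])
    if len(left) < min_size or len(right) < min_size:
        # Leaf: the local "model" here is just the segment's mean spend.
        return {"center": centroid(rows), "prediction": mean(t for _, t in rows)}
    return {"center": centroid(rows),
            "children": [build(left, min_size), build(right, min_size)]}

def predict(node, features):
    """Route a customer to the most similar segment and use its local model."""
    while "children" in node:
        node = min(node["children"], key=lambda c: dist2(features, c["center"]))
    return node["prediction"]

# Hypothetical toy data: ((behavioural feature, financial feature), next-month spend)
low = [((1.0, 2.0), 100.0), ((1.5, 1.8), 110.0), ((0.8, 2.2), 90.0), ((1.2, 2.1), 100.0)]
high = [((9.0, 20.0), 1000.0), ((10.0, 22.0), 1100.0), ((11.0, 19.0), 900.0), ((10.5, 21.0), 1000.0)]

tree = build(low + high, min_size=4)
low_spender_forecast = predict(tree, (1.0, 2.0))
high_spender_forecast = predict(tree, (10.0, 20.0))
```

In a real setting each leaf would hold a trained model rather than a mean, and the features would need scaling before any distance is computed, but the routing logic stays the same: find the most similar segment first, then predict locally.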
We finally found the answer in recursive divide and conquer, followed by prediction based on local data clustered on financial variables. Divide, conquer, cluster, and predict. We used Shapley values and the mathematics behind them to explain the predictions that came out. After multiple visits between departments and syncing with teams across the organization, together with the team, I finally cracked the problem, and our footprints are on every credit card sold to date.
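For readers curious about the Shapley values mentioned above, here is a minimal sketch of the underlying mathematics. It is an assumption-heavy toy, not what we ran in production: toy_spend_model, its coefficients, the three features, and the all-zero baseline are hypothetical, and the brute-force enumeration of every feature ordering only works for a handful of features. The idea is that each feature's contribution is its average marginal effect on the prediction across all orders in which features can be switched from the baseline to the customer's actual values.

```python
from itertools import permutations
from math import factorial

def shapley_values(model, x, baseline):
    """Exact Shapley values by averaging marginal effects over all orderings."""
    n = len(x)
    totals = [0.0] * n
    for order in permutations(range(n)):
        current = list(baseline)
        previous = model(current)
        for i in order:
            current[i] = x[i]            # switch feature i to its real value
            updated = model(current)
            totals[i] += updated - previous
            previous = updated
    return [t / factorial(n) for t in totals]

def toy_spend_model(features):
    """Hypothetical spend model with one interaction term."""
    recent_spend, credit_limit, tenure_years = features
    return 0.5 * recent_spend + 0.02 * credit_limit + 0.001 * recent_spend * tenure_years

x = (1000.0, 5000.0, 4.0)      # hypothetical customer
baseline = (0.0, 0.0, 0.0)     # hypothetical reference point

contributions = shapley_values(toy_spend_model, x, baseline)
explained = sum(contributions)
gap = toy_spend_model(x) - toy_spend_model(baseline)
```

The contributions always add up to the difference between the customer's prediction and the baseline prediction, which is what makes them useful for explaining a single forecast. The interaction between recent spend and tenure is shared equally between those two features.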