Utilizing Prediction Intervals for Unsupervised Detection of Fraudulent Transactions: A Case Study
Keywords: Fraud Detection, Prediction Intervals, Ordinary Least Squares, Random Forest, Dropout Neural Network, Unsupervised Machine Learning
Money laundering operations severely hamper the growth of a country’s national economy. As financial sectors become increasingly integrated, effective technological measures to counter such fraudulent operations are vital. Machine learning methods are widely used to classify an incoming transaction as fraudulent or non-fraudulent by analysing the behaviour of past transactions. Unsupervised machine learning methods require no label information on past transactions; a classification is made solely from the distribution of the transactions. This research presents three unsupervised classification methods: ordinary least squares (OLS) regression-based fraud detection, random forest-based (RF) fraud detection and dropout neural network-based (DNN) fraud detection. In each method, the goal is to classify an incoming transaction amount as fraudulent or non-fraudulent. The novelty of the proposed approach lies in applying prediction interval calculation to automatically validate incoming transactions. The three methods are applied to a real-world dataset of credit card transactions. The fraud labels available for the dataset are withheld during the model training phase and used only to evaluate the performance of the final predictions. The performance of the proposed methods is further compared with two unsupervised state-of-the-art methods. Based on the experimental results, the OLS and RF methods show the best performance in predicting the correct label of a transaction, while the DNN method is the most robust at detecting fraudulent transactions. This novel concept of calculating prediction intervals to validate an incoming transaction opens a new direction for unsupervised fraud detection.
Since fraud labels on past transactions are not required for training, the proposed methods can be applied in an online setting to other areas, such as detecting money laundering activities, telecommunication fraud and network intrusions.
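To illustrate the core idea of interval-based validation, the following is a minimal sketch (not the paper's exact procedure) of the OLS variant: fit a regression to past transactions, build a standard (1 − α) prediction interval for a new observation, and flag the incoming amount as fraudulent when it falls outside that interval. The feature matrix, noise model and helper names here are illustrative assumptions.

```python
import numpy as np
from scipy import stats


def ols_prediction_interval(X, y, x_new, alpha=0.05):
    """(1 - alpha) OLS prediction interval for a new observation x_new.

    X: (n, p) feature matrix of past transactions (no intercept column);
    y: (n,) observed transaction amounts. Returns (lower, upper) bounds.
    """
    n, p = X.shape
    Xd = np.column_stack([np.ones(n), X])           # prepend intercept column
    beta, *_ = np.linalg.lstsq(Xd, y, rcond=None)   # OLS coefficient estimates
    resid = y - Xd @ beta
    dof = n - p - 1
    s2 = resid @ resid / dof                        # unbiased residual variance
    xd = np.concatenate([[1.0], np.atleast_1d(x_new)])
    h = xd @ np.linalg.inv(Xd.T @ Xd) @ xd          # leverage of the new point
    se = np.sqrt(s2 * (1.0 + h))                    # prediction standard error
    t = stats.t.ppf(1 - alpha / 2, dof)             # Student-t critical value
    y_hat = xd @ beta
    return y_hat - t * se, y_hat + t * se


def flag_transaction(X, y, x_new, amount, alpha=0.05):
    """Flag an incoming amount as fraudulent if it lies outside the interval."""
    lo, hi = ols_prediction_interval(X, y, x_new, alpha)
    return not (lo <= amount <= hi)
```

Because only the distribution of past (assumed mostly legitimate) transactions is used, no fraud labels enter the decision, which is what makes the scheme unsupervised. The RF and DNN variants replace the OLS interval with tree-based and Monte Carlo dropout-based intervals, respectively.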