Finance: Theory and Practice

Comparative Analysis of Machine Learning Methods to Identify Signs of Suspicious Transactions of Credit Institutions and Their Clients

https://doi.org/10.26794/2587-5671-2020-25-5-186-199

Abstract

Financial monitoring requires prompt, objective assessments of economic entities, in particular credit institutions, to support effective decision-making. Automating the identification of unscrupulous credit institutions with machine learning methods would allow regulatory authorities to detect and suppress illegal activity quickly. The aim of the research is to substantiate the use of machine learning methods and algorithms for the automatic identification of unscrupulous credit institutions. This requires selecting a mathematical toolkit for analyzing data on credit institutions that makes it possible to track a bank's involvement in money laundering. The paper provides a comparative analysis of the results of processing data on the activities of credit institutions with classification methods: logistic regression, decision trees, support vector machines, neural networks, and Bayesian networks (Two-Class Bayes Point Machine). It also applies anomaly detection methods: the One-Class Support Vector Machine and PCA-Based Anomaly Detection algorithms. The study presents the results of classifying credit institutions by possible involvement in money laundering and of analyzing data on their activities with anomaly detection methods, and compares the results obtained with the various algorithms. The author concludes that the PCA-Based Anomaly Detection algorithm showed more accurate results than the One-Class Support Vector Machine algorithm. Among the classification algorithms considered, the Two-Class Boosted Decision Tree (AdaBoost) algorithm was the most accurate. The research results can be used by the Bank of Russia and Rosfinmonitoring to automate the identification of unscrupulous credit institutions.
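
The abstract gives no implementation details, so the following is only a minimal sketch of the general idea behind PCA-based anomaly detection, not the author's procedure. It assumes NumPy, and the function name `pca_anomaly_scores`, the synthetic data, and the 99th-percentile threshold are all illustrative choices. A PCA is fitted on observations assumed to be normal, and each new observation is scored by how poorly the leading principal components reconstruct it.

```python
import numpy as np


def pca_anomaly_scores(train, test, n_components):
    """Score each row of `test` by its reconstruction error under a PCA
    fitted on `train`. Larger scores suggest more anomalous rows."""
    mean = train.mean(axis=0)
    # Rows of vt are the principal directions of the centred training data.
    _, _, vt = np.linalg.svd(train - mean, full_matrices=False)
    components = vt[:n_components]
    centred = test - mean
    # Project onto the retained components, then map back to feature space.
    reconstructed = centred @ components.T @ components
    return np.linalg.norm(centred - reconstructed, axis=1)


# Synthetic, purely illustrative data: "normal" points lie close to a line
# in 3-D feature space; the last test point lies far from that line.
rng = np.random.default_rng(0)
latent = rng.normal(size=(200, 1))
direction = np.array([[1.0, 2.0, -1.0]])
normal = latent @ direction + 0.05 * rng.normal(size=(200, 3))
outlier = np.array([[3.0, -3.0, 3.0]])

test_points = np.vstack([normal[:5], outlier])
scores = pca_anomaly_scores(normal, test_points, n_components=1)

# Hypothetical decision rule: flag anything above the 99th percentile
# of the scores observed on the training data itself.
threshold = np.percentile(pca_anomaly_scores(normal, normal, 1), 99)
flagged = scores > threshold
```

In practice the number of retained components and the threshold would have to be tuned on real supervisory data. A One-Class Support Vector Machine could be evaluated on the same train and test split to reproduce the kind of comparison the abstract describes.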

About the Author

Yu. M. Beketnova
Financial University
Russian Federation

Yuliya M. Beketnova — Cand. Sci. (Eng.), Assoc. Prof., Information Security Department

Moscow



For citations:


Beketnova Yu.M. Comparative Analysis of Machine Learning Methods to Identify Signs of Suspicious Transactions of Credit Institutions and Their Clients. Finance: Theory and Practice. 2021;25(5):186-199. https://doi.org/10.26794/2587-5671-2020-25-5-186-199

This work is licensed under a Creative Commons Attribution 4.0 License.


ISSN 2587-5671 (Print)
ISSN 2587-7089 (Online)