Socio-Economic Statistics Research, 2025, 6(2); doi: 10.38007/SESR.2025.060201.
Zhiqiong Zou
Jingchu University of Technology, Jingmen, Hubei, 448000, China
Since 2013, Internet consumer finance has grown rapidly by offering small loans that meet users' credit needs. However, because the credit reporting system is still incomplete, platforms find it hard to capture customer information, which makes credit risk difficult to identify and quantify. In 2022 the industry's average non-performing loan ratio reached 3%, so risk control is especially important. Traditional logistic regression models run slowly and lose predictive power on high-dimensional data. This paper uses causal inference theory to screen variables, adopts the LightGBM algorithm, and improves the Heckman two-step method to correct for sample selection bias, building a risk control prediction model for Internet consumer finance; model performance is evaluated with the KS and AUC metrics. Using customer credit data from a consumer finance company, the paper analyzes the characteristics and credit risks of Internet consumer finance and builds an end-to-end big data risk control model. The results show that the risk control model built with the Heckman two-step method and the LightGBM algorithm improves both KS and AUC, effectively quantifying borrower credit risk and predicting default probability, which helps control and reduce credit risk. These findings offer new ideas and a reference for credit risk control in Internet consumer finance.
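The abstract names the Heckman correction and the KS and AUC metrics without giving formulas. The sketch below, using only the Python standard library, shows the classical building blocks under stated assumptions: the inverse Mills ratio λ(z) = φ(z)/Φ(z) computed from a first-stage probit index z = x'γ, and the KS and AUC statistics computed from model scores (higher score = higher default risk, label 1 = default). The probit coefficients `gamma`, the helper `augment_with_imr`, and the idea of passing λ to the second-stage (LightGBM) model as an extra feature are hypothetical illustrations; the paper's "improved" Heckman procedure is not described here and may differ.

```python
import math
from typing import List, Sequence


def inverse_mills_ratio(z: float) -> float:
    """Classical Heckman correction term lambda(z) = phi(z) / Phi(z).

    z is the linear index x'gamma from a first-stage probit of the selection
    equation. erfc is used for Phi to keep moderately negative z stable;
    very negative z (below roughly -37) still loses precision.
    """
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    cdf = 0.5 * math.erfc(-z / math.sqrt(2.0))
    return pdf / cdf


def augment_with_imr(rows: Sequence[Sequence[float]],
                     gamma: Sequence[float]) -> List[List[float]]:
    """Hypothetical helper: append lambda(x'gamma) to each feature row.

    gamma is assumed to come from an already-fitted selection probit whose
    regressors are exactly the columns of each row.
    """
    out = []
    for x in rows:
        z = sum(g * v for g, v in zip(gamma, x))
        out.append(list(x) + [inverse_mills_ratio(z)])
    return out


def ks_statistic(scores: Sequence[float], labels: Sequence[int]) -> float:
    """KS = max |TPR - FPR| over score thresholds (both classes required)."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    pairs = sorted(zip(scores, labels), key=lambda p: p[0], reverse=True)
    tp = fp = 0
    best = 0.0
    i = 0
    while i < len(pairs):
        s = pairs[i][0]
        # Consume all tied scores together so a threshold never splits a tie.
        while i < len(pairs) and pairs[i][0] == s:
            if pairs[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        best = max(best, abs(tp / n_pos - fp / n_neg))
    return best


def auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """AUC via the Mann-Whitney rank statistic, averaging ranks over ties."""
    n = len(scores)
    order = sorted(range(n), key=lambda k: scores[k])
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg_rank = (i + j) / 2.0 + 1.0  # average 1-based rank of the tie group
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    n_pos = sum(labels)
    n_neg = n - n_pos
    rank_sum = sum(r for r, y in zip(ranks, labels) if y == 1)
    return (rank_sum - n_pos * (n_pos + 1) / 2.0) / (n_pos * n_neg)
```

For example, with scores `[0.9, 0.8, 0.7, 0.6, 0.3, 0.2]` and labels `[1, 1, 0, 1, 0, 0]`, `auc` returns 8/9 (about 0.889) and `ks_statistic` returns 2/3. KS measures the largest separation between the cumulative score distributions of defaulters and non-defaulters, while AUC measures how often a random defaulter is scored above a random non-defaulter, which is why credit scoring work commonly reports both.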
Internet consumer finance; Credit risk; LightGBM algorithm; Heckman two-step method; Risk control model
Zhiqiong Zou. Research on Credit Risk Quantitative Model of Internet Consumer Finance Based on LightGBM Algorithm. Socio-Economic Statistics Research (2025), Vol. 6, Issue 2: 1-10. https://doi.org/10.38007/SESR.2025.060201.