AI-Driven Privacy Audit Automation and Data Provenance Tracking in Large-Scale Systems

Author(s)

Chen-Wei Chang

Corresponding Author:

Chen-Wei Chang

Affiliation(s)

Independent Researcher, WA 98011, USA

Abstract

With the rapid development of mobile Internet, big data, supercomputing, sensor networks, brain science, and related technologies, machine learning has entered a period of accelerated development and is driving economic and social progress. However, its entire life cycle (data preprocessing, model training, and model inference) faces increasingly complex data security and privacy challenges that traditional protection technologies struggle to handle. This study proposes a solution for each stage. In the data preprocessing stage, it develops SeiFS, a secure feature extraction scheme based on a single cloud server, which integrates cryptographic primitives such as garbled circuits, oblivious transfer, and secret sharing to achieve end-to-end privacy protection. In the model training stage, it designs two privacy-preserving distributed training schemes (PEFL and PEFLimd), which combine additive homomorphic encryption with a robust aggregation strategy to automatically filter poisoned gradients while protecting gradient privacy [1-3]. In the model inference stage, it proposes an efficient convolution evaluation scheme based on homomorphic encryption (supporting large-kernel and large-stride convolutions) and CryptoGT, a scheme for Transformer-based graph neural networks; these reduce computational overhead and address the "neighbor explosion" problem through fast convolution algorithms and hierarchical evaluation protocols. All schemes are supported by formal security proofs and experimental verification. SeiFS achieves efficient secure feature extraction in cloud computing scenarios, the PEFL series of schemes improves training efficiency while filtering out poisoning attacks [4-5], the CNN inference scheme significantly reduces homomorphic computational complexity, and CryptoGT effectively supports secure evaluation of complex nonlinear functions, addressing the scalability challenges of modern neural architectures while preserving privacy throughout.
Future work needs to address several key issues: lightweight and verifiable secure feature extraction, general-purpose attack and defense frameworks, leakage-free model inference that preserves accuracy, and full-life-cycle privacy protection for new-generation models such as large language models and video generation systems [6-7].
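
The additive homomorphic property that the training schemes rely on can be illustrated with a minimal sketch. It assumes a textbook Paillier cryptosystem with toy key sizes and integer-quantized gradients; the abstract does not name the specific cryptosystem or the robust aggregation rule used by PEFL, so the function names and parameters below are hypothetical and the filtering step is omitted.

```python
import math
import secrets

# Toy parameters (assumption): two known Mersenne primes, far too small for real use.
P = (1 << 31) - 1
Q = (1 << 61) - 1


def keygen(p, q):
    """Return (public modulus n, private key (lam, mu)) for generator g = n + 1."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p - 1, q - 1)
    mu = pow(lam, -1, n)  # with g = n + 1, L(g^lam mod n^2) equals lam mod n
    return n, (lam, mu)


def encrypt(n, m):
    """Encrypt a (possibly negative) integer m under the public modulus n."""
    n2 = n * n
    while True:
        r = secrets.randbelow(n - 1) + 1  # random r in [1, n - 1]
        if math.gcd(r, n) == 1:
            break
    return (pow(n + 1, m % n, n2) * pow(r, n, n2)) % n2


def decrypt(n, priv, c):
    """Decrypt c and map the result back to a signed integer."""
    lam, mu = priv
    m = ((pow(c, lam, n * n) - 1) // n * mu) % n
    return m - n if m > n // 2 else m


def add_ciphertexts(n, c1, c2):
    """Multiplying ciphertexts adds the underlying plaintexts."""
    return (c1 * c2) % (n * n)


n, priv = keygen(P, Q)

# Hypothetical integer-quantized gradients from three clients.
client_grads = [[3, -1, 4], [1, 5, -9], [2, 6, 5]]
encrypted = [[encrypt(n, g) for g in grad] for grad in client_grads]

# The aggregator sums each coordinate without seeing any plaintext gradient.
aggregate = encrypted[0]
for enc_grad in encrypted[1:]:
    aggregate = [add_ciphertexts(n, a, b) for a, b in zip(aggregate, enc_grad)]

summed = [decrypt(n, priv, c) for c in aggregate]
print(summed)
```

Only the key holder can decrypt the aggregated ciphertexts, so the aggregator learns the coordinate-wise sum at most, never an individual client's gradient.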

Keywords

Machine Learning, Data Security, Privacy Protection, Distributed Training, Homomorphic Encryption

Cite This Paper

Chen-Wei Chang. AI-Driven Privacy Audit Automation and Data Provenance Tracking in Large-Scale Systems. International Journal of Business Management and Economics and Trade (2025), Vol. 6, Issue 1: 126-137. https://doi.org/10.38007/IJBMET.2025.060113.

References

[1] Zhu, Z. (2025). Application of Database Performance Optimization Technology in Large-Scale AI Infrastructure. European Journal of Engineering and Technologies, 1(1), 60-67.

[2] Huang, W., Zhang, Z., Zhao, W., Peng, J., Xu, W., Liao, Y., ... & Wang, Z. (2025). Auditing privacy budget of differentially private neural network models. Neurocomputing, 614, 128756.

[3] An, C. (2025). Exploration of Data-Driven Capital Market Investment Decision Support Model. European Journal of Business, Economics & Management, 1(3), 31-37.

[4] Pan, Y. (2025). Research on the Design of a Real-Time E-Commerce Recommendation System Based on Spark in the Context of Big Data. In 2025 IEEE International Conference on Electronics, Energy Systems and Power Engineering (EESPE) (pp. 1028-1033). IEEE.

[5] Lai, L. (2025). Data-Driven Credit Risk Assessment and Optimization Strategy Exploration. European Journal of Business, Economics & Management, 1(3), 24-30.

[6] Ullah, F., Pun, C. M., Mohmand, M. I., Mahendran, R. K., Khan, A. A., Alhammad, S. M., ... & Farouk, A. (2025). Privacy-aware secure data auditing for cloud-based intelligence of things environment. IEEE Internet of Things Journal.

[7] Li, W. (2025). Discussion on Using Blockchain Technology to Improve Audit Efficiency and Financial Transparency. Economics and Management Innovation, 2(4), 72-79.

[8] Tang, X., Wu, X., & Bao, W. (2025). Intelligent Prediction-Inventory-Scheduling Closed-Loop Nearshore Supply Chain Decision System. Advances in Management and Intelligent Technologies, 1(4).

[9] Sheng, C. (2025). Research on AI-Driven Financial Audit Efficiency Improvement and Financial Report Accuracy. European Journal of Business, Economics & Management, 1(2), 55-61.

[10] Zhang, Q., Qian, S., Cui, J., Zhong, H., Wang, F., & He, D. (2025). Blockchain-Based Privacy-Preserving Deduplication and Integrity Auditing in Cloud Storage. IEEE Transactions on Computers.

[11] Wu, X., & Bao, W. (2025). Research on the Design of a Blockchain Logistics Information Platform Based on Reputation Proof Consensus Algorithm. Procedia Computer Science, 262, 973-981.

[12] Yang, D., & Liu, X. (2025). Collaborative Algorithm for User Trust and Data Security Based on Blockchain and Machine Learning. Procedia Computer Science, 262, 757-765.

[13] Wei, X. (2025). Practical Application of Data Analysis Technology in Startup Company Investment Evaluation. Economics and Management Innovation, 2(4), 33-38.

[14] Fallatah, E. (2025). Ensuring Compliance: Data Privacy Audits Under Global Privacy Regulations. International Journal of Applied Economics, Finance and Accounting, 22(2), 133-144.

[15] Wang, C. (2025). Exploration of Optimization Paths Based on Data Modeling in Financial Investment Decision-Making. European Journal of Business, Economics & Management, 1(3), 17-23.