Socio-Economic Statistics Research, 2025, 6(2); doi: 10.38007/SESR.2025.060209.

Research on Sentiment Analysis Based on Multi-source Data Fusion and Pre-trained Model Optimization in Quantitative Finance

Author(s)

Jiahe Sun

Corresponding Author:
Jiahe Sun
Affiliation(s)

Tepper School of Business, Carnegie Mellon University, Pittsburgh, Pennsylvania 15213, U.S.

Abstract

Within the behavioral finance framework, investor sentiment is a key driver that can cause asset prices to deviate from fundamentals. In the digital era, social media platforms such as Reddit and X (Twitter) generate massive, real-time streams of investor opinion data, which create opportunities for high-frequency and fine-grained sentiment measurement while simultaneously challenging the accuracy and adaptability of traditional text analysis methods. This study develops an advanced sentiment analysis framework that integrates multi-source data fusion with optimization of pre-trained language models. To address the challenges posed by complex financial text semantics and the scarcity of labeled data, we move beyond conventional lexicon-based and classical machine learning approaches by adopting and refining pre-trained transformers such as BERT and RoBERTa. Domain-specific fine-tuning of these models markedly improves classification accuracy and robustness. Furthermore, to overcome the limitations of single data sources, we systematically fuse heterogeneous textual data from specialty investment forums (for example, WallStreetBets) and broader social platforms (for example, X), and construct high-frequency multidimensional indicators including bullish sentiment, attention intensity, and opinion disagreement, thereby exposing sentiment heterogeneity that arises from differences in platform user composition and network structure. An event-driven empirical analysis reveals that bullish sentiment generally exerts a positive effect on individual stock returns, with the effect becoming particularly pronounced within certain event windows; conversely, investor attention is found to be significantly negatively correlated with returns, offering new empirical support for limited attention theories.
Notably, the influence of opinion disagreement on market outcomes is not uniform but exhibits clear platform heterogeneity, with stronger negative effects in communities where opinion convergence is greater. All principal findings are supported by thorough heterogeneity checks and robustness tests to ensure result reliability. Methodologically, this work proposes a new paradigm of “pre-trained model optimization + multi-source data fusion” for sentiment analysis; theoretically, it deepens understanding of how sentiment transmission differs across digital social ecosystems; practically, it supplies quantitative strategies, risk management, and regulatory technology with crucial tools and insights.
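
The abstract names three indicators (bullish sentiment, attention intensity, and opinion disagreement) but does not give their formulas. The following is a minimal sketch of one formulation commonly used in the message-board sentiment literature, and it may differ from the definitions used in the paper. The function name `sentiment_indicators`, the label set, and the `(platform, ticker, label)` input layout are all assumptions made for illustration; the labels are assumed to come from an already fine-tuned classifier.

```python
import math
from collections import defaultdict


def sentiment_indicators(posts):
    """Aggregate classified posts into per-(platform, ticker) indicators.

    posts: iterable of (platform, ticker, label) tuples, where label is one of
    "bullish", "bearish", or "neutral" (hypothetical labels; any other label
    raises KeyError). The formulas below are assumed, not taken from the paper.
    """
    counts = defaultdict(lambda: {"bullish": 0, "bearish": 0, "neutral": 0})
    for platform, ticker, label in posts:
        counts[(platform, ticker)][label] += 1

    indicators = {}
    for key, c in counts.items():
        n_bull, n_bear = c["bullish"], c["bearish"]
        n_total = n_bull + n_bear + c["neutral"]
        n_directional = n_bull + n_bear

        # Log-ratio bullishness: positive when bullish posts dominate.
        bullish = math.log((1 + n_bull) / (1 + n_bear))

        # Attention: log of total posting volume.
        attention = math.log(1 + n_total)

        # Disagreement: 1 when opinions split evenly, 0 when unanimous.
        # Left undefined (None) when there are no directional posts.
        if n_directional:
            share = (n_bull - n_bear) / n_directional
            disagreement = math.sqrt(1 - share ** 2)
        else:
            disagreement = None

        indicators[key] = {
            "bullish": bullish,
            "attention": attention,
            "disagreement": disagreement,
        }
    return indicators
```

Keeping the platform in the grouping key means the same ticker yields separate indicator values for each community, which is what allows platform-level comparisons of the kind the abstract describes.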

Keywords

investor sentiment; pre-trained models; multi-source data fusion; text analysis; behavioral finance

Cite This Paper

Jiahe Sun. Research on Sentiment Analysis Based on Multi-source Data Fusion and Pre-trained Model Optimization in Quantitative Finance. Socio-Economic Statistics Research (2025), Vol. 6, Issue 2: 89-98. https://doi.org/10.38007/SESR.2025.060209.
