Scholar Publishing Group

International Journal of Big Data Intelligent Technology, 2025, 6(1); doi: 10.38007/IJBDIT.2025.060111.

Efficient Multimodal Visual Segmentation Model Based on Phased Fusion of Differential Modalities

Author(s)

Bukun Ren

Corresponding Author:
Bukun Ren
Affiliation(s)

College of Engineering, University of California, Berkeley, Berkeley, CA 94720, United States

Abstract

This article studies the application of an efficient multimodal visual segmentation model, based on phased fusion of differential modalities, to image harmonization. In practical scenarios, harmonization models fail when no predetermined foreground mask is available; to address this, the paper proposes a multimodal image harmonization task in which a referential description of the foreground replaces the foreground mask. Building on the traditional image harmonization dataset iHarmony4, the article constructs a new multimodal harmonization dataset, ReiHarmony4. For this task, it proposes a segmentation-harmonization pipeline model and two end-to-end methods, including a combination of a CLIP-based referring image segmentation model with Harmony Transformer, and DiffHarmony, which is based on the Stable Diffusion model. Experimental results show that these models can effectively perform multimodal image harmonization. The article also designs and implements a multimodal image harmonization system that harmonizes images from text descriptions. Several issues remain open for further work, including improving segmentation performance, mitigating resolution degradation, and expanding the dataset.
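A segmentation-harmonization pipeline of the kind described in the abstract can be sketched as two composed stages: a referring segmenter turns the text description into a foreground mask, and a mask-conditioned harmonizer adjusts the foreground so it matches the background. This is an illustrative sketch under stated assumptions, not the paper's implementation. The names `Segmenter`, `mean_color_harmonize`, `text_guided_harmonize` and `stub_segmenter` are hypothetical. The segmenter is a stub that ignores the text, whereas a real system would use a learned model such as the CLIP-based referring segmenter the abstract mentions. Simple per-channel mean-color matching stands in for Harmony Transformer or DiffHarmony.

```python
from typing import Callable, Tuple

import numpy as np

# Hypothetical interface: maps (image of shape (H, W, 3) in [0, 1], text) to a
# boolean foreground mask of shape (H, W). A real system would use a learned
# referring image segmentation model here.
Segmenter = Callable[[np.ndarray, str], np.ndarray]


def mean_color_harmonize(image: np.ndarray, mask: np.ndarray) -> np.ndarray:
    """Toy harmonizer (not Harmony Transformer or DiffHarmony): shift the
    foreground's per-channel mean color to the background's mean color."""
    out = image.astype(np.float64)  # astype returns a copy by default
    fg = mask.astype(bool)
    if not fg.any() or fg.all():
        return out  # no foreground, or no background to match against
    shift = out[~fg].mean(axis=0) - out[fg].mean(axis=0)  # shape (3,)
    out[fg] += shift
    return np.clip(out, 0.0, 1.0)


def text_guided_harmonize(
    image: np.ndarray, description: str, segment: Segmenter
) -> Tuple[np.ndarray, np.ndarray]:
    """Stage 1: text -> mask. Stage 2: (image, mask) -> harmonized image."""
    mask = segment(image, description)
    return mean_color_harmonize(image, mask), mask


# Demo: a bright 2x2 "pasted" foreground on a dark 4x4 background.
img = np.full((4, 4, 3), 0.2)
img[1:3, 1:3] = 0.8


def stub_segmenter(image: np.ndarray, text: str) -> np.ndarray:
    # Stand-in only: ignores `text` and thresholds brightness.
    return image.mean(axis=2) > 0.5


result, mask = text_guided_harmonize(img, "the bright square in the middle", stub_segmenter)
print(int(mask.sum()), round(float(result[1, 1, 0]), 3))  # 4 0.2
```

Keeping the two stages behind separate interfaces lets either the segmenter or the harmonizer be replaced independently; it also means any error in the predicted mask passes directly into the harmonization step.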

Keywords

Multimodal image harmonization, phased fusion of differential modalities, referential description, ReiHarmony4 dataset, efficient visual segmentation model

Cite This Paper

Bukun Ren. Efficient Multimodal Visual Segmentation Model Based on Phased Fusion of Differential Modalities. International Journal of Big Data Intelligent Technology (2025), Vol. 6, Issue 1: 109-117. https://doi.org/10.38007/IJBDIT.2025.060111.
