
International Journal of Art Innovation and Development, 2023, 4(1); doi: 10.38007/IJAID.2023.040110.

Multi-voice Music Generation System Based on Recurrent Neural Network


Fei Qiao

Corresponding Author:
Fei Qiao

Shanxi Technology and Business College, Shanxi, China

Philippine Christian University, Manila, Philippines


Music is closely bound up with human life and is an important way for people to express their feelings. With the rapid progress of artificial intelligence in recent years and its application across many fields, computer music has also advanced greatly; algorithmic composition is an important research branch within it. This paper studies the design of a multi-voice music generation system based on recurrent neural networks. Taking music audio as the research object, it proposes a new algorithm for automatically synthesizing music with recurrent neural networks. The framework for automatic music synthesis covers three parts: the analysis of audio files, the audio features of music, and the models applied to automatic composition. The audio-file analysis part introduces the structure of the audio file and the parameters relevant to this experiment in detail, laying the foundation for the experimental work. The music-audio-feature part introduces features including mel-frequency cepstral coefficients (MFCC), linear predictive coding (LPC), zero-crossing rate, and short-time energy. Among the models used for automatic composition, the recurrent neural network, the most active artificial neural network in the field of algorithmic composition, and its two variants, the long short-term memory (LSTM) model and the gated recurrent unit (GRU) model, are introduced with emphasis; these are also the basic models studied in this paper. The paper then describes the neural-network-based algorithm for automatic music audio synthesis in detail. It first formalizes the problem of automatic music audio synthesis, introducing the concepts of unit music, unit-music vectors, and AI-generated music, which casts music creation as a tractable computational problem.
Next, the process of extracting the audio features of unit music is described in detail, followed by the prediction and synthesis process for music audio, together with an algorithm description. Finally, the audio splicing stage, which directly shapes the listener's auditory experience, is introduced; a fade-out-then-fade-in (crossfade) method is proposed for the overlapped joins so as to achieve a smooth splicing effect. A series of experiments is then carried out on the algorithm model, including automatic music audio synthesis based on the LSTM model, a human-computer interaction experiment, and automatic music audio synthesis based on the GRU model.
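As an illustration of the frame-level features named above, the zero-crossing rate and short-time energy can be computed directly from raw samples. This is a minimal sketch; the frame length, sample rate, and test tone are assumptions for illustration, not values taken from the paper:

```python
import numpy as np

def zero_crossing_rate(frame: np.ndarray) -> float:
    """Fraction of consecutive sample pairs whose signs differ."""
    signs = np.sign(frame)
    # Treat exact zeros as positive so they do not double-count crossings.
    signs[signs == 0] = 1
    return float(np.mean(signs[:-1] != signs[1:]))

def short_time_energy(frame: np.ndarray) -> float:
    """Sum of squared amplitudes over one analysis frame."""
    return float(np.sum(frame ** 2))

# Toy example: a pure tone crosses zero regularly and carries steady energy.
sr = 8000                              # assumed sample rate (Hz)
t = np.arange(sr) / sr
tone = np.sin(2 * np.pi * 440 * t)     # 440 Hz sine, one second
zcr = zero_crossing_rate(tone[:256])   # 256-sample analysis frame (assumed)
ste = short_time_energy(tone[:256])
```

In a full feature pipeline these per-frame values would be stacked alongside MFCC and LPC coefficients to form the unit-music feature vector.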
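The gating mechanism that distinguishes the GRU variant from a plain recurrent network can be sketched as a single state update. The dimensions and random weights below are placeholders for illustration, not the system's actual configuration:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gru_step(x, h, Wz, Uz, Wr, Ur, Wh, Uh):
    """One GRU update: gates decide how much of the old state to keep."""
    z = sigmoid(Wz @ x + Uz @ h)              # update gate
    r = sigmoid(Wr @ x + Ur @ h)              # reset gate
    h_tilde = np.tanh(Wh @ x + Uh @ (r * h))  # candidate state
    return (1 - z) * h + z * h_tilde          # blend old and candidate state

# Toy dimensions: a 4-dim unit-music feature vector, an 8-dim hidden state.
rng = np.random.default_rng(0)
d_in, d_h = 4, 8
params = [rng.standard_normal((d_h, d)) * 0.1
          for d in (d_in, d_h, d_in, d_h, d_in, d_h)]
h = np.zeros(d_h)
for x in rng.standard_normal((10, d_in)):     # run over a 10-step sequence
    h = gru_step(x, h, *params)
```

The final hidden state summarizes the sequence seen so far; a prediction layer on top of it would emit the next unit-music vector.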
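The fade-out-then-fade-in splicing idea can likewise be sketched as a linear crossfade over the overlapped region of two clips; the overlap length and toy signals are assumptions for illustration:

```python
import numpy as np

def crossfade_splice(a: np.ndarray, b: np.ndarray, overlap: int) -> np.ndarray:
    """Join two clips by fading `a` out while `b` fades in over `overlap` samples."""
    fade = np.linspace(1.0, 0.0, overlap)          # linear fade-out ramp
    mixed = a[-overlap:] * fade + b[:overlap] * (1.0 - fade)
    return np.concatenate([a[:-overlap], mixed, b[overlap:]])

# Two toy clips spliced with a 100-sample overlap: the join ramps
# smoothly from +1 to -1 instead of jumping, avoiding an audible click.
clip_a = np.ones(500)
clip_b = -np.ones(500)
out = crossfade_splice(clip_a, clip_b, 100)
```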


Recurrent Neural Network, Multi-Voice Music Generation System, Artificial Intelligence, Automatic Composition

Cite This Paper

Fei Qiao. Multi-voice Music Generation System Based on Recurrent Neural Network. International Journal of Art Innovation and Development (2023), Vol. 4, Issue 1: 119-127. https://doi.org/10.38007/IJAID.2023.040110.

