International Journal of Neural Network, 2020, 1(4); doi: 10.38007/NN.2020.010404.

Intelligent Speech Recognition and Sentiment Analysis Considering LSTM Neural Network

Author(s)

Deyan Long and Kun Zhan

Corresponding Author:
Kun Zhan
Affiliation(s)

Anhui University, Hefei, China

Abstract

As one of the research hotspots in the field of artificial intelligence applications, speech emotion recognition (ER) is of great value in human-computer interaction systems. This paper studies intelligent speech recognition and sentiment analysis based on the LSTM neural network. It first reviews the basic workflow of speech ER research and the relevant theories of emotion. An attention mechanism is then introduced to improve the long short-term memory (LSTM) network, optimizing its performance for speech ER. Simulation results show that the optimized model significantly improves recognition accuracy.
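To make the architecture concrete, below is a minimal PyTorch sketch of an attention-augmented LSTM classifier of the kind the abstract describes. The feature dimension, hidden size, emotion label set, and all layer names are illustrative assumptions, not details taken from the paper.

# A minimal sketch of an attention-augmented LSTM for speech ER.
# Dimensions and the four-class emotion set are assumptions for illustration.
import torch
import torch.nn as nn

class AttentionLSTM(nn.Module):
    def __init__(self, n_features=40, hidden=128, n_emotions=4):
        super().__init__()
        self.lstm = nn.LSTM(n_features, hidden, batch_first=True)
        # Scores each time step; a softmax over time yields attention weights.
        self.attn = nn.Linear(hidden, 1)
        self.classifier = nn.Linear(hidden, n_emotions)

    def forward(self, x):
        # x: (batch, time, n_features), e.g. frame-level acoustic features
        out, _ = self.lstm(x)                           # (batch, time, hidden)
        weights = torch.softmax(self.attn(out), dim=1)  # (batch, time, 1)
        context = (weights * out).sum(dim=1)            # weighted sum over time
        return self.classifier(context)                 # emotion logits

# Usage: a batch of 8 utterances, each 200 frames of 40-dim features.
model = AttentionLSTM()
logits = model(torch.randn(8, 200, 40))
print(logits.shape)  # torch.Size([8, 4])

The attention layer replaces the common practice of classifying from only the final hidden state: softmax scores over all time steps let the frames carrying emotional cues contribute more to the pooled utterance representation, which is the kind of optimization the abstract attributes to the attention mechanism.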

Keywords

LSTM Neural Network, Intelligent Speech Recognition, Speech Emotion, Attention Mechanism

Cite This Paper

Deyan Long and Kun Zhan. Intelligent Speech Recognition and Sentiment Analysis Considering LSTM Neural Network. International Journal of Neural Network (2020), Vol. 1, Issue 4: 26-33. https://doi.org/10.38007/NN.2020.010404.
