Filtering Spam Messages Based on Improved Naive Bayesian Algorithm

Machine Learning Theory and Practice, 2020, 1(2); doi: 10.38007/ML.2020.010201.

Author(s)

Xiaolei Zhang

Corresponding Author:

Xiaolei Zhang

Affiliation(s)

Personnel Department, Liaoning Police College, Dalian 116036, Liaoning, China

Abstract

A relative lack of regulation and supervision has allowed a "black industry" to grow around wireless communication, with negative effects such as mobile phone spam messages that continue to disrupt people's lives. This paper studies spam short message filtering based on an improved naive Bayes algorithm. It introduces the background of spam short messages and summarizes the current state of spam short message filtering technology and filtering system construction at home and abroad. The intelligent filtering technology is then optimized, and a new algorithm that combines a Bayesian network classification algorithm with an artificial intelligence algorithm is used to filter spam short messages. The experimental results show that the proposed improved naive Bayesian filtering method improves the accuracy, recall and overall efficiency of short message filtering, and that filtering efficiency also increases as the cluster size grows.
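For orientation, the sketch below shows a plain multinomial naive Bayes classifier with Laplace smoothing, the kind of baseline that such work typically starts from. It is not the improved algorithm proposed in the paper, whose details are not given in this abstract. The function names (`train`, `classify`, `evaluate`), the whitespace tokenizer, the `spam`/`ham` labels and the toy messages are all assumptions made for illustration.

```python
import math
from collections import Counter


def train(messages):
    """Count classes and per-class words from (text, label) pairs."""
    class_counts = Counter()
    word_counts = {}
    vocab = set()
    for text, label in messages:
        class_counts[label] += 1
        counts = word_counts.setdefault(label, Counter())
        for tok in text.lower().split():  # naive whitespace tokenizer (assumption)
            counts[tok] += 1
            vocab.add(tok)
    return class_counts, word_counts, vocab


def classify(text, model):
    """Return the label with the highest log posterior score."""
    class_counts, word_counts, vocab = model
    total = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, n in class_counts.items():
        score = math.log(n / total)  # log prior
        denom = sum(word_counts[label].values()) + len(vocab)
        for tok in text.lower().split():
            if tok in vocab:  # words never seen in training are skipped
                # Laplace (add-one) smoothed log likelihood
                score += math.log((word_counts[label][tok] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


def evaluate(test_set, model, positive="spam"):
    """Return (accuracy, recall on the positive class) for labeled pairs."""
    correct = true_pos = actual_pos = 0
    for text, label in test_set:
        predicted = classify(text, model)
        correct += predicted == label
        if label == positive:
            actual_pos += 1
            true_pos += predicted == positive
    recall = true_pos / actual_pos if actual_pos else 0.0
    return correct / len(test_set), recall


# Toy data, invented purely for illustration.
train_data = [
    ("win a free prize now", "spam"),
    ("free cash claim your prize", "spam"),
    ("are we meeting for lunch today", "ham"),
    ("see you at the meeting tomorrow", "ham"),
]
test_data = [
    ("claim your free prize", "spam"),
    ("lunch meeting tomorrow", "ham"),
]

model = train(train_data)
accuracy, spam_recall = evaluate(test_data, model)
print(accuracy, spam_recall)
```

Scores are summed in log space to avoid numeric underflow on longer messages, and add-one smoothing keeps a single unseen word from forcing a class probability to zero. Accuracy and recall are computed here because they are the metrics the abstract reports.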

Keywords

Machine Learning, Naive Bayes, SMS Filtering, Text Classification

Cite This Paper

Xiaolei Zhang. Filtering Spam Messages Based on Improved Naive Bayesian Algorithm. Machine Learning Theory and Practice (2020), Vol. 1, Issue 2: 1-9. https://doi.org/10.38007/ML.2020.010201.
