Zoology and Animal Physiology, 2021, 2(2); doi: 10.38007/ZAP.2021.020201.

English Text Similarity Detection Algorithm Based on Animal Algorithm

Author(s)

Shihui Xiang

Corresponding Author

Shihui Xiang

Affiliation(s)

Panyapiwat Institute of Management, Nonthaburi, Thailand

Abstract

Text similarity detection algorithms are widely used in processing massive volumes of natural language text. Unlike searching for exact, complete duplicates, computing the semantic similarity of texts is difficult because of the complexity of natural language. The Simhash algorithm does not use the semantic information of the text, so it cannot handle semantic phenomena in natural language processing such as synonyms and polysemous words. This paper therefore retains the “dimensionality reduction” advantage of animal algorithms in English text processing and the efficiency of their retrieval process, addresses their inability to recognize semantically similar text content, and studies an English text similarity detection algorithm based on animal algorithms. To address the shortcomings of Simhash in text semantic similarity, this paper reviews existing synonym expansion schemes and proposes a semantic code design based on the synonym word forest and context. Building on this comprehensive improvement scheme, it proposes a semantic fingerprint generation algorithm that incorporates synonym information, which solves the problem that similar texts produced by synonym replacement cannot be identified. Experiments on the sample data show that, with the parameter k = 3, both the accuracy and the recall of correct identification exceed 77%. In contrast, the two indexes for the traditional Simhash algorithm and the word frequency statistical algorithm are only about 70%. This shows that the improved algorithm achieves relatively good results in identifying multiple kinds of similar-text modification, especially synonym replacement.
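The idea of a fingerprint that incorporates synonym information can be illustrated with a minimal Python sketch. It is not the paper's implementation: the abstract does not give the semantic code design, so the `SYNONYM_CODES` table, the MD5-based `hash64` helper, the uniform token weights, and the reading of k = 3 as a Hamming distance threshold are all assumptions made for illustration.

```python
import hashlib
import re

# Hypothetical synonym table (assumption): words in the same synonym group
# share one semantic code. A real system would derive this from a thesaurus.
SYNONYM_CODES = {
    "big": "SIZE_LARGE",
    "large": "SIZE_LARGE",
    "huge": "SIZE_LARGE",
    "quick": "SPEED_FAST",
    "fast": "SPEED_FAST",
    "rapid": "SPEED_FAST",
}


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def semantic_code(word):
    """Map a word to its synonym-group code, or keep the word if unlisted."""
    return SYNONYM_CODES.get(word, word)


def hash64(token):
    """Stable 64-bit hash of a token (first 8 bytes of its MD5 digest)."""
    digest = hashlib.md5(token.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def simhash(text, bits=64, use_synonyms=True):
    """Standard Simhash over word tokens, each with weight 1.

    With use_synonyms=True, every word is first replaced by its semantic
    code, so synonym substitutions do not change the fingerprint.
    """
    weights = [0] * bits
    for word in tokenize(text):
        token = semantic_code(word) if use_synonyms else word
        h = hash64(token)
        for i in range(bits):
            weights[i] += 1 if (h >> i) & 1 else -1
    fingerprint = 0
    for i, w in enumerate(weights):
        if w > 0:
            fingerprint |= 1 << i
    return fingerprint


def hamming_distance(a, b):
    """Number of bit positions in which two fingerprints differ."""
    return bin(a ^ b).count("1")


def is_similar(text_a, text_b, k=3, use_synonyms=True):
    """Treat two texts as similar if their fingerprints differ in <= k bits.

    Interpreting k as a Hamming distance threshold is an assumption.
    """
    fa = simhash(text_a, use_synonyms=use_synonyms)
    fb = simhash(text_b, use_synonyms=use_synonyms)
    return hamming_distance(fa, fb) <= k
```

With the table above, `is_similar("the big dog runs fast", "the large dog runs quick")` returns `True`, because both sentences reduce to the same sequence of semantic codes and therefore to the same fingerprint. Passing `use_synonyms=False` gives the plain Simhash behaviour that the abstract describes as insensitive to synonyms.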

Keywords

Text Similarity, Similarity Detection, Animal Algorithm, Semantic Unit Division, Part-Of-Speech Space Definition

Cite This Paper

Shihui Xiang. English Text Similarity Detection Algorithm Based on Animal Algorithm. Zoology and Animal Physiology (2021), Vol. 2, Issue 2: 1-12. https://doi.org/10.38007/ZAP.2021.020201.
