Research on the Construction of Low-Resource Parallel Corpus Based on Translation Plug-in Technology

International Journal of Educational Curriculum Management and Research, 2024, 5(1); doi: 10.38007/IJECMR.2024.050115.

Author(s)

Zulkar Iskander, Azragul Yusup

Corresponding Author:

Azragul Yusup

Affiliation(s)

Xinjiang Normal University, Urumqi, Xinjiang, China

National Language Resource Monitoring & Research Center of Minority Languages, Beijing, China

Abstract

Parallel corpora play a crucial role in natural language processing, especially in tasks such as machine translation and cross-language information retrieval. However, as demand grows for coverage of more languages, the challenges of building such corpora are becoming increasingly apparent, especially for low-resource languages. This paper summarizes the current status of parallel corpus research, discusses the challenges faced by low-resource corpora, focuses on plug-in-based translation techniques for automatically constructing Chinese-Uyghur and Chinese-Kazakh parallel lexicons, and looks ahead to future research directions.

Keywords

Parallel Corpus, Low-Resource, Translation Plug-in Technology

Cite This Paper

Zulkar Iskander, Azragul Yusup. Research on the Construction of Low-Resource Parallel Corpus Based on Translation Plug-in Technology. International Journal of Educational Curriculum Management and Research (2024), Vol. 5, Issue 1: 101-106. https://doi.org/10.38007/IJECMR.2024.050115.
