
International Journal of Educational Curriculum Management and Research, 2024, 5(1); doi: 10.38007/IJECMR.2024.050121.

A Study on the Translation Quality of ChatGPT


Min Zhang

Corresponding Author:
Min Zhang

Graduate School of Translation and Interpretation, Tianjin Foreign Studies University, Tianjin, 300204, China


Abstract

The rapid development of natural language processing technology has sparked growing interest in applying large language models to translation. This study uses the large language model ChatGPT as a translation tool, taking the American short story "A Service of Love" as the source text. Its objective is to compare and evaluate the translation quality and performance of ChatGPT in two language combinations, English-to-Chinese and English-to-Japanese, against those of Google Translate and DeepL Translator. The study also explores the accuracy, fluency, fidelity, and adaptability of ChatGPT in literary translation. The aim is to offer reference and insight for AI translation strategies, particularly for literary translation between English and Asian languages.


Keywords

ChatGPT, Translation Quality Evaluation, Literary Translation, English-to-Chinese, English-to-Japanese

Cite This Paper

Min Zhang. A Study on the Translation Quality of ChatGPT. International Journal of Educational Curriculum Management and Research (2024), Vol. 5, Issue 1: 153-163. https://doi.org/10.38007/IJECMR.2024.050121.

