Journal of Machine and Computing


Comparing Multilingual Emoji Enhanced Product Reviews: A Transformer Based Approach for Language Pair and Emotion Detection




Received On : 10 May 2024

Revised On : 23 December 2024

Accepted On : 14 February 2025

Published On : 05 April 2025

Volume 05, Issue 02

Pages : 804-813


Abstract


This paper presents a multilingual sentiment analysis pipeline built on two transformer-based architectures, XLM-RoBERTa (base) and BERT-base multilingual cased (mBERT), to classify sentiment across four language pairs (English–Spanish, English–French, English–Hindi, and English–Italian). We fine-tune XLM-RoBERTa by unfreezing only its last three layers, which adapts the model to domain-specific sentiment cues while preserving its robust cross-lingual representations. Training for ten epochs yields a best validation accuracy of 0.9579 and a test accuracy of 0.975, with average F1-scores of roughly 0.92–0.97 across the four language pairs. The mBERT model achieves a slightly higher test accuracy of 0.98, demonstrating comparable or improved performance in capturing sentiment nuances. These results indicate that selectively fine-tuning large-scale multilingual encoders is an effective strategy for cross-lingual sentiment classification, achieving high accuracy and strong generalization.
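The selective fine-tuning described above can be illustrated with a minimal sketch. It is not the authors' code. It assumes Hugging Face-style parameter names (`roberta.encoder.layer.<i>.…`, `classifier.…`) and the 12 encoder layers of XLM-RoBERTa base. It also assumes the classification head is trained alongside the last three layers, which the abstract does not state. The helper name `unfreeze_last_layers` and its `head_prefix` argument are hypothetical, and stand-in parameter objects are used so the example runs without PyTorch.

```python
import re
from types import SimpleNamespace

# Matches Hugging Face-style encoder parameter names such as
# "roberta.encoder.layer.11.attention.self.query.weight" (naming is an assumption).
LAYER_PATTERN = re.compile(r"encoder\.layer\.(\d+)\.")


def unfreeze_last_layers(named_parameters, num_layers=12, n_unfrozen=3,
                         head_prefix="classifier."):
    """Freeze everything except the last `n_unfrozen` encoder layers and the head.

    `named_parameters` is any iterable of (name, param) pairs whose params expose
    a boolean `requires_grad` attribute, e.g. `model.named_parameters()` in PyTorch.
    Returns the names of the parameters left trainable.
    """
    first_trainable = num_layers - n_unfrozen
    trainable = []
    for name, param in named_parameters:
        match = LAYER_PATTERN.search(name)
        in_top_layers = match is not None and int(match.group(1)) >= first_trainable
        # Assumption: the classification head is trained as well.
        param.requires_grad = in_top_layers or name.startswith(head_prefix)
        if param.requires_grad:
            trainable.append(name)
    return trainable


# Stand-in parameters so the sketch runs without PyTorch installed.
demo_names = ["roberta.embeddings.word_embeddings.weight"]
demo_names += [f"roberta.encoder.layer.{i}.output.dense.weight" for i in range(12)]
demo_names += ["classifier.out_proj.weight"]
demo_params = [(name, SimpleNamespace(requires_grad=True)) for name in demo_names]

trainable_names = unfreeze_last_layers(demo_params)
print(trainable_names)  # layers 9, 10 and 11 plus the classifier head
```

With PyTorch and the `transformers` library installed, the same helper could be called as `unfreeze_last_layers(model.named_parameters())` on a model loaded with `AutoModelForSequenceClassification.from_pretrained("xlm-roberta-base")`. Only the parameters left trainable would then receive gradient updates.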


Keywords


Natural Language Processing, Language Pair Identification, Transformers, XLM-RoBERTa, mBERT, mT5, Emotion Detection.



CRediT Author Statement


The authors confirm contribution to the paper as follows:

Conceptualization: Priyanka Sharma and Ganesh Gopal Devarajan; Methodology: Ganesh Gopal Devarajan; Software: Priyanka Sharma and Ganesh Gopal Devarajan; Data Curation: Priyanka Sharma; Writing - Original Draft Preparation: Priyanka Sharma and Ganesh Gopal Devarajan; Visualization: Ganesh Gopal Devarajan; Investigation: Priyanka Sharma and Ganesh Gopal Devarajan; Supervision: Priyanka Sharma; Validation: Ganesh Gopal Devarajan; Writing - Reviewing and Editing: Priyanka Sharma and Ganesh Gopal Devarajan. All authors reviewed the results and approved the final version of the manuscript.


Acknowledgements


The author(s) thank Dr. Ganesh Gopal Devarajan for support in completing this research.


Funding


No funding was received to assist with the preparation of this manuscript.


Ethics declarations


Conflict of interest

The authors have no conflicts of interest to declare that are relevant to the content of this article.


Availability of data and materials


Data sharing is not applicable to this article as no new data were created or analysed in this study.


Author information


Contributions

All authors contributed equally to the paper, and all authors have read and agreed to the published version of the manuscript.


Corresponding author


Rights and permissions


Open Access: This article is licensed under a Creative Commons Attribution NoDerivs license, a more restrictive license. It allows redistribution of the material, commercially or non-commercially, but users cannot make any changes whatsoever to the original, i.e., no derivatives of the original work. To view a copy of this license, visit https://creativecommons.org/licenses/by-nc-nd/4.0/


Cite this article


Priyanka Sharma and Ganesh Gopal Devarajan, “Comparing Multilingual Emoji Enhanced Product Reviews: A Transformer Based Approach for Language Pair and Emotion Detection”, Journal of Machine and Computing, pp. 804-813, April 2025, doi: 10.53759/7669/jmc202505063.


Copyright


© 2025 Priyanka Sharma and Ganesh Gopal Devarajan. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.