Journal of Machine and Computing


Expert Crawler: Amalgamation of Deep Learning Models for Multilingual Multiclass Classification of Product Reviews




Received On : 24 October 2024

Revised On : 12 December 2024

Accepted On : 20 January 2025

Published On : 05 April 2025

Volume 05, Issue 02

Pages : 730-742


Abstract


With the proliferation of social platforms for online shopping, accurately predicting item categories from multilingual reviews has become crucial for informed decision-making. This paper addresses the significant challenge of categorizing reviews across diverse languages by enhancing Transformer models for multilingual review classification, tackling key challenges such as efficiency, scalability, and interpretability. To improve model efficiency, we integrate sparse attention mechanisms into mBERT and XLM-RoBERTa and apply model distillation via DistilBERT, thus balancing performance with reduced computational cost. For data augmentation, we employ back-translation to enrich the training data, thereby enhancing model robustness and generalization across diverse languages. Additionally, to enhance model interpretability, we employ Local Interpretable Model-Agnostic Explanations (LIME) to provide clear and actionable insights into model predictions. The proposed methods are applied to multilingual reviews of products listed on Amazon, covering the Spanish, English, German, Hindi, Chinese, Japanese, and French languages. The model achieves a classification accuracy of 88% across 32 product categories, demonstrating its effectiveness in solving the multilingual multiclass categorization problem in the retail sector. This work illustrates the potential of combining advanced natural language processing techniques with innovative approaches to improve the efficiency, accuracy, and interpretability of classification models, thereby facilitating better decision-making on online shopping platforms. With continued research, such models will offer increasingly robust solutions for processing and understanding multilingual data.
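As a toy illustration of the back-translation augmentation step mentioned above, the sketch below round-trips a review through a pivot language to obtain a paraphrase for the training set. Note that `translate` here is a stub with a tiny hand-written phrase table for demonstration; it is an assumption, not the authors' implementation, and a real system would call a machine-translation model instead.

```python
def translate(text: str, src: str, tgt: str) -> str:
    """Stub translation function (hypothetical). A real pipeline would
    invoke an MT model here; this phrase table merely simulates the
    lexical drift that back-translation introduces."""
    table = {
        ("great product", "en", "de"): "tolles Produkt",
        ("tolles Produkt", "de", "en"): "excellent product",
    }
    # Fall back to the input unchanged when the stub has no entry.
    return table.get((text, src, tgt), text)


def back_translate(text: str, src: str = "en", pivot: str = "de") -> str:
    """Return a paraphrase of `text` via a round trip through `pivot`."""
    return translate(translate(text, src, pivot), pivot, src)


# The paraphrase keeps the meaning but varies the wording, giving the
# classifier an extra training example for the same label.
augmented = back_translate("great product")
print(augmented)  # prints: excellent product
```

Pairing each original review with its back-translated paraphrase (under the same category label) is what enriches the training data and improves robustness across languages.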


Keywords


Expert Crawler, Machine Learning, XLM-RoBERTa, LIME, Natural Language Processing, Optimizers.


References


  1. M. Artetxe and H. Schwenk, “Massively Multilingual Sentence Embeddings for Zero-Shot Cross-Lingual Transfer and Beyond,” Transactions of the Association for Computational Linguistics, vol. 7, pp. 597–610, Nov. 2019, doi: 10.1162/tacl_a_00288.
  2. S. H. Asefa and Y. Assabie, “Transformer-Based Amharic-to-English Machine Translation With Character Embedding and Combined Regularization Techniques,” IEEE Access, vol. 13, pp. 1090–1105, 2025, doi: 10.1109/access.2024.3521985.
  3. A. Babhulgaonkar and S. Sonavane, “Language Identification for Multilingual Machine Translation,” 2020 International Conference on Communication and Signal Processing (ICCSP), pp. 401–405, Jul. 2020, doi: 10.1109/iccsp48568.2020.9182184.
  4. A. Basile and C. Rubagotti, “CrotoneMilano for AMI at Evalita2018. A performant, cross-lingual misogyny detection system,” EVALITA Evaluation of NLP and Speech Tools for Italian, pp. 206–210, 2018, doi: 10.4000/books.aaccademia.4734.
  5. A. Vijeevaraj Ann Sinthusha, E. Y. A. Charles, and R. Weerasinghe, “Machine Reading Comprehension for the Tamil Language With Translated SQuAD,” IEEE Access, vol. 13, pp. 13312–13328, 2025, doi: 10.1109/access.2025.3530949.
  6. X. Chen, Y. Sun, B. Athiwaratkun, C. Cardie, and K. Weinberger, “Adversarial Deep Averaging Networks for Cross-Lingual Sentiment Classification,” Transactions of the Association for Computational Linguistics, vol. 6, pp. 557–570, Dec. 2018, doi: 10.1162/tacl_a_00039.
  7. S. K. W. Chu, R. Xie, and Y. Wang, “Cross-Language Fake News Detection,” Data and Information Management, vol. 5, no. 1, pp. 100–109, Jan. 2021, doi: 10.2478/dim-2020-0025.
  8. A. Conneau, K. Khandelwal, et al., “Unsupervised cross-lingual representation learning at scale,” arXiv preprint arXiv:1911.02116, 2019. doi: 10.48550/ARXIV.1911.02116.
  9. A. Conneau et al., “XNLI: Evaluating Cross-lingual Sentence Representations,” Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, 2018, doi: 10.18653/v1/d18-1269.
  10. A. De, D. Bandyopadhyay, B. Gain, and A. Ekbal, “A Transformer-Based Approach to Multilingual Fake News Detection in Low-Resource Languages,” ACM Transactions on Asian and Low-Resource Language Information Processing, vol. 21, no. 1, pp. 1–20, Nov. 2021, doi: 10.1145/3472619.
  11. N. De Cao et al., “Multilingual Autoregressive Entity Linking,” Transactions of the Association for Computational Linguistics, vol. 10, pp. 274–290, 2022, doi: 10.1162/tacl_a_00460.
  12. V. Dogra et al., “A Complete Process of Text Classification System Using State-of-the-Art NLP Models,” Computational Intelligence and Neuroscience, vol. 2022, pp. 1–26, Jun. 2022, doi: 10.1155/2022/1883698.
  13. J. M. Eisenschlos, et al., “MultiFiT: Efficient multi-lingual language model fine-tuning,” arXiv preprint arXiv:1909.04761, 2019. doi: 10.48550/ARXIV.1909.04761.
  14. H. Fei and P. Li, “Cross-Lingual Unsupervised Sentiment Classification with Multi-View Transfer Learning,” Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, 2020, doi: 10.18653/v1/2020.acl-main.510.
  15. N. Goyal, J. Du, M. Ott, G. Anantharaman, and A. Conneau, “Larger-Scale Transformers for Multilingual Masked Language Modeling,” Proceedings of the 6th Workshop on Representation Learning for NLP (RepL4NLP-2021), 2021, doi: 10.18653/v1/2021.repl4nlp-1.4.
  16. S. Aggarwal, S. Kumar, and R. Mamidi, “Efficient Multilingual Text Classification for Indian Languages,” Proceedings of the Conference Recent Advances in Natural Language Processing - Deep Learning for Natural Language Processing Methods and Applications, pp. 19–25, 2021, doi: 10.26615/978-954-452-072-4_003.
  17. K. Karthikeyan, et al., “Cross-lingual ability of multilingual BERT: An empirical study,” arXiv preprint arXiv:1912.07840, 2019. doi: 10.48550/ARXIV.1912.07840.
  18. P. Keung, et al., “The multilingual Amazon reviews corpus,” arXiv preprint arXiv:2010.02573, 2020. doi: 10.48550/ARXIV.2010.02573.
  19. D. P. Kingma and J. Ba, “Adam: A method for stochastic optimization,” arXiv preprint arXiv:1412.6980, 2014. doi: 10.48550/ARXIV.1412.6980.
  20. Kumar, “Multilingual natural language processing,” IEEE Trans. Neural Netw. Learn. Syst., vol. 1, 2025. doi: 10.1109/TNNLS.2025.10830644.
  21. Z. Li et al., “Learn to Cross-lingual Transfer with Meta Graph Learning Across Heterogeneous Languages,” Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), pp. 2290–2301, 2020, doi: 10.18653/v1/2020.emnlp-main.179.
  22. G. Manias, A. Mavrogiorgou, A. Kiourtis, C. Symvoulidis, and D. Kyriazis, “Multilingual text categorization and sentiment analysis: a comparative analysis of the utilization of multilingual approaches for classifying twitter data,” Neural Computing and Applications, vol. 35, no. 29, pp. 21415–21431, May 2023, doi: 10.1007/s00521-023-08629-3.
  23. M. E. Mswahili and Y. S. Jeong, “Tokenizers for African languages,” IEEE Access, vol. 1, 2024. doi: 10.1109/ACCESS.2024.10815724.
  24. V. Sanh, et al., “DistilBERT, a distilled version of BERT: Smaller, faster, cheaper and lighter,” arXiv preprint arXiv:1910.01108, 2019. doi: 10.48550/ARXIV.1910.01108.
  25. A. Vaswani, et al., “Attention is all you need,” arXiv preprint arXiv:1706.03762, 2017. doi: 10.48550/ARXIV.1706.03762.
  26. S. Yu, J. Su, and D. Luo, “Improving BERT-Based Text Classification With Auxiliary Sentence and Domain Knowledge,” IEEE Access, vol. 7, pp. 176600–176612, 2019, doi: 10.1109/access.2019.2953990.
  27. W. Zhu, et al., “Multilingual machine translation with large language models: Empirical results and analysis,” arXiv preprint arXiv:2304.04675, 2023. doi: 10.48550/ARXIV.2304.04675.

CRediT Author Statement


The authors confirm contribution to the paper as follows:

Conceptualization: Priyanka Sharma, Ganesh Gopal Devarajan and Manash Sarkar; Methodology: Priyanka Sharma and Ganesh Gopal Devarajan; Software: Priyanka Sharma; Data Curation: Ganesh Gopal Devarajan and Manash Sarkar; Writing – Original Draft Preparation: Priyanka Sharma, Ganesh Gopal Devarajan and Manash Sarkar; Visualization: Priyanka Sharma; Investigation: Priyanka Sharma, Ganesh Gopal Devarajan and Manash Sarkar; Supervision: Priyanka Sharma and Ganesh Gopal Devarajan; Validation: Ganesh Gopal Devarajan and Manash Sarkar; Writing – Reviewing and Editing: Priyanka Sharma, Ganesh Gopal Devarajan and Manash Sarkar. All authors reviewed the results and approved the final version of the manuscript.


Acknowledgements


We would like to thank the reviewers for taking the time and effort necessary to review the manuscript. We sincerely appreciate all valuable comments and suggestions, which helped us improve the quality of the manuscript.


Funding


No funding was received to assist with the preparation of this manuscript.


Ethics declarations


Conflict of interest

The authors have no conflicts of interest to declare that are relevant to the content of this article.


Availability of data and materials


Data sharing is not applicable to this article as no new data were created or analysed in this study.


Author information


Contributions

All authors have equal contribution in the paper and all authors have read and agreed to the published version of the manuscript.


Corresponding author


Rights and permissions


Open Access. This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivs 4.0 International (CC BY-NC-ND 4.0) license. It allows you to redistribute the material for non-commercial purposes with attribution, but you cannot make any changes whatsoever to the original, i.e. no derivatives of the original work. To view a copy of this license, visit https://creativecommons.org/licenses/by-nc-nd/4.0/


Cite this article


Priyanka Sharma, Ganesh Gopal Devarajan and Manash Sarkar, “Expert Crawler: Amalgamation of Deep Learning Models for Multilingual Multiclass Classification of Product Reviews”, Journal of Machine and Computing, pp. 730-742, April 2025, doi: 10.53759/7669/jmc202505058.


Copyright


© 2025 Priyanka Sharma, Ganesh Gopal Devarajan and Manash Sarkar. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.