Journal of Machine and Computing


Influence of Pre-Processing Strategies on Sentiment Analysis Performance: Leveraging Bert, TF-IDF and Glove Features



Received On: 18 September 2024

Revised On: 30 October 2024

Accepted On: 18 November 2024

Published On: 05 January 2025

Volume 05, Issue 01

Pages: 464-473


Abstract


The analysis of user-generated content, such as product reviews on platforms like Amazon, is critical for understanding consumer sentiment. However, the unstructured nature of these reviews makes accurate sentiment analysis (SA) difficult. This study examines how different preprocessing techniques affect sentiment analysis performance when combined with three feature extraction methods: BERT, TF-IDF, and GloVe. We evaluated these feature sets with machine learning classifiers including Logistic Regression (LR), Random Forest (RF), Naive Bayes (NB), and Extreme Gradient Boosting (XGBoost). Our findings indicate that preprocessing significantly improves classification accuracy, particularly for models using TF-IDF and GloVe features, whereas BERT-based models perform robustly even with minimal preprocessing. Combining BERT with preprocessing techniques, we achieved an accuracy of 98.3%. These results underscore the importance of careful data preprocessing in sentiment analysis and can inform the design of more effective sentiment classification pipelines for extracting reliable insights from Amazon product reviews.
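The abstract does not specify the exact preprocessing steps or TF-IDF settings used in the study. As a minimal sketch, assuming a common pipeline (lowercasing, URL and punctuation removal, and stop-word filtering against a small hypothetical stop-word list) and a plain-Python TF-IDF with smoothed idf, the code below illustrates why preprocessing matters for sparse features such as TF-IDF: surface variants like "GREAT!!!", "Great" and "great." collapse into a single term, shrinking the vocabulary the classifier has to learn from. The helper names (`raw_tokens`, `preprocess`, `tfidf`), the stop-word list and the sample reviews are illustrative assumptions, not the authors' implementation.

```python
import math
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list for illustration only;
# the paper does not state which stop-word list (if any) was used.
STOP_WORDS = {"the", "a", "an", "is", "it", "and", "this", "was", "i", "to", "of"}


def raw_tokens(text: str) -> list[str]:
    """Baseline with no preprocessing: split on whitespace only."""
    return text.split()


def preprocess(text: str) -> list[str]:
    """Assumed pipeline: lowercase, drop URLs and non-letters, remove stop words."""
    text = text.lower()
    text = re.sub(r"https?://\S+", " ", text)  # strip URLs
    text = re.sub(r"[^a-z\s]", " ", text)      # keep ASCII letters and whitespace
    return [tok for tok in text.split() if tok not in STOP_WORDS]


def tfidf(docs: list[list[str]]) -> list[dict[str, float]]:
    """Sparse TF-IDF vector (term -> weight) for each tokenized document.

    tf is the raw term count; idf is the smoothed ln((1 + N) / (1 + df)) + 1.
    No length normalisation is applied.
    """
    n_docs = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    return [
        {term: count * (math.log((1 + n_docs) / (1 + df[term])) + 1)
         for term, count in Counter(doc).items()}
        for doc in docs
    ]


# Made-up example reviews, not drawn from the study's dataset.
reviews = [
    "This product is GREAT!!! Works perfectly.",
    "Terrible... it broke after a week. http://example.com",
    "Great value, works great.",
]

raw_vocab = {tok for r in reviews for tok in raw_tokens(r)}
clean_docs = [preprocess(r) for r in reviews]
clean_vocab = {tok for doc in clean_docs for tok in doc}
vectors = tfidf(clean_docs)

print(len(raw_vocab), len(clean_vocab))  # 17 9
print(clean_docs[0])                     # ['product', 'great', 'works', 'perfectly']
print(round(vectors[2]["great"], 3), round(vectors[2]["value"], 3))  # 2.575 1.693
```

In this toy example, preprocessing reduces 17 distinct raw tokens to 9 terms, so repeated sentiment words such as "great" accumulate weight in one feature instead of being split across spelling variants. Contextual models such as BERT apply their own subword tokenizer to raw text, so this vocabulary-collapsing effect does not carry over directly to them; the sketch covers only the sparse TF-IDF case.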


Keywords


Preprocessing Techniques, Text Embedding Techniques, BERT, TF-IDF, GloVe, Sentiment Analysis, Machine Learning Classifiers, Amazon Reviews.



CRediT Author Statement


The authors confirm contribution to the paper as follows:

Conceptualization: Kosala N and Nirmalrani V; Methodology: Kosala N and Nirmalrani V; Writing - Original Draft Preparation: Kosala N; Investigation: Nirmalrani V; Supervision: Nirmalrani V. All authors reviewed the results and approved the final version of the manuscript.


Acknowledgements


The author(s) thank Dr. Nirmalrani V for support in completing this research.


Funding


No funding was received to assist with the preparation of this manuscript.


Ethics declarations


Conflict of interest

The authors have no conflicts of interest to declare that are relevant to the content of this article.


Availability of data and materials


Data sharing is not applicable to this article as no new data were created or analysed in this study.


Author information


Contributions

All authors contributed equally to the paper, and all authors have read and agreed to the published version of the manuscript.


Rights and permissions


Open Access This article is licensed under a Creative Commons Attribution NoDerivs license, which is a more restrictive license: it allows the material to be redistributed, commercially or non-commercially, but users cannot make any changes whatsoever to the original, i.e., no derivatives of the original work may be created. To view a copy of this license, visit https://creativecommons.org/licenses/by-nc-nd/4.0/


Cite this article


Kosala N and Nirmalrani V, “Influence of Pre-Processing Strategies on Sentiment Analysis Performance: Leveraging Bert, TF-IDF and Glove Features”, Journal of Machine and Computing, vol. 5, no. 1, pp. 464-473, January 2025, doi: 10.53759/7669/jmc202505036.


Copyright


© 2025 Kosala N and Nirmalrani V. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.