Journal of Machine and Computing


Impact of Effective Word Vectors on Deep Learning Based Subjective Classification of Online Reviews




Received On : 10 November 2023

Revised On : 27 March 2024

Accepted On : 18 June 2024

Published On : 05 July 2024

Volume 04, Issue 03

Pages : 736-747


Abstract


Sentiment analysis tasks are made considerably simpler by extracting subjective statements from online reviews, thereby reducing the overhead of the classifiers. Review datasets contain both subjective and objective sentences: subjective writing expresses the author's opinions, while objective text presents factual information. Assessing the subjectivity of review statements involves categorizing them as objective or subjective. The effectiveness of word vectors plays a crucial role in this process, as they capture the semantics and contextual cues of subjective language. This study investigates the significance of employing sophisticated word vector representations to enhance the detection of subjective reviews. Several methodologies for generating word vectors are investigated, encompassing both conventional approaches, such as Word2Vec and Global Vectors for word representation (GloVe), and recent innovations, such as Bidirectional Encoder Representations from Transformers (BERT), ALBERT, and Embeddings from Language Models (ELMo). These neural word embeddings were applied using Keras and Scikit-Learn. The analysis focuses on the Cornell subjectivity review data within the restaurant domain, and performance metrics such as accuracy, F1-score, recall, and precision are assessed on a dataset containing subjective reviews. A wide range of conventional vector models and deep learning-based word embeddings are utilized for subjective review classification, frequently in combination with deep learning architectures such as Long Short-Term Memory (LSTM). Notably, pre-trained BERT-base word embeddings achieved an exceptional accuracy of 96.4%, surpassing all other models considered in this study. However, BERT-base is computationally expensive because of its larger architecture.
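The pipeline described above (word vectors pooled into a sentence representation, then a binary subjective/objective classifier) can be sketched in a self-contained toy example. The sketch below uses tiny hand-made 4-dimensional "word vectors" and a logistic-regression classifier trained with plain NumPy; the vocabulary, embedding values, and labels are invented for illustration and are not the paper's data, embeddings (Word2Vec, GloVe, ELMo, BERT), or LSTM models.

```python
import numpy as np

# Invented toy word vectors. In the study, these would instead come from
# pre-trained models such as Word2Vec, GloVe, ELMo, or BERT.
EMB = {
    "i":      np.array([0.9, 0.1, 0.0, 0.2]),
    "loved":  np.array([0.8, 0.7, 0.1, 0.0]),
    "hated":  np.array([0.7, 0.8, 0.2, 0.1]),
    "the":    np.array([0.1, 0.0, 0.5, 0.5]),
    "food":   np.array([0.0, 0.1, 0.9, 0.4]),
    "menu":   np.array([0.1, 0.2, 0.8, 0.6]),
    "lists":  np.array([0.0, 0.0, 0.7, 0.8]),
    "ten":    np.array([0.1, 0.1, 0.6, 0.9]),
    "dishes": np.array([0.0, 0.2, 0.9, 0.7]),
}

def sentence_vector(tokens):
    """Pool a sentence into one vector by averaging its known word vectors."""
    return np.mean([EMB[t] for t in tokens if t in EMB], axis=0)

# Tiny invented training set: 1 = subjective (opinion), 0 = objective (fact).
train = [
    ("i loved the food".split(), 1),
    ("i hated the menu".split(), 1),
    ("the menu lists ten dishes".split(), 0),
    ("the food lists ten dishes".split(), 0),
]
X = np.stack([sentence_vector(toks) for toks, _ in train])
y = np.array([label for _, label in train], dtype=float)

# Logistic regression trained by batch gradient descent.
w, b = np.zeros(X.shape[1]), 0.0
for _ in range(2000):
    p = 1.0 / (1.0 + np.exp(-(X @ w + b)))  # sigmoid probabilities
    grad = p - y                            # gradient of log-loss w.r.t. logits
    w -= 0.5 * (X.T @ grad) / len(y)
    b -= 0.5 * grad.mean()

def predict(sentence):
    """Label an unseen sentence as subjective or objective."""
    p = 1.0 / (1.0 + np.exp(-(sentence_vector(sentence.split()) @ w + b)))
    return "subjective" if p >= 0.5 else "objective"

print(predict("i loved the menu"))       # opinion-like sentence
print(predict("the menu lists dishes"))  # fact-like sentence
```

The same structure carries over to the models compared in the paper: only the embedding lookup and the classifier head change (e.g. contextual BERT vectors feeding an LSTM instead of averaged static vectors feeding logistic regression).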


Keywords


Natural Language Processing, Subjective Classification, Word Embeddings, Text Representations, Bidirectional Encoder Representations from Transformers.



Acknowledgements


The author(s) received no financial support for the research, authorship, and/or publication of this article.


Funding


No funding was received to assist with the preparation of this manuscript.


Ethics declarations


Conflict of interest

The authors have no conflicts of interest to declare that are relevant to the content of this article.


Availability of data and materials


Data sharing is not applicable to this article as no new data were created or analysed in this study.


Author information


Contributions

All authors contributed equally to the paper, and all authors have read and agreed to the published version of the manuscript.


Corresponding author


Rights and permissions


Open Access: This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivs 4.0 (CC BY-NC-ND 4.0) license, which permits redistribution of the material in any medium for non-commercial purposes, provided the original work is credited and is not changed in any way, i.e., no derivatives of the original work are permitted. To view a copy of this license, visit https://creativecommons.org/licenses/by-nc-nd/4.0/


Cite this article


Priya Kamath B, Geetha M, Dinesh Acharya U, Ritika Nandi and Siddhaling Urolagin, “Impact of Effective Word Vectors on Deep Learning Based Subjective Classification of Online Reviews”, Journal of Machine and Computing, vol. 04, no. 03, pp. 736-747, July 2024. doi: 10.53759/7669/jmc202404069.


Copyright


© 2024 Priya Kamath B, Geetha M, Dinesh Acharya U, Ritika Nandi and Siddhaling Urolagin. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.