Opinion mining applies Natural Language Processing (NLP) techniques to extract public opinions on specific topics, and it has gained increasing significance in major text mining applications. Many opinion mining methods have been developed that build a model to collect and analyse opinions on topics from blogs, reviews, comments or tweets. Recently, the application of opinion mining to medical tweets has attracted considerable research interest because of the challenge of processing the specialized medical terms that appear in tweets. In this paper, an opinion mining framework is developed to automatically extract opinions from medical tweets using improved optimization algorithms. The input tweets are pre-processed, and features are extracted using part-of-speech (POS) tagging and n-grams. Candidate feature subsets are then selected using the Penguin Search Optimization Algorithm (PeSOA) and an Improved PeSOA. In PeSOA, the solution search operation is random and, to maintain simplicity, does not use exploration effectively. The Improved PeSOA addresses this limitation by introducing a new solution search equation that complements the traditional search process, together with an effective feature subset ranking concept. These additions increase the efficiency of selecting optimal feature subsets. Once the features are selected, the final classification is performed using k-Nearest Neighbor (k-NN), Naïve Bayes (NB) and Support Vector Machine (SVM) classifiers to obtain the opinions. Experiments are conducted on medical datasets containing cancer and drug tweets. The results show that Improved PeSOA yields significantly higher classification accuracy for opinion mining than the traditional PeSOA.
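The pipeline described in the abstract (n-gram feature extraction, wrapper-style feature subset selection, then classification) can be illustrated with a minimal sketch. This is not the authors' implementation: the subset search below is a plain random search standing in for PeSOA, whose search equations are not given here, the classifier is a simple 1-NN scored by leave-one-out accuracy, and POS tagging is omitted because it requires an external tagger. The tweets, labels, and all function names are hypothetical and chosen only for illustration.

```python
import random
from collections import Counter

def ngrams(text, n_max=2):
    """Lowercase, split on whitespace, and return all 1..n_max-grams as strings."""
    tokens = text.lower().split()
    return [" ".join(tokens[i:i + n])
            for n in range(1, n_max + 1)
            for i in range(len(tokens) - n + 1)]

def vectorize(text, vocab):
    """Count how often each vocabulary n-gram occurs in the text."""
    counts = Counter(ngrams(text))
    return [counts[term] for term in vocab]

def knn_loo_accuracy(X, y, mask):
    """Leave-one-out accuracy of a 1-NN classifier using only the masked features."""
    cols = [j for j, keep in enumerate(mask) if keep]
    if not cols:
        return 0.0  # an empty feature subset cannot classify anything
    correct = 0
    for i, xi in enumerate(X):
        # Nearest other sample by squared Euclidean distance on the kept columns.
        nearest = min((k for k in range(len(X)) if k != i),
                      key=lambda k: sum((xi[j] - X[k][j]) ** 2 for j in cols))
        correct += y[nearest] == y[i]
    return correct / len(X)

def random_subset_search(X, y, n_iter=200, seed=0):
    """Random search over feature masks (a stand-in for PeSOA, not PeSOA itself)."""
    rng = random.Random(seed)
    n_features = len(X[0])
    best_mask = [True] * n_features
    best_score = knn_loo_accuracy(X, y, best_mask)
    for _ in range(n_iter):
        mask = [rng.random() < 0.5 for _ in range(n_features)]
        score = knn_loo_accuracy(X, y, mask)
        if score > best_score:  # keep a candidate only if it improves accuracy
            best_mask, best_score = mask, score
    return best_mask, best_score

# Hypothetical toy data, invented for this sketch only.
tweets = [
    "new drug helped my pain",
    "this drug gave me terrible nausea",
    "treatment helped a lot",
    "terrible side effects from chemo",
]
labels = ["pos", "neg", "pos", "neg"]

vocab = sorted({g for t in tweets for g in ngrams(t)})
X = [vectorize(t, vocab) for t in tweets]
baseline = knn_loo_accuracy(X, labels, [True] * len(vocab))
mask, score = random_subset_search(X, labels)
selected = [term for term, keep in zip(vocab, mask) if keep]
print(f"all features: {baseline:.2f}, selected subset: {score:.2f}")
```

Because the search starts from the full feature set and only accepts improvements, the selected subset never scores below the all-features baseline on the data it was tuned on. On a real corpus the subset would need to be evaluated on held-out tweets to avoid an optimistic estimate.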
CRediT Author Statement
The authors confirm contribution to the paper as follows:
Conceptualization: Anuprathibha T, Pravin Kumar M, Sakthi G and Rajkumar KK;
Methodology: Anuprathibha T and Pravin Kumar M;
Software: Sakthi G and Rajkumar KK;
Data Curation: Anuprathibha T and Pravin Kumar M;
Writing - Original Draft Preparation: Anuprathibha T, Pravin Kumar M, Sakthi G and Rajkumar KK;
Visualization: Anuprathibha T and Pravin Kumar M;
Investigation: Sakthi G and Rajkumar KK;
Supervision: Anuprathibha T and Pravin Kumar M;
Validation: Sakthi G and Rajkumar KK;
Writing - Reviewing and Editing: Anuprathibha T, Pravin Kumar M, Sakthi G and Rajkumar KK;
All authors reviewed the results and approved the final version of the manuscript.
Acknowledgements
The authors thank the reviewers for taking the time and effort necessary to review the manuscript.
Funding
No funding was received to assist with the preparation of this manuscript.
Ethics declarations
Conflict of interest
The authors have no conflicts of interest to declare that are relevant to the content of this article.
Availability of data and materials
Data sharing is not applicable to this article as no new data were created or analysed in this study.
Author information
Contributions
All authors contributed equally to the paper, and all authors have read and agreed to the published version of the manuscript.
Corresponding author
Anuprathibha T
Department of Information Technology, V.S.B. Engineering College, Karur, Tamil Nadu, India.
Open Access: This article is licensed under a Creative Commons Attribution NoDerivs license, which is a more restrictive license. It allows the material to be redistributed commercially or non-commercially, but the user cannot make any changes whatsoever to the original, i.e., no derivatives of the original work are permitted. To view a copy of this license, visit https://creativecommons.org/licenses/by-nc-nd/4.0/
Cite this article
Anuprathibha T, Pravin Kumar M, Sakthi G and Rajkumar KK, “Enhanced Opinion Mining from Medical Tweets Using an Optimized Penguin Search-Based Feature Selection Algorithm”, Journal of Machine and Computing, pp. 1174-1185, April 2025, doi: 10.53759/7669/jmc202505093.