A cerebrovascular accident, commonly known as a stroke, is a pathological condition in which the blood supply to part of the brain is interrupted, either by the blockage or by the rupture of a blood vessel. This disrupts the brain's normal blood circulation and essential physiological processes. Stroke prediction plays a crucial role in early diagnosis and intervention and can potentially improve patient outcomes. This paper proposes a machine learning model that combines polynomial feature transformation with linear regression modeling for stroke prediction. The model addresses the challenge of capturing non-linear relationships between the features and the target variable while maintaining interpretability. In the proposed approach, the data are preprocessed by separating categorical and numerical features, applying one-hot encoding to the categorical features, and generating polynomial features up to the second degree for the numerical features. This feature-specific preprocessing is implemented with a column transformer. For model development, the data are split into training and testing sets, and a machine learning pipeline combining the preprocessing steps and the model is constructed. Although polynomial features are used, the final model is a linear regression, which remains linear in its coefficients; it can therefore capture both linear and non-linear relationships while remaining interpretable. This work contributes to stroke prediction by offering an approach that balances model complexity and interpretability, showing the potential of linear regression with polynomial features for accurate predictions and for insight into feature-target relationships. The proposed model outperformed the existing models it was compared with, achieving a testing accuracy of 99.2%.
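The preprocessing and modeling steps summarized above can be sketched with scikit-learn, whose `ColumnTransformer` matches the column transformer named in the abstract. This is a minimal illustrative sketch under stated assumptions, not the authors' implementation: the column names (`age`, `avg_glucose_level`, `bmi`, `gender`, `smoking_status`), the synthetic data, the 75/25 split, and the 0.5 threshold used to turn regression scores into class labels are all hypothetical, since the abstract does not specify them.

```python
# Minimal sketch of the preprocessing + linear-regression pipeline described
# in the abstract. Column names, synthetic data, split ratio and the 0.5
# decision threshold are hypothetical assumptions, not taken from the paper.
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LinearRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, PolynomialFeatures

# Hypothetical stroke-like dataset (synthetic, for illustration only).
rng = np.random.default_rng(0)
n = 200
df = pd.DataFrame({
    "age": rng.uniform(20, 85, n),
    "avg_glucose_level": rng.uniform(55, 270, n),
    "bmi": rng.uniform(15, 45, n),
    "gender": rng.choice(["Male", "Female"], n),
    "smoking_status": rng.choice(["never", "formerly", "smokes"], n),
})
# Synthetic binary target from an arbitrary non-linear rule, NOT a clinical model.
risk = (df["age"] / 85) ** 2 + (df["avg_glucose_level"] / 270) ** 2
df["stroke"] = (risk > 1.0).astype(int)

numerical = ["age", "avg_glucose_level", "bmi"]
categorical = ["gender", "smoking_status"]

# Degree-2 polynomial features for numerical columns, one-hot for categorical.
preprocess = ColumnTransformer([
    ("num", PolynomialFeatures(degree=2, include_bias=False), numerical),
    ("cat", OneHotEncoder(handle_unknown="ignore"), categorical),
])

# Preprocessing and the final linear regression chained in one pipeline.
model = Pipeline([
    ("preprocess", preprocess),
    ("regressor", LinearRegression()),
])

X = df[numerical + categorical]
y = df["stroke"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42
)
model.fit(X_train, y_train)

# LinearRegression outputs continuous scores; to report an accuracy we
# assume (hypothetically) that scores >= 0.5 are classified as stroke.
scores = model.predict(X_test)
y_pred = (scores >= 0.5).astype(int)
accuracy = accuracy_score(y_test, y_pred)

n_poly = model.named_steps["preprocess"].named_transformers_["num"].n_output_features_
print(f"polynomial features: {n_poly}, test accuracy: {accuracy:.3f}")
```

With three numerical columns, the degree-2 expansion without a bias column yields 9 terms (3 linear, 3 squared, 3 pairwise products). The regression is still linear in its coefficients over these terms, which is what keeps each coefficient readable as the weight of one named feature or interaction.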
Keywords
Stroke Prediction, Machine Learning, Polynomial Features, Linear Regression, One-Hot Encoding.
Acknowledgements
We would like to thank the reviewers for taking the time and effort necessary to review the manuscript. We sincerely appreciate all the valuable comments and suggestions, which helped us to improve the quality of the manuscript.
Funding
No funding was received to assist with the preparation of this manuscript.
Ethics declarations
Conflict of interest
The authors would like to thank the reviewers for their helpful comments on the manuscript.
Availability of data and materials
Data sharing is not applicable to this article as no new data were created or analysed in this study.
Author information
Contributions
All authors contributed equally to the paper, and all authors have read and agreed to the published version of the manuscript.
Corresponding author
Sitanaboina S L Parvathi
Department of Computer Science and Engineering, Koneru Lakshmaiah Education Foundation, Vaddeswaram, Andhra Pradesh, India.
Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License. This is a restrictive license: it allows you to redistribute the material in any medium or format for non-commercial purposes, provided appropriate credit is given, but you cannot make any changes whatsoever to the original, i.e. no derivatives of the original work may be distributed. To view a copy of this license, visit https://creativecommons.org/licenses/by-nc-nd/4.0/
Cite this article
Sitanaboina S L Parvathi, Aruna Devi B, Gururaj L Kulkarni, Sangeetha Murugan, Bindu Kolappa Pillai Vijayammal and Neha, “Exploring Feature Relationships in Brain Stroke Data Using Polynomial Feature Transformation and Linear Regression Modeling”, Journal of Machine and Computing, pp. 1158-1169, October 2024. doi:10.53759/7669/jmc202404107.