Advances in Intelligent Systems and Technologies

First International Conference on Machines, Computing and Management Technologies

Analyzing the Impact of Ensemble Techniques and Resampling Techniques Over Multi Class Skewed Datasets

Rose Mary Mathew and R Gunasundari, Department of Computer Science, Karpagam Academy of Higher Education, Coimbatore, India.


Online First : 30 July 2022
Publisher Name : AnaPub Publications, Kenya.
ISSN (Online) : 2959-3042
ISSN (Print) : 2959-3034
ISBN (Online) : 978-9914-9946-0-5
ISBN (Print) : 978-9914-9946-3-6
Pages : 001-013

Abstract


Machine learning is of great importance today because of its broad spectrum of applications and its ability to adapt and provide solutions to complex problems reliably, rapidly, and productively. Machine learning models are trained on data from past experience and produce outcomes based on what they have learned from that data. The training data should therefore be balanced across classes; otherwise, the model tends to favour the majority classes and gives incorrect results for the minority classes. Data thus plays a central role, and most real-world data, in almost every sector, is skewed towards some classes. Multiclass datasets show two types of imbalance: multimajority datasets, which have several majority classes and few minority classes, and multiminority datasets, which have few majority classes and several minority classes. In this study, three multimajority datasets and three multiminority datasets are analysed. Six resampling procedures were applied, three of them undersampling techniques and three oversampling techniques. Four classifiers, K-NN, SVM, and the ensemble methods Random Forest and XGBoost, were used to build the models, and their performance was analysed.
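The abstract does not name the six resampling procedures, so the following Python code is only a rough illustration of the two ideas it relies on, not the study's method. It shows (a) a hypothetical heuristic that labels a class distribution as multimajority or multiminority by comparing each class with the mean class size, and (b) random undersampling and random oversampling, the simplest members of each resampling family. The function names, the mean-size threshold, and the toy labels are assumptions made for this example.

```python
import random
from collections import Counter


def imbalance_type(labels):
    """Hypothetical heuristic: classes larger than the mean class size are
    treated as majority classes, all others as minority classes."""
    counts = Counter(labels)
    if len(set(counts.values())) == 1:
        return "balanced"
    mean_size = sum(counts.values()) / len(counts)
    n_majority = sum(1 for c in counts.values() if c > mean_size)
    n_minority = len(counts) - n_majority
    # More majority than minority classes -> multimajority, otherwise multiminority.
    return "multimajority" if n_majority > n_minority else "multiminority"


def _group_by_class(X, y):
    """Collect the feature rows of each class, preserving first-seen class order."""
    groups = {}
    for row, label in zip(X, y):
        groups.setdefault(label, []).append(row)
    return groups


def random_undersample(X, y, seed=0):
    """Randomly keep rows (without replacement) so every class matches the smallest one."""
    rng = random.Random(seed)
    groups = _group_by_class(X, y)
    target = min(len(rows) for rows in groups.values())
    X_out, y_out = [], []
    for label, rows in groups.items():
        for row in rng.sample(rows, target):
            X_out.append(row)
            y_out.append(label)
    return X_out, y_out


def random_oversample(X, y, seed=0):
    """Randomly duplicate rows (with replacement) so every class matches the largest one."""
    rng = random.Random(seed)
    groups = _group_by_class(X, y)
    target = max(len(rows) for rows in groups.values())
    X_out, y_out = [], []
    for label, rows in groups.items():
        extra = [rng.choice(rows) for _ in range(target - len(rows))]
        for row in rows + extra:
            X_out.append(row)
            y_out.append(label)
    return X_out, y_out


if __name__ == "__main__":
    # Made-up toy labels: one large class and two small ones.
    y = ["A"] * 80 + ["B"] * 12 + ["C"] * 8
    X = [[i] for i in range(len(y))]
    print(imbalance_type(y))  # multiminority
    _, y_under = random_undersample(X, y)
    print(Counter(y_under))  # every class reduced to 8 rows
    _, y_over = random_oversample(X, y)
    print(Counter(y_over))  # every class raised to 80 rows
```

In an actual experiment, resampling would normally be applied to the training split only, so the test set keeps its original class distribution. Libraries such as imbalanced-learn provide `RandomUnderSampler` and `RandomOverSampler` alongside more elaborate methods such as SMOTE.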

Keywords


Imbalanced, Multiclass, Multimajority, Multiminority, Oversampling, Undersampling


Cite this article


Rose Mary Mathew and R Gunasundari, “Analyzing the Impact of Ensemble Techniques and Resampling Techniques Over Multi Class Skewed Datasets”, Advances in Intelligent Systems and Technologies, pp. 001-013, July 2022. doi:10.53759/aist/978-9914-9946-0-5_1

Copyright


© 2023 Rose Mary Mathew and R Gunasundari. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.