Advances in Intelligent Systems and Technologies

Book Series

International Conference on VLSI, Communication and Computer Communication

The Novel Method for Data Preprocessing CLI

Chithra Y, Prathibha Kiran, P B Manoj, Department of ECE, AMCEC (Affiliated to VTU), Bangalore, India.


Online First : 06 December 2022
Publisher Name : AnaPub Publications, Kenya.
ISSN (Online) : 2959-3042
ISSN (Print) : 2959-3034
ISBN (Online) : 978-9914-9946-1-2
ISBN (Print) : 978-9914-9946-2-9
Pages : 117-120

Abstract


Data preprocessing is the first step in machine learning; it ensures data quality and extracts useful information from datasets. The data that results from preprocessing is used for model training and has a direct impact on model efficiency. Irrelevant or dispensable information is removed from the dataset to ensure data quality. Data preprocessing includes data description, null-value handling, categorical-value encoding, normalization, transformation, and feature extraction and selection.
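Three of the steps named in the abstract (null-value handling, categorical-value encoding, and normalization) can be illustrated with a minimal Python sketch. This is not the authors' tool: the helper names `impute_mean`, `one_hot`, and `min_max`, the choice of mean imputation, one-hot encoding, and min-max scaling, and the toy records are all assumptions made for illustration.

```python
from statistics import mean


def impute_mean(values):
    """Replace None entries with the mean of the observed values."""
    observed = [v for v in values if v is not None]
    fill = mean(observed)
    return [fill if v is None else v for v in values]


def one_hot(values):
    """Encode a categorical column as one 0/1 column per category."""
    categories = sorted(set(values))
    return {c: [1 if v == c else 0 for v in values] for c in categories}


def min_max(values):
    """Rescale numeric values to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant column carries no spread; map it to zeros.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


# Hypothetical toy records (not taken from the paper).
rows = [
    {"age": 20, "sex": "male"},
    {"age": None, "sex": "female"},
    {"age": 40, "sex": "female"},
]

ages = impute_mean([r["age"] for r in rows])      # null-value handling
ages_scaled = min_max(ages)                       # normalization
sex_encoded = one_hot([r["sex"] for r in rows])   # categorical-value encoding
```

In this sketch the missing age is filled before scaling, so the imputed value takes part in the min-max range; a different ordering or a different imputation rule may suit other datasets better.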

Keywords


Data preprocessing, Dataset, Machine learning.


Cite this article


Chithra Y, Prathibha Kiran, P B Manoj, “The Novel Method for Data Preprocessing CLI”, Advances in Intelligent Systems and Technologies, pp. 117-120, December 2022. doi: 10.53759/aist/978-9914-9946-1-2_21

Copyright


© 2023 Chithra Y, Prathibha Kiran, P B Manoj. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.