Data preprocessing is the first step in machine learning: it ensures data quality and extracts useful information from datasets. The data produced by preprocessing is used for model training and has a direct impact on model efficiency. Irrelevant or dispensable information is removed from the dataset to maintain data quality. Data preprocessing includes data description, null (missing) value handling, categorical value encoding, normalization, transformation, and feature extraction and selection.
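To make several of these steps concrete, here is a minimal sketch in plain Python. It assumes a small in-memory table stored as a list of dict rows. The specific techniques are illustrative choices and not the method of this article: mean imputation for null values, one-hot encoding for categorical values, min-max scaling for normalization, and dropping constant columns as a crude stand-in for feature selection. All function names (`impute_mean`, `one_hot`, `min_max`, `drop_constant`, `preprocess`) are hypothetical.

```python
from statistics import mean


def impute_mean(values):
    """Null value handling: replace None with the mean of the present values.

    Assumes at least one value is present (statistics.mean raises otherwise).
    """
    present = [v for v in values if v is not None]
    fill = mean(present)
    return [fill if v is None else v for v in values]


def one_hot(name, values):
    """Categorical encoding: one 0/1 column per distinct category."""
    categories = sorted(set(values))
    return {f"{name}={c}": [1 if v == c else 0 for v in values] for c in categories}


def min_max(values):
    """Normalization: rescale a numeric column to [0, 1].

    A constant column has no range, so it is mapped to all zeros.
    """
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def drop_constant(columns):
    """Crude feature selection: keep only columns with more than one distinct value."""
    return {name: col for name, col in columns.items() if len(set(col)) > 1}


def preprocess(rows, numeric, categorical):
    """Apply imputation + scaling to numeric columns, one-hot to categorical ones,
    then drop columns that carry no information (constant columns)."""
    columns = {}
    for name in numeric:
        col = impute_mean([r.get(name) for r in rows])
        columns[name] = min_max(col)
    for name in categorical:
        columns.update(one_hot(name, [r[name] for r in rows]))
    return drop_constant(columns)


if __name__ == "__main__":
    rows = [
        {"age": 20, "income": 30.0, "city": "A", "country": "X"},
        {"age": None, "income": 50.0, "city": "B", "country": "X"},
        {"age": 40, "income": None, "city": "A", "country": "X"},
    ]
    for name, col in preprocess(rows, ["age", "income"], ["city", "country"]).items():
        print(name, col)
```

In this example the missing `age` and `income` entries are filled with their column means before scaling. The `country` column has only one value, so its one-hot column is constant and gets dropped. In practice, a library such as scikit-learn provides tested equivalents of these steps, which are preferable to hand-rolled versions.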
Cite this article
Chithra Y, Prathibha Kiran, P B Manoj, "The Novel Method for Data Preprocessing CLI", Advances in Intelligent Systems and Technologies, pp. 117-120, December 2022. doi: 10.53759/aist/978-9914-9946-1-2_21