2nd International Conference on Materials Science and Sustainable Manufacturing Technology
Data Validation using ETL – A Theoretical Perspective
Sanjay Kumar. S, Roshaan. J. S, Surya.V, Srinivas. S, Sreemathy.J, Department of Computer Science and Engineering, Sri Eshwar College of Engineering, Coimbatore, India.
Data is more readily available than ever, but when it is erroneous or insufficient, it can be challenging to
interpret. As a result, data validation is essential to improving the quality of data for sound decision-making. The authors
discuss some of the most important concepts and challenges in data validation. Human oversight
cannot be completely removed from this process: interpreting information draws on priceless human qualities that cannot be taught.
Despite today's highly advanced data validation technology and automated approaches, people are still reluctant to act on
decisions that have not been validated by another person. A data validation dashboard can be used by an expert data
practitioner to monitor the complete data analysis procedure. The dashboard may make it easier for teams or project
managers to assign tasks and resources while also more efficiently monitoring the progress and success of their work.
This paper discusses the fundamental ideas, key points, and validation procedure for data validation
and quality assurance. It also compares several data validation technologies, surveys several significant
industry players, and examines the main problems, difficulties, and requirements.
Keywords
Data Validation, Component, ETL, Test, Data Warehouse.
Cite this article
Sanjay Kumar. S, Roshaan. J. S, Surya.V, Srinivas. S, Sreemathy.J, “Data Validation using ETL – A Theoretical Perspective”, Advances in Computational Intelligence in Materials Science, pp. 007-011, May 2023. doi:10.53759/acims/978-9914-9946-9-8_2