Journal of Computing and Natural Science


An Analysis of Data Processing for Big Data Analytics




Received On: 10 April 2021

Revised On: 22 May 2021

Accepted On: 06 July 2021

Published On: 05 October 2021

Volume 01, Issue 04

Pages: 130-138


Abstract


The need for high-performance Data Mining (DM) algorithms is driven by the exponential growth in available data, such as images, audio and video, from a variety of domains, including social networks and the Internet of Things (IoT). Deep learning is currently an emerging area of study within pattern recognition and Machine Learning (ML). It simulates many nonlinear processing layers of neurons that can be used to learn and interpret data at progressively higher levels of abstraction. Deep learning models, which can be deployed on cloud platforms and large-scale computing systems, can inherently capture complex structures in large data sets. Heterogeneity is one of the most prominent characteristics of large data sets, and Heterogeneous Computing (HC) raises challenges for system integration and Advanced Analytics. This article presents HC processing techniques, Big Data Analytics (BDA), tools for handling large data sets, and some classic ML and DM methodologies. The application of deep learning to Data Analytics is investigated, and the benefits of integrating BDA, deep learning, High Performance Computing (HPC) and HC are highlighted. Finally, Data Analytics and approaches to coping with a wide variety of data are discussed.
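The abstract describes deep learning as a stack of nonlinear processing layers, each of which re-represents the data at a higher level of abstraction. As an illustration only, and not as the method of this article, the following minimal Python sketch shows the forward pass through such a stack. It assumes fully connected layers with ReLU activations, hypothetical layer widths, and randomly initialised weights; no training (for example, backpropagation) is performed, so the representations it produces are not meaningful features.

```python
import math
import random
from typing import List, Tuple

Matrix = List[List[float]]
Vector = List[float]


def relu(x: float) -> float:
    """Rectified linear unit: the nonlinearity applied after each layer."""
    return max(0.0, x)


def dense(inputs: Vector, weights: Matrix, biases: Vector) -> Vector:
    """One fully connected layer: each output neuron computes relu(w . x + b)."""
    return [
        relu(sum(w * x for w, x in zip(row, inputs)) + b)
        for row, b in zip(weights, biases)
    ]


def random_layer(n_in: int, n_out: int, rng: random.Random) -> Tuple[Matrix, Vector]:
    """Hypothetical He-style random initialisation, for illustration only (untrained)."""
    scale = math.sqrt(2.0 / n_in)
    weights = [[rng.gauss(0.0, scale) for _ in range(n_in)] for _ in range(n_out)]
    biases = [0.0] * n_out
    return weights, biases


def forward(x: Vector, layers: List[Tuple[Matrix, Vector]]) -> List[Vector]:
    """Pass x through every layer, returning the input and each intermediate representation."""
    representations = [x]
    for weights, biases in layers:
        x = dense(x, weights, biases)
        representations.append(x)
    return representations


if __name__ == "__main__":
    rng = random.Random(0)
    sizes = [8, 16, 8, 4]  # hypothetical layer widths: input, two hidden layers, output
    layers = [random_layer(a, b, rng) for a, b in zip(sizes, sizes[1:])]
    sample = [rng.random() for _ in range(sizes[0])]
    for depth, rep in enumerate(forward(sample, layers)):
        print(depth, len(rep))
```

Each call to `dense` feeds the previous layer's output into the next, which is the layered composition the abstract refers to; in practice the weights would be learned from data rather than drawn at random.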


Keywords


Heterogeneous Computing (HC), Internet of Things (IoT), Big Data Analytics (BDA), Data Mining (DM), Machine Learning (ML).



Acknowledgements


The author(s) thank Dr. Jon Cotter for his support in completing this research.


Funding


No funding was received to assist with the preparation of this manuscript.


Ethics declarations


Conflict of interest

The authors have no conflicts of interest to declare that are relevant to the content of this article.


Availability of data and materials


No data are available for the above study.


Author information


Contributions

All authors contributed equally to the paper, and all authors have read and agreed to the published version of the manuscript.


Corresponding author


Rights and permissions


Open Access This article is licensed under a Creative Commons Attribution-NoDerivs license, which is a more restrictive license. It allows you to redistribute the material commercially or non-commercially, but the user cannot make any changes whatsoever to the original, i.e. no derivatives of the original work. To view a copy of this license, visit https://creativecommons.org/licenses/by-nc-nd/4.0/


Cite this article


Steve Blair and Jon Cotter, “An Analysis of Data Processing for Big Data Analytics”, Journal of Computing and Natural Science, vol. 1, no. 4, pp. 130-138, October 2021. doi: 10.53759/181X/JCNS202101019.


Copyright


© 2021 Steve Blair and Jon Cotter. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.