The need for high-performance Data Mining (DM) algorithms is driven by the exponentially increasing availability of data such as images, audio, and video from a variety of domains, including social networks and the Internet of Things (IoT). Deep learning is an emerging area of pattern recognition and Machine Learning (ML) research. It provides computational models composed of multiple nonlinear processing layers of neurons that can learn and interpret data at higher levels of abstraction. Deep learning models, which can be deployed on cloud platforms and large-scale computing systems, can inherently capture the complex structure of large data sets. Heterogeneity is one of the most prominent characteristics of large data sets, and Heterogeneous Computing (HC) raises challenges for system integration and advanced analytics. This article presents HC processing techniques, Big Data Analytics (BDA), tools for large data sets, and some classic ML and DM methodologies. The application of deep learning to data analytics is investigated. The benefits of integrating BDA, deep learning, High Performance Computing (HPC), and HC are highlighted. Data analytics and approaches for coping with a wide variety of data are discussed.
Keywords
Heterogeneous Computing (HC), Internet of Things (IoT), Big Data Analytics (BDA), Data Mining (DM), Machine Learning (ML).
Acknowledgements
The author(s) thank Dr. Jon Cotter for his support in completing this research.
Funding
No funding was received to assist with the preparation of this manuscript.
Ethics declarations
Conflict of interest
The authors have no conflicts of interest to declare that are relevant to the content of this article.
Availability of data and materials
No data are available for this study.
Author information
Contributions
All authors contributed equally to the paper, and all have read and agreed to the published version of the manuscript.
Corresponding author
Steve Blair
Department of Mathematics & Computing, Lander University, Greenwood, SC 29649, United States.
Open Access: This article is licensed under a Creative Commons Attribution NoDerivs license, which is a more restrictive license. It allows you to redistribute the material commercially or non-commercially, but you cannot make any changes whatsoever to the original, i.e., no derivatives of the original work. To view a copy of this license, visit https://creativecommons.org/licenses/by-nc-nd/4.0/
Cite this article
Steve Blair and Jon Cotter, “An Analysis of Data Processing for Big Data Analytics,” Journal of Computing and Natural Science, vol. 1, no. 4, pp. 130–138, October 2021. doi: 10.53759/181X/JCNS202101019.