Journal of Machine and Computing


A Secure System for Accessing the Big Data Over the Scattered Cloud Information Center Using C-Hadoop



Received On: 16 April 2025

Revised On: 03 July 2025

Accepted On: 08 July 2025

Published On: 05 October 2025

Volume 05, Issue 04

Pages: 1971–1983


Abstract


The volume of data worldwide is growing extremely fast, and data growth statistics and predictions deserve careful consideration wherever infrastructure, storage, and retrieval are involved. The amount of data is projected to reach 175 zettabytes by 2025, with 51% of it expected to reside in data centers and the remaining 49% in the public cloud. Predictions further indicate that about 80% of this data will remain unstructured. Storing and retrieving data at this scale is therefore impractical without a distributed processing model such as MapReduce. The MapReduce (MR) model, which is designed for big data, is well suited to processing existing medical data and fine-tuning prediction systems. This work focuses on securely accessing medical data (in this case, diabetes data) over the cloud. The data are first structured, then subjected to feature extraction and clustering, followed by classification with a modified naïve Bayes classifier (MNBC) to build a better prediction system. The method uses the MR model for processing, and the final classification is verified using k-fold cross-validation.
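The abstract names the pipeline but does not describe the internals of the MNBC or of the MR jobs. As an illustration only, the sketch below shows how a standard Gaussian naive Bayes classifier (not the authors' modified variant, whose changes are not specified here) can be trained in a MapReduce style: each map step summarizes one data split as per-class counts, sums, and sums of squares, and a reduce step merges those summaries before the class means and variances are derived. The model is then evaluated with k-fold cross-validation. Everything runs in a single Python process; the function names (`map_partition`, `reduce_stats`, `fit`, `predict`, `k_fold_accuracy`) and the synthetic two-class data are hypothetical stand-ins, not Hadoop APIs or the diabetes dataset.

```python
import math
import random
from functools import reduce


def map_partition(rows):
    """Map step (hypothetical): summarize one split as per-class (count, sums, sums of squares)."""
    stats = {}
    for features, label in rows:
        n = len(features)
        count, sums, sq = stats.get(label, (0, [0.0] * n, [0.0] * n))
        stats[label] = (
            count + 1,
            [s + x for s, x in zip(sums, features)],
            [q + x * x for q, x in zip(sq, features)],
        )
    return stats


def reduce_stats(a, b):
    """Reduce step (hypothetical): merge two partial summaries by adding the additive statistics."""
    merged = dict(a)
    for label, (count, sums, sq) in b.items():
        if label in merged:
            c0, s0, q0 = merged[label]
            merged[label] = (
                c0 + count,
                [x + y for x, y in zip(s0, sums)],
                [x + y for x, y in zip(q0, sq)],
            )
        else:
            merged[label] = (count, sums, sq)
    return merged


def fit(partitions, var_floor=1e-9):
    """Map every split, reduce the summaries, then derive Gaussian parameters per class."""
    stats = reduce(reduce_stats, (map_partition(p) for p in partitions), {})
    total = sum(count for count, _, _ in stats.values())
    model = {}
    for label, (count, sums, sq) in stats.items():
        means = [s / count for s in sums]
        # Population variance E[x^2] - E[x]^2, floored so it is never zero.
        variances = [max(q / count - m * m, var_floor) for q, m in zip(sq, means)]
        model[label] = (math.log(count / total), means, variances)
    return model


def predict(model, features):
    """Return the class with the largest log prior plus Gaussian log-likelihood."""
    def log_posterior(label):
        log_prior, means, variances = model[label]
        return log_prior + sum(
            -0.5 * math.log(2 * math.pi * v) - (x - m) ** 2 / (2 * v)
            for x, m, v in zip(features, means, variances)
        )
    return max(model, key=log_posterior)


def k_fold_accuracy(rows, k=5, n_partitions=3, seed=0):
    """Shuffle, split into k folds, and return the test accuracy of each held-out fold."""
    rows = list(rows)
    random.Random(seed).shuffle(rows)
    folds = [rows[i::k] for i in range(k)]
    scores = []
    for i in range(k):
        test = folds[i]
        train = [r for j, fold in enumerate(folds) if j != i for r in fold]
        # Simulate data spread over several storage nodes by striding the training set.
        partitions = [train[p::n_partitions] for p in range(n_partitions)]
        model = fit(partitions)
        correct = sum(predict(model, x) == y for x, y in test)
        scores.append(correct / len(test))
    return scores


# Usage with synthetic two-class data (hypothetical, not the diabetes dataset).
rng = random.Random(1)
data = [([rng.gauss(0, 1), rng.gauss(0, 1)], 0) for _ in range(100)]
data += [([rng.gauss(5, 1), rng.gauss(5, 1)], 1) for _ in range(100)]
scores = k_fold_accuracy(data, k=5)
print([round(s, 3) for s in scores])
```

Because the count, sum, and sum of squares are additive, `reduce_stats` can merge partial summaries in any order (up to floating-point rounding), so the per-split map work is independent and could run in parallel. The one-pass variance formula is kept for brevity; it is less numerically stable than alternatives such as Welford's method when feature values have large means.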


Keywords


Big Data Analytics, Healthcare Prediction, Diabetes Classification, Hadoop MapReduce, Elliptic Curve Cryptography, Modified Naive Bayes Classifier, Feature Selection, Principal Component Analysis, Distributed Computing, Data Security.



CRediT Author Statement


The authors confirm contribution to the paper as follows:

Conceptualization: Selvam L, Thresa Jeniffer J, Pandi Maharajan M, Saravanan S, Jaiganesh M and Ramkumar S; Methodology: Selvam L, Thresa Jeniffer J and Pandi Maharajan M; Writing – Original Draft Preparation: Selvam L, Thresa Jeniffer J and Pandi Maharajan M; Supervision: Selvam L, Thresa Jeniffer J and Pandi Maharajan M; Writing – Reviewing and Editing: Selvam L, Thresa Jeniffer J, Pandi Maharajan M, Saravanan S, Jaiganesh M and Ramkumar S; All authors reviewed the results and approved the final version of the manuscript.


Acknowledgements


We would like to thank the reviewers for taking the time and effort necessary to review the manuscript. We sincerely appreciate all the valuable comments and suggestions, which helped us improve the quality of the manuscript.


Funding


No funding was received to assist with the preparation of this manuscript.


Ethics declarations


Conflict of interest

The authors have no conflicts of interest to declare that are relevant to the content of this article.


Availability of data and materials


Data sharing is not applicable to this article as no new data were created or analysed in this study.


Author information


Contributions

All authors contributed equally to the paper, and all authors have read and agreed to the published version of the manuscript.


Rights and permissions


Open Access This article is licensed under a Creative Commons Attribution NoDerivs license, which is a more restrictive license: it allows you to redistribute the material commercially or non-commercially, but you cannot make any changes whatsoever to the original, i.e., no derivatives of the original work. To view a copy of this license, visit https://creativecommons.org/licenses/by-nc-nd/4.0/


Cite this article


Selvam L, Thresa Jeniffer J, Pandi Maharajan M, Saravanan S, Jaiganesh M and Ramkumar S, “A Secure System for Accessing the Big Data Over the Scattered Cloud Information Center Using C-Hadoop”, Journal of Machine and Computing, vol. 5, no. 4, pp. 1971–1983, October 2025, doi: 10.53759/7669/jmc202505154.


Copyright


© 2025 Selvam L, Thresa Jeniffer J, Pandi Maharajan M, Saravanan S, Jaiganesh M and Ramkumar S. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.