Journal of Machine and Computing


Impact of Synthetic Data on Training and Improved YOLOv8 Models for Helmet Detection



Received On : 05 June 2024

Revised On : 16 February 2025

Accepted On : 08 May 2025

Published On : 05 July 2025

Volume 05, Issue 03

Pages : 1439-1449


Abstract


Acquiring large, accurate, real-time datasets for a specific problem is crucial but time-consuming. Many annotated datasets are available, but most are unsuitable for a specialized task because of differences in class labels, class imbalance, and variability. One solution is to use artificially crafted (synthetic) datasets, which are scalable and can be annotated automatically. We used two approaches, stable diffusion and cut-paste-blend, to generate a synthetic dataset. This study investigates how synthetic image datasets affect the performance of YOLOv8 and improved YOLOv8 models for helmet detection. We trained models on real-world, synthetic, and hybrid datasets and evaluated their detection accuracy. After 50 epochs of training, the model achieved an mAP@50 of 78.6% on real data, 45.5% on synthetic data, and 75.4% on the hybrid dataset. We then varied the mixing ratio of the hybrid dataset and found that, with a 3:1 mix, the YOLOv8-based model reached an mAP@50 of 90.3%, which is better than when real and synthetic data were used in equal amounts. To further improve the accuracy of helmet and non-helmet detection, we propose YOLOv8-CBAM, which is based on the Convolutional Block Attention Module. Experimental results indicate that YOLOv8-CBAM achieved an mAP@50 of 91%, which is 0.7 percentage points higher than the baseline model. The study also indicates that the correct proportion of synthetic data resolved the class imbalance problem and improved helmet detection accuracy in challenging environments.
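The hybrid-ratio experiment described in the abstract can be illustrated with a minimal sketch of how a fixed-ratio training list might be assembled. This is not the authors' pipeline. The function name mix_datasets, the file names, and the pool sizes are hypothetical, and reading the 3:1 mix as real:synthetic is an assumption, since the abstract does not state the order of the ratio.

```python
import random


def mix_datasets(real, synthetic, real_parts=3, synthetic_parts=1, seed=0):
    """Return a shuffled hybrid list with real:synthetic = real_parts:synthetic_parts.

    Assumption: the ratio is read as real:synthetic. Whichever pool is
    oversized for the requested ratio is subsampled at random.
    """
    if real_parts <= 0 or synthetic_parts <= 0:
        raise ValueError("ratio parts must be positive")
    rng = random.Random(seed)  # fixed seed keeps the split reproducible
    # Largest number of whole ratio "units" that both pools can supply.
    units = min(len(real) // real_parts, len(synthetic) // synthetic_parts)
    n_real = units * real_parts
    n_synthetic = units * synthetic_parts
    hybrid = rng.sample(real, n_real) + rng.sample(synthetic, n_synthetic)
    rng.shuffle(hybrid)
    return hybrid


# Hypothetical image lists; the sizes are illustrative, not taken from the paper.
real = [f"real_{i:04d}.jpg" for i in range(900)]
synthetic = [f"syn_{i:04d}.jpg" for i in range(500)]
hybrid = mix_datasets(real, synthetic)
n_real = sum(name.startswith("real_") for name in hybrid)
n_synthetic = len(hybrid) - n_real
```

Subsampling the larger pool rather than duplicating the smaller one keeps every image unique, which is one reasonable design choice among several; oversampling the minority pool would be an alternative when data is scarce.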


Keywords


Synthetic Data, YOLO, Attention Mechanism, Deep Learning, Helmet Detection.



CRediT Author Statement


The authors confirm contribution to the paper as follows:

Conceptualization: Arshad M, Kumar P; Methodology: Arshad M, Kumar P; Software: Arshad M; Data Curation: Arshad M; Writing - Original Draft Preparation: Arshad M, Kumar P; Visualization: Arshad M; Investigation: Kumar P; Supervision: Kumar P; Validation: Arshad M, Kumar P; Writing - Reviewing and Editing: All authors reviewed the results and approved the final version of the manuscript.


Acknowledgements


The author(s) thank Dr. Pradeep Kumar for his support in completing this research.


Funding


No funding was received to assist with the preparation of this manuscript.


Ethics declarations


Conflict of interest

The authors have no conflicts of interest to declare that are relevant to the content of this article.


Availability of data and materials


Data sharing is not applicable to this article as no new data were created or analysed in this study.


Author information


Contributions

All authors contributed equally to the paper, and all authors have read and agreed to the published version of the manuscript.


Corresponding author


Rights and permissions


Open Access. This article is licensed under a Creative Commons Attribution NoDerivs license, which is a more restrictive license. It allows the material to be redistributed, commercially or non-commercially, but the user cannot make any changes whatsoever to the original, i.e., no derivatives of the original work. To view a copy of this license, visit https://creativecommons.org/licenses/by-nc-nd/4.0/


Cite this article


Mohd Arshad and Pradeep Kumar, “Impact of Synthetic Data on Training and Improved YOLOv8 Models for Helmet Detection,” Journal of Machine and Computing, vol. 5, no. 3, pp. 1439-1449, July 2025, doi: 10.53759/7669/jmc202505114.


Copyright


© 2025 Mohd Arshad and Pradeep Kumar. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.