AnaPub Publications

Journal

Frequency: Quarterly

ISSN (Online) : 2788-7669

ISSN (Print) : 2789-1801

Journal of Machine and Computing

Impact of Synthetic Data on Training and Improved YOLOv8 Models for Helmet Detection

Received On : 05 June 2024

Revised On : 16 February 2025

Accepted On : 08 May 2025

Published On : 05 July 2025

Volume 05, Issue 03

Pages : 1439-1449

DOI

https://doi.org/10.53759/7669/jmc202505114

Abstract

Acquiring real-time, accurate, large datasets is crucial for task-specific problems, but it is time-consuming. Many annotated datasets are available, but most are unsuitable for a specialized task because of differences in class labels, class imbalance, and variability. One solution is to use artificially crafted (synthetic) datasets, which are scalable and can be annotated automatically. We used two approaches, Stable Diffusion and cut-paste-blend, to generate a synthetic dataset. This study investigates how synthetic image datasets affect the performance of YOLOv8 and improved YOLOv8 models for helmet detection. We trained models on both real-world and synthetic datasets and evaluated their detection accuracy. After 50 training epochs, the model achieved an mAP@50 of 78.6% on real data, 45.5% on synthetic data, and 75.4% on the hybrid dataset. We then analyzed how the mixing ratio of the hybrid dataset affected results and found that, with a 3:1 mix, the YOLOv8-based model reached an mAP@50 of 90.3%, which is better than when real and synthetic data were used in equal amounts. To improve the accuracy of helmet and non-helmet detection, we propose YOLOv8-CBAM, which is based on the Convolutional Block Attention Module. Experimental results indicate that YOLOv8-CBAM achieved an mAP@50 of 91% after 50 epochs, 0.7 percentage points higher than the baseline model. The study also indicates that the correct proportion of synthetic data solved the class imbalance problem and improved helmet detection accuracy in challenging environments.
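
The hybrid mixing described in the abstract can be illustrated with a short sketch. It assumes the 3:1 ratio means three real images for every synthetic one (the abstract does not say which side is larger). The function name `mix_hybrid`, the file names, and the pool sizes are hypothetical and not taken from the paper.

```python
import random


def mix_hybrid(real, synthetic, real_parts=3, synth_parts=1, seed=0):
    """Return a shuffled list with real:synthetic = real_parts:synth_parts.

    Assumption: the ratio is applied to image counts, and surplus images
    from the larger pool are simply left out.
    """
    if real_parts <= 0 or synth_parts <= 0:
        raise ValueError("ratio parts must be positive")
    rng = random.Random(seed)
    # Largest number of whole ratio "units" that both pools can supply.
    units = min(len(real) // real_parts, len(synthetic) // synth_parts)
    chosen_real = rng.sample(real, units * real_parts)
    chosen_synth = rng.sample(synthetic, units * synth_parts)
    hybrid = chosen_real + chosen_synth
    rng.shuffle(hybrid)
    return hybrid


# Hypothetical pools of image paths.
real_images = [f"real_{i:04d}.jpg" for i in range(900)]
synthetic_images = [f"synth_{i:04d}.jpg" for i in range(500)]

hybrid = mix_hybrid(real_images, synthetic_images)
n_real = sum(name.startswith("real_") for name in hybrid)
n_synth = len(hybrid) - n_real
print(n_real, n_synth)
```

With these pool sizes the real pool is the limiting one, so the sketch keeps all 900 real images and samples 300 of the 500 synthetic ones. Other ratios can be tried by changing `real_parts` and `synth_parts`.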

Keywords

Synthetic Data, YOLO, Attention Mechanism, Deep Learning, Helmet Detection.

CRediT Author Statement

The authors confirm contribution to the paper as follows:

Conceptualization: Arshad M, Kumar P; Methodology: Arshad M, Kumar P; Software: Arshad M; Data Curation: Arshad M; Writing - Original Draft Preparation: Arshad M, Kumar P; Visualization: Arshad M; Investigation: Kumar P; Supervision: Kumar P; Validation: Arshad M, Kumar P; Writing - Reviewing and Editing: All authors reviewed the results and approved the final version of the manuscript.

Acknowledgements

The author(s) thank Dr. Pradeep Kumar for support in completing this research.

Funding

No funding was received to assist with the preparation of this manuscript.

Ethics declarations

Conflict of interest

The authors have no conflicts of interest to declare that are relevant to the content of this article.

Availability of data and materials

Data sharing is not applicable to this article as no new data were created or analysed in this study.

Author information

Contributions

All authors contributed equally to the paper and have read and agreed to the published version of the manuscript.

Rights and permissions

Open Access: This article is licensed under a Creative Commons Attribution-NoDerivs license, which is a more restrictive license. It allows the material to be redistributed commercially or non-commercially, but the user cannot make any changes to the original, i.e., no derivatives of the original work are permitted. To view a copy of this license, visit https://creativecommons.org/licenses/by-nc-nd/4.0/

Cite this article

Mohd Arshad and Pradeep Kumar, “Impact of Synthetic Data on Training and Improved YOLOv8 Models for Helmet Detection,” Journal of Machine and Computing, vol. 5, no. 3, pp. 1439-1449, July 2025, doi: 10.53759/7669/jmc202505114.

Copyright

© 2025 Mohd Arshad and Pradeep Kumar. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
