Acquiring large, accurate, real-world datasets for specific problems is crucial but time-consuming. Numerous annotated datasets are available, but most are unsuitable for a specialized task because of differences in class labels, class imbalance, and variability. One solution to this problem is to use artificially crafted (synthetic) datasets, which are scalable and can be annotated automatically. We used two approaches, stable diffusion and cut-paste-blend, to generate a synthetic dataset. This study investigates how synthetic image datasets affect the performance of YOLOv8 and an improved YOLOv8 model for helmet detection. We trained the models on real-world, synthetic, and hybrid datasets and evaluated their detection accuracy. After 50 training epochs, the model achieved an mAP@50 of 78.6% on real data, 45.5% on synthetic data, and 75.4% on hybrid data. We analyzed how different mixing ratios in the hybrid dataset affected the results and found that with a 3:1 mix, the YOLOv8-based model reached an mAP@50 of 90.3%, which is better than when real and synthetic data were used in equal amounts. We propose YOLOv8-CBAM, which integrates the Convolutional Block Attention Module (CBAM), to improve the accuracy of helmet and non-helmet detection. Experimental results indicate that YOLOv8-CBAM achieved an mAP@50 of 91% after 50 epochs, 0.7 percentage points higher than the baseline model. This study also indicates that an appropriate proportion of synthetic data addressed the class imbalance problem and improved helmet detection accuracy in challenging environments.
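The abstract states that synthetic datasets can be annotated automatically. The Python sketch below illustrates why this holds for a cut-paste-blend approach: because the program chooses where an object crop is pasted, it knows the bounding box exactly and can emit a YOLO-format label line (class, normalized x-center, y-center, width, height) without manual labeling. It is a minimal sketch under stated assumptions, not the authors' pipeline: images are small grayscale grids of floats, simple per-pixel alpha blending stands in for whatever blending method the study actually used, and the class IDs and function names (`paste_blend`, `yolo_label`, `synthesize`) are hypothetical.

```python
"""Minimal cut-paste-blend sketch with automatic YOLO-format annotation.

Assumptions (hypothetical, for illustration only): images are grayscale
grids of floats in [0, 1], blending is simple per-pixel alpha blending,
the mask has the same shape as the patch, and class IDs are placeholders
rather than the study's label map.
"""

Image = list[list[float]]


def paste_blend(background: Image, patch: Image, mask: Image, x0: int, y0: int) -> Image:
    """Alpha-blend `patch` onto a copy of `background` at top-left (x0, y0).

    Each mask value is an alpha in [0, 1]: 1 keeps the patch pixel,
    0 keeps the background pixel, and fractional values soften the seam.
    """
    out = [row[:] for row in background]  # copy so the input stays untouched
    for dy, (patch_row, mask_row) in enumerate(zip(patch, mask)):
        for dx, (p, alpha) in enumerate(zip(patch_row, mask_row)):
            b = out[y0 + dy][x0 + dx]
            out[y0 + dy][x0 + dx] = alpha * p + (1.0 - alpha) * b
    return out


def yolo_label(class_id: int, x0: int, y0: int, w: int, h: int,
               img_w: int, img_h: int) -> str:
    """Format one YOLO label line: class x_center y_center width height.

    Coordinates are normalized by the image size, as YOLO label files expect.
    """
    xc = (x0 + w / 2) / img_w
    yc = (y0 + h / 2) / img_h
    return f"{class_id} {xc:.6f} {yc:.6f} {w / img_w:.6f} {h / img_h:.6f}"


def synthesize(background: Image, patch: Image, mask: Image,
               class_id: int, x0: int, y0: int) -> tuple[Image, str]:
    """Paste one object and return the new image plus its label line."""
    img_h, img_w = len(background), len(background[0])
    h, w = len(patch), len(patch[0])
    if x0 < 0 or y0 < 0 or x0 + w > img_w or y0 + h > img_h:
        raise ValueError("patch does not fit inside the background at this position")
    image = paste_blend(background, patch, mask, x0, y0)
    # The paste position is chosen by the program, so the box is known exactly.
    return image, yolo_label(class_id, x0, y0, w, h, img_w, img_h)


if __name__ == "__main__":
    bg = [[0.0] * 8 for _ in range(8)]      # 8x8 black background
    helmet = [[1.0] * 4 for _ in range(2)]  # 4-wide, 2-tall white "object" crop
    alpha = [[0.5] * 4 for _ in range(2)]   # uniform 50% blend
    img, label = synthesize(bg, helmet, alpha, class_id=0, x0=2, y0=4)
    print(label)  # 0 0.500000 0.625000 0.500000 0.250000
```

Repeating this step with many crops, backgrounds, and positions yields an arbitrarily large labeled set, which is the scalability the abstract refers to; a practical pipeline would typically also vary scale, rotation, and blending to reduce the visual gap between synthetic and real images.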
Keywords
Synthetic Data, YOLO, Attention Mechanism, Deep Learning, Helmet Detection.
CRediT Author Statement
The authors confirm contribution to the paper as follows:
Conceptualization: Arshad M, Kumar P;
Methodology: Arshad M, Kumar P;
Software: Arshad M;
Data Curation: Arshad M;
Writing – Original Draft Preparation: Arshad M, Kumar P;
Visualization: Arshad M;
Investigation: Kumar P;
Supervision: Kumar P;
Validation: Arshad M, Kumar P;
Writing – Reviewing and Editing: All authors reviewed the results and approved the final version of the manuscript.
Acknowledgements
The authors thank Dr. Pradeep Kumar for support in completing this research.
Funding
No funding was received to assist with the preparation of this manuscript.
Ethics declarations
Conflict of interest
The authors have no conflicts of interest to declare that are relevant to the content of this article.
Availability of data and materials
Data sharing is not applicable to this article as no new data were created or analysed in this study.
Author information
Contributions
All authors contributed equally to the paper, and all authors have read and agreed to the published version of the manuscript.
Corresponding author
Mohd Arshad
Department of Computer Science and Information Technology, Maulana Azad National Urdu University, Hyderabad, Telangana, India.
Open Access This article is licensed under a Creative Commons Attribution NoDerivs license, which is a more restrictive license. It allows you to redistribute the material commercially or non-commercially, but you cannot make any changes whatsoever to the original, i.e., no derivatives of the original work. To view a copy of this license, visit https://creativecommons.org/licenses/by-nc-nd/4.0/
Cite this article
Mohd Arshad and Pradeep Kumar, “Impact of Synthetic Data on Training and Improved YOLOv8 Models for Helmet Detection,” Journal of Machine and Computing, vol. 5, no. 3, pp. 1439-1449, July 2025, doi: 10.53759/7669/jmc202505114.