Journal of Machine and Computing


Enhancing Spoof Detection in Automatic Speaker Verification Using CQCC Optimization and ViT Architecture




Received On: 23 February 2025

Revised On: 02 June 2025

Accepted On: 08 August 2025

Published On: 05 October 2025

Volume 05, Issue 04

Pages: 2625-2642


Abstract


Spoof detection is essential to the security of automatic speaker verification (ASV) systems, which are primarily used for authentication. This study aims to improve the performance and efficiency of spoof detection using speech samples from the ASVspoof 2019 dataset. Constant Q Cepstral Coefficients (CQCC) extracted from these samples serve as the key features. The feature optimization methods Genetic Algorithm (GA), Grey Wolf Optimizer (GWO), and Mayfly Optimizer (MO) are used to refine these features and thereby improve model accuracy at minimal time cost. A Vision Transformer (ViT) model is then trained on each optimized feature set, and the optimization methods are compared on the resulting performance. Time analysis shows a substantial reduction in training time per epoch when the optimized features are used. The Genetic Algorithm achieved the best performance, with a test accuracy of 97% and the shortest training time. Equal Error Rate (EER) and the Tandem Detection Cost Function (t-DCF) are used as the evaluation metrics. The study demonstrates that feature optimization improves spoof detection accuracy while reducing processing time, making it a practical approach for real-time ASV systems.
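To illustrate the feature optimization step described in the abstract, the following minimal sketch shows a genetic algorithm searching over binary masks that keep or drop individual feature coefficients. It is not the authors' implementation: the function name `ga_select_features`, the `relevance` scores, the penalty-based `toy_fitness`, and all hyperparameters are assumptions chosen for illustration. In a real pipeline the fitness would more plausibly be the validation accuracy of a classifier trained on the kept CQCC coefficients.

```python
import random


def ga_select_features(n_features, fitness, pop_size=30, generations=40,
                       crossover_rate=0.9, mutation_rate=0.05, seed=0):
    """Binary-mask GA sketch. Returns (best_mask, best_fitness_per_generation).

    Requires n_features >= 2 and pop_size >= 3. All defaults are illustrative.
    """
    rng = random.Random(seed)
    # Each individual is a list of booleans: True keeps that coefficient.
    population = [[rng.random() < 0.5 for _ in range(n_features)]
                  for _ in range(pop_size)]
    history = []

    def tournament(scored, k=3):
        # Draw k random individuals and return the mask of the fittest one.
        return max(rng.sample(scored, k), key=lambda pair: pair[0])[1]

    for _ in range(generations):
        scored = [(fitness(mask), mask) for mask in population]
        elite_score, elite = max(scored, key=lambda pair: pair[0])
        history.append(elite_score)
        next_population = [elite[:]]  # elitism: the best mask survives unchanged
        while len(next_population) < pop_size:
            parent_a, parent_b = tournament(scored), tournament(scored)
            if rng.random() < crossover_rate:
                cut = rng.randrange(1, n_features)  # one-point crossover
                child = parent_a[:cut] + parent_b[cut:]
            else:
                child = parent_a[:]
            # Bit-flip mutation: toggle each gene with a small probability.
            child = [(not gene) if rng.random() < mutation_rate else gene
                     for gene in child]
            next_population.append(child)
        population = next_population

    final_score, best_mask = max(((fitness(mask), mask) for mask in population),
                                 key=lambda pair: pair[0])
    history.append(final_score)
    return best_mask, history


# Hypothetical per-coefficient relevance scores (not taken from the study).
relevance = [0.9, 0.1, 0.8, 0.05, 0.7, 0.02, 0.6, 0.01, 0.03, 0.5, 0.04, 0.02]


def toy_fitness(mask, penalty=0.2):
    # Reward relevant coefficients and charge a fixed cost per kept coefficient.
    return sum(r - penalty for r, keep in zip(relevance, mask) if keep)


best_mask, history = ga_select_features(len(relevance), toy_fitness)
print("kept indices:", [i for i, keep in enumerate(best_mask) if keep])
print("best fitness:", round(history[-1], 3))
```

Because the fittest mask is copied unchanged into each new generation, the best fitness recorded in `history` can never decrease. Shrinking the feature vector in this way is one plausible route to the shorter per-epoch training time the abstract reports, although the exact mechanism used in the study is not specified here.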


Keywords


Feature Optimization, Vision Transformer, CQCC.



CRediT Author Statement


The authors confirm contribution to the paper as follows:

Conceptualization: Selin M and Preetha Mathew K; Writing - Original Draft Preparation: Selin M and Preetha Mathew K; Visualization: Selin M; Investigation: Preetha Mathew K; Supervision: Selin M; Validation: Preetha Mathew K; Writing - Reviewing and Editing: Selin M and Preetha Mathew K. All authors reviewed the results and approved the final version of the manuscript.


Acknowledgements


The authors thank the Department of Computer Applications for supporting this research.


Funding


No funding was received to assist with the preparation of this manuscript.


Ethics declarations


Conflict of interest

The authors have no conflicts of interest to declare that are relevant to the content of this article.


Availability of data and materials


Data sharing is not applicable to this article as no new data were created or analysed in this study.


Author information


Contributions

All authors contributed equally to the paper, and all authors have read and agreed to the published version of the manuscript.


Corresponding author


Rights and permissions


Open Access: This article is licensed under a Creative Commons Attribution NoDerivs license, which is a more restrictive license. It allows the material to be redistributed commercially or non-commercially, but no changes whatsoever may be made to the original, i.e., no derivatives of the original work are permitted. To view a copy of this license, visit https://creativecommons.org/licenses/by-nc-nd/4.0/


Cite this article


Selin M and Preetha Mathew K, “Enhancing Spoof Detection in Automatic Speaker Verification Using CQCC Optimization and ViT Architecture,” Journal of Machine and Computing, vol. 5, no. 4, pp. 2625-2642, October 2025, doi: 10.53759/7669/jmc202505202.


Copyright


© 2025 Selin M and Preetha Mathew K. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.