Journal of Machine and Computing


AROA based Pre-trained Model of Convolutional Neural Network for Voice Pathology Detection and Classification




Received On : 31 August 2023

Revised On : 25 October 2023

Accepted On : 30 January 2024

Published On : 05 April 2024

Volume 04, Issue 02

Pages : 463-471


Abstract


With growing demand for better, more user-friendly human-machine interfaces (HMIs), voice recognition systems have risen in prominence in recent years. Computer-assisted voice pathology classification tools allow voice disorders to be detected accurately, so that they can be diagnosed early and treated accordingly. The goal of this work is an effective Deep Learning-based tool for voice pathology identification based on feature extraction. This research presents the results of applying EfficientNet, a pre-trained Convolutional Neural Network (CNN), to a speech pathology dataset in order to achieve the highest possible classification accuracy. A set of parameters tuned by the Artificial Rabbit Optimization Algorithm (AROA) complements the model's mobNet building blocks, which comprise a linear stack of separable convolution and max-pooling layers with Swish activation. To make the proposed approach applicable to a broad variety of voice disorder problems, this study also proposes a new training method along with several training methodologies. The proposed approach was tested on one speech database, the Saarbrücken voice database (SVD). The experimental findings show that the proposed CNN approach detects speech pathologies with up to 96% accuracy. The method shows strong potential for use in real-world clinical settings, where it may provide accurate classifications in as little as three seconds and expedite automated diagnosis and treatment.
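The abstract names Swish activation, and the keywords name separable convolution, without defining either. The minimal Python sketch below illustrates both using their standard textbook definitions; it is not taken from the authors' implementation. The function names, the `beta` parameter, and the example layer sizes (3 x 3 kernel, 32 input channels, 64 output channels) are assumptions chosen for illustration only.

```python
import math


def swish(x: float, beta: float = 1.0) -> float:
    """Swish activation, x * sigmoid(beta * x) (standard definition, assumed here).

    The sigmoid is written via tanh so that large negative inputs do not overflow.
    """
    return x * 0.5 * (1.0 + math.tanh(0.5 * beta * x))


def standard_conv_params(k: int, c_in: int, c_out: int) -> int:
    """Weight count of a standard k x k convolution (bias terms ignored)."""
    return k * k * c_in * c_out


def separable_conv_params(k: int, c_in: int, c_out: int) -> int:
    """Weight count of a depthwise k x k convolution plus a 1 x 1 pointwise one."""
    return k * k * c_in + c_in * c_out


if __name__ == "__main__":
    # Hypothetical layer sizes, not taken from the paper.
    k, c_in, c_out = 3, 32, 64
    print("swish(1.0) =", swish(1.0))
    print("standard conv weights: ", standard_conv_params(k, c_in, c_out))
    print("separable conv weights:", separable_conv_params(k, c_in, c_out))
```

For these assumed sizes the separable layer needs 2,336 weights against 18,432 for the standard layer, which is the usual motivation for separable convolutions in compact CNNs.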


Keywords


Artificial Rabbit Optimization Algorithm; Saarbrücken voice database; Convolutional Neural Network; Voice recognition systems; Separable convolution.



Acknowledgements


The authors thank the reviewers for taking the time and effort necessary to review the manuscript.


Funding


No funding was received to assist with the preparation of this manuscript.


Ethics declarations


Conflict of interest

The authors have no conflicts of interest to declare that are relevant to the content of this article.


Availability of data and materials


Data sharing is not applicable to this article as no new data were created or analysed in this study.


Author information


Contributions

All authors contributed equally to the paper, and all authors have read and agreed to the published version of the manuscript.


Corresponding author


Rights and permissions


Open Access: This article is licensed under a Creative Commons Attribution NoDerivs license, which is a more restrictive license. It allows you to redistribute the material commercially or non-commercially, but the user cannot make any changes whatsoever to the original, i.e. no derivatives of the original work. To view a copy of this license, visit https://creativecommons.org/licenses/by-nc-nd/4.0/


Cite this article


Manikandan J, Kayalvizhi K, Yuvaraj Nachimuthu and Jeena R, “AROA based Pre-trained Model of Convolutional Neural Network for Voice Pathology Detection and Classification,” Journal of Machine and Computing, vol. 4, no. 2, pp. 463-471, April 2024. doi: 10.53759/7669/jmc202404044.


Copyright


© 2024 Manikandan J, Kayalvizhi K, Yuvaraj Nachimuthu and Jeena R. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.