Indian Sign Language (ISL) recognition plays a central role in enabling communication between hearing-impaired and hearing individuals. However, current ISL recognition algorithms are hampered by hand gesture variability, complex visual settings, and limited official annotations. Four primary challenges affect ISL recognition: occlusions in the camera view, inconsistent lighting conditions, visually similar gestures that are difficult to distinguish, and the need for extensive labeled datasets to train deep learning systems. This study proposes a Hybrid Vision Transformer with Convolutions (HVTC) combined with Ensemble Transfer Learning (ETL). The ensemble incorporates transfer learning models, namely Adaptive Lightweight DenseNet, VGG19, and XceptionNet for Multi-Task Learning, along with ResNet with Dynamic Depth and MobileNetV3 with Attention Mechanisms, to improve ISL recognition accuracy. The ETL-HVTC method extracts spatial-temporal motion features from gesture data. Transfer learning reduces the dependency on large datasets, while the ensemble approach integrates multiple predictive models to enhance model stability. A robust ISL recognition algorithm should prioritize real-time capability, high recognition accuracy, and a broad application scope. Secure pre-processing of the gesture dataset enables the optimization of hybrid ViT Large Model-CNN models, where collaborative learning ensures reliable classification outcomes. Experimental results demonstrate that the proposed ETL-HVTC system outperforms the standalone ViT Large Model and existing CNN models on ISL benchmark databases in terms of precision, recall, F1-score, and accuracy. The approach yields fast and effective results, facilitating the development of assistive devices that promote more inclusive communication for individuals with hearing impairments.
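The ensemble step and the reported metrics can be illustrated with a minimal sketch. It assumes soft voting (weighted averaging of per-class probabilities) as the fusion rule and macro-averaging for precision, recall, and F1-score; the abstract does not state which fusion rule or averaging scheme ETL-HVTC uses. The function names `soft_vote` and `macro_scores`, the three toy models, and all probability values are hypothetical and are not taken from the study.

```python
from typing import Dict, List, Optional, Sequence

# Probs[i][c]: probability a model assigns to class c for sample i.
Probs = Sequence[Sequence[float]]


def soft_vote(model_probs: Sequence[Probs],
              weights: Optional[Sequence[float]] = None) -> List[int]:
    """Fuse several models by weighted averaging of class probabilities.

    Assumption: soft voting stands in for the paper's ensemble rule.
    """
    n_models = len(model_probs)
    if weights is None:
        weights = [1.0 / n_models] * n_models  # equal weights by default
    n_samples, n_classes = len(model_probs[0]), len(model_probs[0][0])
    predictions = []
    for i in range(n_samples):
        fused = [sum(w * probs[i][c] for w, probs in zip(weights, model_probs))
                 for c in range(n_classes)]
        # Predicted class is the index of the largest fused probability.
        predictions.append(max(range(n_classes), key=fused.__getitem__))
    return predictions


def macro_scores(y_true: Sequence[int], y_pred: Sequence[int],
                 n_classes: int) -> Dict[str, float]:
    """Accuracy plus macro-averaged precision, recall, and F1-score."""
    precisions, recalls, f1s = [], [], []
    pairs = list(zip(y_true, y_pred))
    for c in range(n_classes):
        tp = sum(1 for t, p in pairs if t == c and p == c)
        fp = sum(1 for t, p in pairs if t != c and p == c)
        fn = sum(1 for t, p in pairs if t == c and p != c)
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        precisions.append(prec)
        recalls.append(rec)
        f1s.append(f1)
    return {
        "accuracy": sum(1 for t, p in pairs if t == p) / len(pairs),
        "precision": sum(precisions) / n_classes,
        "recall": sum(recalls) / n_classes,
        "f1": sum(f1s) / n_classes,
    }


# Hypothetical softmax outputs of three backbones on four samples, three classes.
model_a = [[0.6, 0.3, 0.1], [0.2, 0.5, 0.3], [0.4, 0.35, 0.25], [0.1, 0.2, 0.7]]
model_b = [[0.5, 0.4, 0.1], [0.3, 0.4, 0.3], [0.2, 0.6, 0.2], [0.2, 0.2, 0.6]]
model_c = [[0.7, 0.2, 0.1], [0.1, 0.4, 0.5], [0.3, 0.5, 0.2], [0.3, 0.3, 0.4]]

y_true = [0, 1, 2, 2]  # made-up ground-truth labels
predictions = soft_vote([model_a, model_b, model_c])
scores = macro_scores(y_true, predictions, n_classes=3)
print(predictions, scores)
```

In this toy example the individual models disagree on the second and third samples, and the averaged probabilities settle each disagreement in favor of the class with the most combined support, which is the sense in which an ensemble can stabilize predictions.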
Keywords
Indian Sign Language Recognition, Vision Transformers, Convolutional Neural Networks, Transfer Learning, Ensemble Learning, Deep Learning, Hybrid Models, Gesture Recognition, Assistive Communication, Multimodal Feature Extraction.
CRediT Author Statement
The authors confirm contribution to the paper as follows:
Conceptualization: Suresh Anand M, Mong-Fong Horng and Chin-Shiuh Shieh;
Methodology: Suresh Anand M and Mong-Fong Horng;
Data Curation: Mong-Fong Horng and Chin-Shiuh Shieh;
Writing - Original Draft Preparation: Suresh Anand M, Mong-Fong Horng and Chin-Shiuh Shieh;
Visualization: Mong-Fong Horng and Chin-Shiuh Shieh;
Investigation: Suresh Anand M and Mong-Fong Horng;
Supervision: Mong-Fong Horng and Chin-Shiuh Shieh;
Validation: Suresh Anand M and Mong-Fong Horng;
Writing - Reviewing and Editing: Suresh Anand M, Mong-Fong Horng and Chin-Shiuh Shieh.
All authors reviewed the results and approved the final version of the manuscript.
Acknowledgements
The authors would like to thank the reviewers for their helpful comments on the manuscript.
Funding
No funding was received to assist with the preparation of this manuscript.
Ethics declarations
Conflict of interest
The authors have no conflicts of interest to declare that are relevant to the content of this article.
Availability of data and materials
Data sharing is not applicable to this article as no new data were created or analysed in this study.
Author information
Contributions
All authors contributed equally to the paper, and all authors have read and agreed to the published version of the manuscript.
Corresponding author
Suresh Anand M
Department of Computing Technologies, School of Computing, SRM Institute of Science and Technology, Kattankulathur, Chennai, Tamil Nadu, India.
Open Access: This article is licensed under a Creative Commons Attribution NoDerivs license, which is a more restrictive license. It allows the material to be redistributed commercially or non-commercially, but users cannot make any changes whatsoever to the original, i.e., no derivatives of the original work are permitted. To view a copy of this license, visit https://creativecommons.org/licenses/by-nc-nd/4.0/
Cite this article
Suresh Anand M, Mong-Fong Horng and Chin-Shiuh Shieh, “Develop an Ensemble Transfer Learning with Hybrid Vision Transformers with Convolutions for Enhancing Indian Sign Language Recognition,” Journal of Machine and Computing, vol. 5, no. 4, pp. 2386–2404, October 2025, doi: 10.53759/7669/jmc202505185.