Journal of Machine and Computing


Signals and Syntax: Deep Learning for Emotional Intelligence and Text-Based Linguistic Proficiency




Received On : 12 July 2025

Revised On : 27 August 2025

Accepted On : 19 September 2025

Published On : 05 October 2025

Volume 05, Issue 04

Pages : 2808-2815


Abstract


In real-world communication, understanding human Emotional Intelligence (EI) plays a pivotal role, and this understanding is strengthened considerably when modalities are combined, i.e., video and audio. This paper investigates human emotions, which are crucial to communication, using Video, Audio, and combined Video and Audio (V-A) signals. A unique dataset of spontaneous Emotional Intelligence videos was collected. Action Units (AUs) were extracted from the videos, and spectral descriptors, MFCCs, and prosodic features were extracted from the audio. We propose a One-Dimensional Convolutional Neural Network (1D-CNN) for training. Trained on the multimodal late-fusion signals, the 1D-CNN architecture achieved 83.33% accuracy, demonstrating that integrating multimodal cues yields stable predictions and mitigates the weaknesses of unimodal approaches. This study emphasizes efficient multimodal design of EI recognition frameworks for the assessment of real-world communication. We also experimented in the textual domain: text was extracted from the audio and represented with BERT embeddings for multiclass classification. A Multilayer Perceptron (MLP) trained on these features achieved 91% accuracy on the test set.
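To make the pipeline concrete, the sketches below illustrate the three stages described above in Python. They are minimal reconstructions under stated assumptions: the feature dimensions, layer sizes, class count, and library choices (librosa, TensorFlow/Keras, Hugging Face Transformers) are illustrative, not the exact configuration used in the paper.

First, per-clip audio descriptors: MFCC statistics plus simple spectral and prosodic measures pooled into a fixed-length vector. AU intensities extracted from the video (for example, with a tool such as OpenFace) would give an analogous per-clip vector.

import numpy as np
import librosa

def audio_feature_vector(wav_path, sr=16000, n_mfcc=13):
    # Load audio and compute frame-level features.
    y, _ = librosa.load(wav_path, sr=sr)
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc)    # (n_mfcc, T)
    centroid = librosa.feature.spectral_centroid(y=y, sr=sr)  # spectral descriptor
    f0 = librosa.yin(y, fmin=50, fmax=400, sr=sr)             # prosody: pitch track
    rms = librosa.feature.rms(y=y)                            # prosody: energy
    # Pool over time so every clip yields one fixed-length vector.
    return np.concatenate([mfcc.mean(axis=1), mfcc.std(axis=1),
                           [centroid.mean()], [np.mean(f0)], [rms.mean()]])

Second, a late-fusion 1D-CNN: one convolutional branch per modality, with the per-branch softmax scores averaged at the decision level, which is one plausible reading of the late-fusion setup reported here.

import tensorflow as tf
from tensorflow.keras import layers, Model

NUM_CLASSES = 4  # assumed number of emotion classes

def cnn_branch(input_len, name):
    # A small 1D-CNN that maps one feature vector to class probabilities.
    inp = layers.Input(shape=(input_len, 1), name=name)
    x = layers.Conv1D(32, 5, activation="relu")(inp)
    x = layers.MaxPooling1D(2)(x)
    x = layers.Conv1D(64, 3, activation="relu")(x)
    x = layers.GlobalAveragePooling1D()(x)
    return inp, layers.Dense(NUM_CLASSES, activation="softmax")(x)

au_in, au_out = cnn_branch(35, "action_units")    # assumed AU feature length
ad_in, ad_out = cnn_branch(29, "audio_features")  # 13 MFCC means + 13 stds + 3 descriptors
fused = layers.Average()([au_out, ad_out])        # late fusion of class scores
model = Model([au_in, ad_in], fused)
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])

Third, the text branch: transcripts obtained from the audio are embedded with BERT ([CLS] pooling shown here) and classified by a small MLP.

import tensorflow as tf
from transformers import AutoTokenizer, TFAutoModel

NUM_CLASSES = 4  # assumed number of emotion classes, as above
tok = AutoTokenizer.from_pretrained("bert-base-uncased")   # assumed checkpoint
bert = TFAutoModel.from_pretrained("bert-base-uncased")

def embed(texts):
    # Fixed 768-dim sentence vectors taken from the [CLS] position.
    batch = tok(texts, padding=True, truncation=True, return_tensors="tf")
    return bert(**batch).last_hidden_state[:, 0, :]

mlp = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(768,)),
    tf.keras.layers.Dense(256, activation="relu"),
    tf.keras.layers.Dropout(0.3),
    tf.keras.layers.Dense(NUM_CLASSES, activation="softmax"),
])
mlp.compile(optimizer="adam", loss="sparse_categorical_crossentropy",
            metrics=["accuracy"])

Decision-level averaging keeps each branch independently trainable, so a weak signal in one modality degrades rather than destroys the fused prediction, consistent with the stability the abstract attributes to multimodal fusion.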


Keywords


Emotional Intelligence, Action Units, Deep Learning, MLP, BERT.



CRediT Author Statement



Conceptualization: A N Jyothsna and Pamela Vinitha Eric; Methodology: Pamela Vinitha Eric; Data Curation: Pamela Vinitha Eric; Writing – Original Draft Preparation: A N Jyothsna and Pamela Vinitha Eric; Investigation: A N Jyothsna and Pamela Vinitha Eric; Supervision: Pamela Vinitha Eric; Writing – Reviewing and Editing: A N Jyothsna and Pamela Vinitha Eric. All authors reviewed the results and approved the final version of the manuscript.


Acknowledgements


The author(s) received no financial support for the research, authorship, and/or publication of this article.


Funding


No funding was received to assist with the preparation of this manuscript.


Ethics declarations


Conflict of interest

The authors have no conflicts of interest to declare that are relevant to the content of this article.


Availability of data and materials


The datasets generated and/or analyzed during the current study are available from the corresponding author upon reasonable request.


Author information


Contributions

All authors contributed equally to the paper, and all authors have read and agreed to the published version of the manuscript.


Corresponding author


Rights and permissions


Open Access. This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivs 4.0 (CC BY-NC-ND 4.0) license, a restrictive license under which the material may be redistributed for non-commercial purposes, but no changes whatsoever may be made to the original, i.e., no derivatives of the original work. To view a copy of this license, visit https://creativecommons.org/licenses/by-nc-nd/4.0/


Cite this article


A N Jyothsna and Pamela Vinitha Eric, “Signals and Syntax: Deep Learning for Emotional Intelligence and Text-Based Linguistic Proficiency”, Journal of Machine and Computing, vol. 5, no. 4, pp. 2808-2815, October 2025, doi: 10.53759/7669/jmc202505214.


Copyright


© 2025 A N Jyothsna and Pamela Vinitha Eric. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.