Journal of Machine and Computing


Emotion Aware Interaction Systems: A Multimodal Affective Computing Framework for Immersive Technology Powered Personalized User Experience




Received On : 10 August 2025

Revised On : 30 October 2025

Accepted On : 02 December 2025

Published On : 08 December 2025

Volume 06, Issue 01

Pages : 367-384


Abstract


Current interactive systems do not account for users' emotional states, so their interfaces remain rigid and cannot accommodate dynamic human behavior. This shortcoming leaves a substantial gap in personalized user experience, particularly in education, healthcare, intelligent assistants, and customer service. To address it, this work proposes E-MXNet, a Multimodal Emotion-Aware Interaction Framework that interprets user emotions from audio, visual, and textual input and adjusts system responses in real time. The framework combines modality-specific feature extractors with a hybrid fusion strategy that balances early feature-level integration against late decision-level aggregation, which keeps the emotion representation robust when modalities are noisy or incomplete. An emotion-sensitive personalized interaction engine then adapts interface features such as content style, interaction speed, and feedback modality to increase engagement. E-MXNet is novel in three ways: (1) a unified multimodal affective pipeline that combines speech prosody, facial dynamics, and semantic sentiment; (2) a fusion mechanism that works well across a wide range of contexts; and (3) an adaptive user-experience module that introduces scalable emotional personalization. Evaluations on benchmark datasets show that E-MXNet achieves higher accuracy, higher F1-scores, and lower misclassification rates than current unimodal and multimodal baselines. Visualization-based analysis further indicates that model stability and user satisfaction increase significantly after personalization. These findings underscore the usefulness of E-MXNet in providing emotionally intelligent and context-sensitive interaction experiences.
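
The hybrid fusion idea described in the abstract can be illustrated with a minimal sketch. This is not the E-MXNet implementation, whose architecture is not given here. The function name hybrid_fuse, the blending weight alpha, the zero-filling of missing modalities, and the single linear scoring layer are all assumptions chosen for illustration. The early branch concatenates modality features and scores them jointly; the late branch averages the per-modality class probabilities that are available; the two results are then blended.

```python
import math


def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def hybrid_fuse(features, unimodal_probs, early_weights, dims, alpha=0.5):
    """Blend early (feature-level) and late (decision-level) fusion.

    All names and the blending rule are hypothetical, for illustration only.
    features:       modality name -> feature vector (absent or None if missing)
    unimodal_probs: modality name -> class probabilities (None if missing)
    early_weights:  one weight row per class over the concatenated features
    dims:           modality name -> feature length, used to zero-fill gaps
    alpha:          weight of the early branch in the final blend
    """
    # Early branch: concatenate features in a fixed order, zero-filling
    # any modality that is missing so the input length stays constant.
    concat = []
    for name in sorted(dims):
        vec = features.get(name)
        concat.extend(vec if vec is not None else [0.0] * dims[name])
    logits = [sum(w * x for w, x in zip(row, concat)) for row in early_weights]
    early = softmax(logits)

    # Late branch: average the class probabilities of the modalities present.
    present = [p for p in unimodal_probs.values() if p is not None]
    if present:
        late = [sum(col) / len(present) for col in zip(*present)]
    else:
        late = [1.0 / len(early)] * len(early)  # no evidence: uniform guess

    # Hybrid result: convex combination of the two branches.
    return [alpha * e + (1.0 - alpha) * l for e, l in zip(early, late)]
```

Because both branches produce probability distributions and alpha lies between 0 and 1, the blended output is itself a distribution, and a missing modality only removes its contribution instead of breaking the pipeline.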


Keywords


Multimodal Emotion Recognition, Affective Computing, Personalized Interaction, Adaptive User Experience, Human–Computer Interaction, Deep Learning Framework.



CRediT Author Statement


The author reviewed the results and approved the final version of the manuscript.


Acknowledgements


The author thanks Saint Petersburg State University for research support.


Funding


No funding was received to assist with the preparation of this manuscript.


Ethics declarations


Conflict of interest

The authors have no conflicts of interest to declare that are relevant to the content of this article.


Availability of data and materials


The datasets used in this study consist of publicly available multimodal emotion recognition corpora and synthetically generated supplementary data for personalization experiments. The audiovisual components were derived from standard benchmark datasets such as IEMOCAP, RAVDESS, and CREMA-D, all of which can be accessed through their respective official repositories. Textual data associated with emotional speech was sourced from transcriptions provided within these datasets.


Author information


Contributions

All authors contributed equally to the paper and have read and agreed to the published version of the manuscript.


Corresponding author


Rights and permissions


Open Access. This article is licensed under a Creative Commons Attribution NoDerivs license, which is a more restrictive license. It allows the material to be redistributed commercially or non-commercially, but the user cannot make any changes whatsoever to the original, i.e. no derivatives of the original work. To view a copy of this license, visit https://creativecommons.org/licenses/by-nc-nd/4.0/


Cite this article


Boyu Ren, “Emotion Aware Interaction Systems: A Multimodal Affective Computing Framework for Immersive Technology Powered Personalized User Experience,” Journal of Machine and Computing, vol. 6, no. 1, pp. 367-384, 2026, doi: 10.53759/7669/jmc202606027.


Copyright


© 2026 Boyu Ren. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.