Current interactive systems largely ignore users' emotional states, so their interfaces remain rigid and cannot accommodate dynamic human behavior. This shortcoming leaves a significant gap in personalized user experience, particularly in educational and healthcare applications, intelligent assistants, and customer service. To address it, this work proposes E-MXNet, a Multimodal Emotion-Aware Interaction Framework that interprets user emotions from audio, visual, and textual cues and adapts system responses in real time. The framework combines modality-specific feature extractors with a hybrid fusion strategy that balances early, feature-level integration with late, decision-level aggregation, so that emotion representations remain robust when one or more modalities are noisy or incomplete. An emotion-sensitive personalized interaction engine then tailors interface features such as content style, interaction pace, and feedback modality to increase engagement. E-MXNet is novel in three respects: (1) a unified multimodal affective pipeline that combines speech prosody, facial dynamics, and semantic sentiment; (2) a fusion mechanism that performs robustly across a wide range of contexts; and (3) an adaptive user-experience module that enables scalable emotional personalization. Evaluations on benchmark datasets show that E-MXNet achieves higher accuracy, higher F1-scores, and lower misclassification rates than existing unimodal and multimodal baselines. Visualization-based analysis further indicates that model stability and user satisfaction increase significantly after personalization. These findings underscore the usefulness of E-MXNet for delivering emotionally intelligent, context-sensitive interaction experiences.
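The abstract does not give the fusion equations, so the following is only a minimal sketch of how a hybrid early/late fusion step could work, not E-MXNet's actual implementation. It assumes (hypothetically) a single linear layer plus softmax as the early-fusion head over concatenated features, zero-filling for a missing modality, a reliability-weighted average of per-modality class probabilities for late fusion (renormalized over the modalities actually present), and a convex combination of the two controlled by an illustrative weight `alpha`. The label set `EMOTIONS` and all function names are placeholders.

```python
# Hypothetical sketch of hybrid (early + late) multimodal fusion;
# not the paper's implementation. Names and weights are illustrative.
import math
from typing import Dict, List, Optional, Sequence

EMOTIONS = ["neutral", "happy", "sad", "angry"]  # hypothetical label set


def softmax(logits: Sequence[float]) -> List[float]:
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def early_fusion(features: Dict[str, Optional[List[float]]],
                 dims: Dict[str, int],
                 weights: List[List[float]],
                 bias: List[float]) -> List[float]:
    """Feature-level fusion: concatenate modality features in a fixed order,
    zero-filling missing modalities, then apply a linear layer and softmax."""
    fused: List[float] = []
    for name in sorted(dims):
        vec = features.get(name)
        fused.extend(vec if vec is not None else [0.0] * dims[name])
    logits = [sum(w * x for w, x in zip(row, fused)) + b
              for row, b in zip(weights, bias)]
    return softmax(logits)


def late_fusion(probs: Dict[str, Optional[List[float]]],
                reliability: Dict[str, float]) -> List[float]:
    """Decision-level fusion: reliability-weighted average of per-modality
    class probabilities, renormalized over the modalities that are present."""
    present = {m: p for m, p in probs.items() if p is not None}
    if not present:
        raise ValueError("at least one modality must be available")
    total_w = sum(reliability[m] for m in present)
    n_classes = len(next(iter(present.values())))
    return [sum(reliability[m] * p[k] for m, p in present.items()) / total_w
            for k in range(n_classes)]


def hybrid_fusion(early: List[float], late: List[float],
                  alpha: float = 0.5) -> List[float]:
    """Convex combination of the early- and late-fusion distributions."""
    return [alpha * e + (1 - alpha) * l for e, l in zip(early, late)]


if __name__ == "__main__":
    dims = {"audio": 2, "text": 2, "visual": 2}
    feats = {"audio": None, "text": [0.4, -0.1], "visual": [0.9, 0.2]}  # audio missing
    W = [[0.0] * 6 for _ in EMOTIONS]  # untrained placeholder weights
    b = [0.0] * len(EMOTIONS)
    early = early_fusion(feats, dims, W, b)
    late = late_fusion(
        {"audio": None, "text": [0.3, 0.4, 0.2, 0.1], "visual": [0.1, 0.6, 0.2, 0.1]},
        {"audio": 1.0, "text": 1.0, "visual": 1.0},
    )
    fused = hybrid_fusion(early, late, alpha=0.5)
    print(EMOTIONS[fused.index(max(fused))])  # prints "happy"
```

Renormalizing the late-fusion weights over the available modalities is one simple way to obtain the robustness to incomplete input described in the abstract: when the audio stream drops out, the text and visual predictions still form a valid probability distribution. In practice, `alpha` and the reliability weights could be learned or tuned on validation data rather than fixed by hand.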
Keywords
Multimodal Emotion Recognition, Affective Computing, Personalized Interaction, Adaptive User Experience, Human–Computer Interaction, Deep Learning Framework.
CRediT Author Statement
The author reviewed the results and approved the final version of the manuscript.
Acknowledgements
The author thanks Saint Petersburg State University for research support.
Funding
No funding was received to assist with the preparation of this manuscript.
Ethics declarations
Conflict of interest
The author has no conflicts of interest to declare that are relevant to the content of this article.
Availability of data and materials
The datasets used in this study consist of publicly available multimodal emotion recognition corpora and synthetically generated supplementary data for personalization experiments. The audiovisual components were derived from standard benchmark datasets such as IEMOCAP, RAVDESS, and CREMA-D, all of which can be accessed through their respective official repositories. Textual data associated with emotional speech was sourced from transcriptions provided within these datasets.
Author information
Contributions
The author has read and agreed to the published version of the manuscript.
Corresponding author
Boyu Ren
School of Design, Saint Petersburg State University, 199034, Russia.
Open Access. This article is licensed under a Creative Commons Attribution NoDerivs license, a more restrictive license that allows the material to be redistributed, commercially or non-commercially, but prohibits any changes to the original work (no derivatives). To view a copy of this license, visit https://creativecommons.org/licenses/by-nc-nd/4.0/
Cite this article
Boyu Ren, “Emotion Aware Interaction Systems: A Multimodal Affective Computing Framework for Immersive Technology Powered Personalized User Experience”, Journal of Machine and Computing, vol.6, no.1, pp. 367-384, 2026, doi: 10.53759/7669/jmc202606027.