2nd International Conference on Materials Science and Sustainable Manufacturing Technology
Recommendation of Music Based on Facial Emotion using Machine Learning Technique
G. Sakthi Priya, A. Evangelin Blessy, S. Jeya Aravinth, M. Vignesh Prabhu, R. VijayaSarathy, Department of CSE, Ramco Institute of Technology, Rajapalayam, Tamil Nadu, India.
Music plays a vital role in human life and is a recognized therapy with the potential to reduce depression and anxiety and
to improve mood, self-esteem, and quality of life. Music has the power to change human emotion, which is expressed through facial
expressions. Recommending music based on emotion is a difficult task, and existing systems for emotion recognition and music
recommendation focus on depression and mental-health analysis. Hence, a model is proposed that recommends music based
on facial expression recognition in order to improve or change the user's emotional state. Face emotion recognition (FER) is implemented using the
YOLOv5 algorithm. The output of FER is one of four emotion classes (happy, angry, sad, or neutral), which serves as the input to the
music recommendation system. A music player is created to keep track of the user's favorite songs for each emotion. If the user
is new to the system, generalized music is suggested. The aim of this paper is to recommend music to users
according to their emotion in order to improve it.
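A minimal sketch of the proposed pipeline is given below, assuming a YOLOv5 model fine-tuned on the four emotion classes; the weights file "emotion_yolov5.pt", the playlist contents, and the helper names are illustrative assumptions, not artifacts of the paper.

import random
import cv2
import torch

# Hypothetical weights: a YOLOv5 model fine-tuned to detect faces and
# classify each face into one of the four emotion classes.
model = torch.hub.load("ultralytics/yolov5", "custom", path="emotion_yolov5.pt")

EMOTIONS = ["happy", "angry", "sad", "neutral"]

# Generalized playlists serve first-time users; track names are placeholders.
generalized = {e: [f"{e}_track_{i}.mp3" for i in range(1, 4)] for e in EMOTIONS}
user_favorites = {}  # emotion -> tracks the user has marked as favorites

def detect_emotion(frame_bgr):
    """Run FER on one frame and return the most confident emotion label."""
    results = model(frame_bgr[..., ::-1])  # OpenCV is BGR; the model expects RGB
    det = results.xyxy[0]                  # rows: [x1, y1, x2, y2, conf, cls]
    if det.shape[0] == 0:
        return "neutral"                   # no face detected: neutral fallback
    best = det[det[:, 4].argmax()]         # highest-confidence detection
    return model.names[int(best[5])]       # class names come from training

def recommend(emotion):
    """Prefer the user's favorites for this emotion, else generalized music."""
    tracks = user_favorites.get(emotion) or generalized[emotion]
    return random.choice(tracks)

frame = cv2.imread("face.jpg")             # e.g., one captured webcam frame
emotion = detect_emotion(frame)
print(emotion, "->", recommend(emotion))

Keying favorites by emotion mirrors the player described above: once a user has liked tracks under a given emotion, those are preferred; otherwise the system falls back to the generalized playlists.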
Keywords
Face Emotion Recognition, Music Recommendation System, YOLOv5, Machine Learning Technique, Emotion Classification, Songs.