Human activity recognition (HAR) has been an active research area in computer vision for the past several years, and research continues because a perfect recognition system is not yet available. HAR systems cover applications such as e-health, patient monitoring, assisted daily living, video surveillance, security and behaviour analysis, and sports analysis. Many researchers have proposed techniques that use visual perception to detect human activities. To build an efficient vision-based HAR system, researchers need to address problems such as illumination variation, inter-class similarity between scenes, the surroundings and recording setting, and temporal variation. A significant drawback of many deep learning models, however, is their inability to achieve satisfactory results in real-world scenarios because of these challenges. To address this, we developed a hybrid HAR-CNN classifier aimed at enhancing the learning outcomes of deep CNNs by combining two models: an Improved CNN and VGG-19. Using the KTH dataset, we collected 6,000 images for training, validation, and testing of the proposed technique. Our findings indicate that the hybrid HAR-CNN model, which combines the Improved CNN with VGG-19 Net, outperforms the individual deep learning models, namely the Improved CNN and VGG-19 Net.
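As a rough illustration of the kind of two-branch fusion described above, the following Keras sketch combines a compact custom CNN (a stand-in for the Improved CNN branch) with a frozen ImageNet-pretrained VGG-19 feature extractor and classifies inputs into the six KTH action classes. The layer sizes, the fusion by concatenation, and the training settings are illustrative assumptions, not the authors' exact architecture.

```python
# Hypothetical sketch of a hybrid HAR-CNN-style classifier: a small custom CNN
# branch fused with frozen VGG-19 features, followed by a softmax over the six
# KTH action classes. All hyperparameters here are assumptions for illustration.
from tensorflow.keras import layers, Model
from tensorflow.keras.applications import VGG19

NUM_CLASSES = 6            # KTH: boxing, clapping, waving, jogging, running, walking
INPUT_SHAPE = (224, 224, 3)

inputs = layers.Input(shape=INPUT_SHAPE)

# Branch 1: compact custom CNN (stand-in for the "Improved CNN" branch).
x = layers.Conv2D(32, 3, activation="relu", padding="same")(inputs)
x = layers.MaxPooling2D()(x)
x = layers.Conv2D(64, 3, activation="relu", padding="same")(x)
x = layers.MaxPooling2D()(x)
x = layers.GlobalAveragePooling2D()(x)

# Branch 2: VGG-19 pretrained on ImageNet, used as a frozen feature extractor.
vgg = VGG19(weights="imagenet", include_top=False, input_shape=INPUT_SHAPE)
vgg.trainable = False
y = layers.GlobalAveragePooling2D()(vgg(inputs))

# Fuse the two feature vectors and classify.
z = layers.Concatenate()([x, y])
z = layers.Dense(256, activation="relu")(z)
z = layers.Dropout(0.5)(z)
outputs = layers.Dense(NUM_CLASSES, activation="softmax")(z)

model = Model(inputs, outputs)
model.compile(optimizer="adam",
              loss="categorical_crossentropy",
              metrics=["accuracy"])
model.summary()
```

Training such a model would then proceed with, e.g., `model.fit(train_images, train_labels, validation_data=(val_images, val_labels))` on frames extracted from the KTH videos; the data pipeline is omitted here.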
Keywords
Human activity, Improved CNN, deep learning, activity recognition, artificial intelligence.
Acknowledgements
The authors thank the reviewers for taking the time and effort necessary to review the manuscript.
Funding
No funding was received to assist with the preparation of this manuscript.
Ethics declarations
Conflict of interest
The authors have no conflicts of interest to declare that are relevant to the content of this article.
Availability of data and materials
Data sharing is not applicable to this article as no new data were created or analysed in this study.
Author information
Contributions
All authors contributed equally to the paper, and all authors have read and agreed to the published version of the manuscript.
Corresponding author
Shakti Kundu
School of Engineering and Technology, BML Munjal University, Kapriwas, Haryana, India.
Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivs 4.0 International (CC BY-NC-ND 4.0) licence, which permits copying and redistribution of the material provided appropriate credit is given and no changes whatsoever are made to the original work, i.e. no derivatives. To view a copy of this licence, visit https://creativecommons.org/licenses/by-nc-nd/4.0/
Cite this article
Venugopal Rao A, Santosh Kumar Vishwakarma, Shakti Kundu and Varun Tiwari, "Hybrid HAR-CNN Model: A Hybrid Convolutional Neural Network Model for Predicting and Recognizing the Human Activity Recognition," pp. 419-430, April 2024. doi: 10.53759/7669/jmc202404040.