Journal of Machine and Computing


Clickbait Detection for Amharic Language using Deep Learning Techniques




Received On : 12 October 2023

Revised On : 22 March 2024

Accepted On : 08 May 2024

Volume 04, Issue 03



Abstract


Because an increasing number of Ethiopians actively engage with the Internet and social media platforms, the incidence of clickbait has become a significant concern. Clickbait, which often uses enticing titles to tempt users into clicking, has become rampant for various reasons, including advertising and revenue generation. However, the Amharic language, spoken by a large population, lacks sufficient NLP resources for addressing this issue. In this study, the authors developed a machine learning model for detecting and classifying clickbait titles in the Amharic language. To facilitate this, the authors prepared the first Amharic clickbait dataset, which comprises 53,227 social media posts from well-known platforms including Facebook, Twitter, and YouTube. As a baseline, the authors assessed conventional machine learning methods, namely Random Forest (RF), Logistic Regression (LR), and Support Vector Machines (SVM), with TF-IDF and N-gram feature extraction approaches. Subsequently, the authors investigated the efficacy of two word embedding techniques, word2vec and fastText, with Convolutional Neural Network (CNN), Long Short-Term Memory (LSTM), and Gated Recurrent Unit (GRU) deep learning algorithms. On the test data, the CNN model with the fastText word embedding performs best among the models, with 94.27% accuracy and a 94.24% F1 score. The study advances natural language processing for low-resource languages and offers insight into how to counter clickbait content in Amharic.
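The TF-IDF and N-gram feature extraction named for the baseline can be illustrated with a minimal sketch. It is not the authors' implementation: the whitespace tokenizer, the smoothed IDF formula, the function names `word_ngrams` and `tfidf_vectors`, and the toy English titles are all assumptions made for illustration, and a real pipeline for Amharic would need language-specific normalization and tokenization.

```python
import math
from collections import Counter


def word_ngrams(text, n_max=2):
    """Return all word n-grams of length 1..n_max from a whitespace-tokenized title."""
    tokens = text.split()
    grams = []
    for n in range(1, n_max + 1):
        for i in range(len(tokens) - n + 1):
            grams.append(" ".join(tokens[i:i + n]))
    return grams


def tfidf_vectors(titles, n_max=2):
    """Compute one sparse TF-IDF dictionary per title over word n-grams."""
    docs = [Counter(word_ngrams(t, n_max)) for t in titles]
    n_docs = len(docs)

    # Document frequency: the number of titles that contain each n-gram.
    df = Counter()
    for counts in docs:
        df.update(counts.keys())

    vectors = []
    for counts in docs:
        total = sum(counts.values())
        vec = {}
        for gram, count in counts.items():
            tf = count / total
            # Smoothed IDF (an assumed variant): n-grams that occur in every
            # title still receive a small positive weight.
            idf = math.log((1 + n_docs) / (1 + df[gram])) + 1.0
            vec[gram] = tf * idf
        vectors.append(vec)
    return vectors


if __name__ == "__main__":
    # Hypothetical placeholder titles, not taken from the authors' dataset.
    titles = [
        "you will not believe this",
        "parliament approves new budget",
        "you will not believe what happened",
    ]
    for title, vec in zip(titles, tfidf_vectors(titles)):
        top = sorted(vec.items(), key=lambda kv: kv[1], reverse=True)[:3]
        print(title, "->", top)
```

The resulting sparse vectors are the kind of input that classifiers such as LR, SVM, or RF can consume after being mapped to a fixed vocabulary index.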


Keywords


Clickbait Detection, Artificial Neural Networks, Natural Language Processing, Machine Learning Techniques, Deep Learning Techniques, Amharic Language, Social Media.



Acknowledgements


The author(s) received no financial support for the research, authorship, and/or publication of this article.


Funding


No funding was received to assist with the preparation of this manuscript.


Ethics declarations


Conflict of interest

The authors have no conflicts of interest to declare that are relevant to the content of this article.


Availability of data and materials


Data sharing is not applicable to this article as no new data were created or analysed in this study.


Author information


Contributions

All authors contributed equally to the paper, and all authors have read and agreed to the published version of the manuscript.


Corresponding author


Rights and permissions


Open Access. This article is licensed under a Creative Commons Attribution NoDerivs license, which is a more restrictive license. It allows you to redistribute the material commercially or non-commercially, but the user cannot make any changes whatsoever to the original, i.e., no derivatives of the original work. To view a copy of this license, visit https://creativecommons.org/licenses/by-nc-nd/4.0/


Cite this article


Rajesh Sharma R, Akey Sungheetha, Mesfin Abebe Haile, Arefat Hyeredin Kedir, Rajasekaran A, Charles Babu G, “Clickbait Detection for Amharic Language using Deep Learning Techniques”, Journal of Machine and Computing, doi: 10.53759/7669/jmc202404058.


Copyright


© 2024 Rajesh Sharma R, Akey Sungheetha, Mesfin Abebe Haile, Arefat Hyeredin Kedir, Rajasekaran A, Charles Babu G. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.