Digital platforms have witnessed an explosion in online conversation, and these conversations are rarely straightforward. A significant contributor to this complexity is the use of subtle references to other contexts and of encoded text: mnemonics, which appear in everyday communication as abbreviations, numeronyms, symbolic representations, emoji-based codes, leetspeak, and more. Mnemonic forms common in online conversation include phonetic substitutions (e.g., “gr8” for “great”), numerical encodings (e.g., “143” for “I love you”), symbolic representations (emojis and icons), and abbreviations (e.g., “LOL” for “laugh out loud”). This linguistic creativity is not only a tool for memory and efficiency but also a growing challenge for automated moderation and content-understanding systems, because mnemonics often encode non-explicit, sensitive, or policy-relevant meanings that typical keyword-based approaches fail to identify. To address this gap, we introduce a Content Moderation Model, a large language model (LLM) based pipeline that systematically detects, categorizes, and deciphers both general and context-specific mnemonic constructs in user-generated text. The methodology builds on advances in deep learning, leveraging the representational power and semantic flexibility of models such as GPT-4.1, which have proven successful in complex linguistic and content-analysis tasks across domains. The framework uses a corpus of both harmless and sexually coded user-generated texts to identify mnemonic patterns such as phonetic substitutions, emoji usage, and leetspeak. The system accurately flags and classifies mnemonic types, enabling improved moderation, linguistic analysis, and platform policy design. The outcomes, quantified through rigorous empirical validation, demonstrate substantial improvements in identifying and decoding diverse mnemonic forms.
These findings provide actionable insights for platform policy and for the design of more accessible, inclusive communication systems that acknowledge both the benefits and risks of mnemonic language.
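As a contrast to the LLM-based pipeline the abstract describes, the kind of keyword-style decoding that mnemonics tend to evade can be sketched as a small rule-based baseline. The lexicon entries and function names below are purely illustrative (not part of the paper's system), covering the example forms named above: phonetic substitutions, numerical encodings, abbreviations, and leetspeak.

```python
# Illustrative sketch only: a tiny rule-based decoder for the mnemonic
# forms named in the abstract. The paper's actual pipeline is LLM-based
# (GPT-4.1 with zero-/few-shot prompting); this hypothetical lexicon
# baseline shows why fixed keyword lists struggle with creative encodings.

MNEMONIC_LEXICON = {
    "gr8": "great",           # phonetic substitution
    "143": "i love you",      # numerical encoding
    "lol": "laugh out loud",  # abbreviation
}

# A few common leetspeak digit-to-letter substitutions.
LEET_MAP = str.maketrans("013457", "oieast")


def decode_token(token: str) -> str:
    """Decode one token: lexicon lookup first, leetspeak fallback second."""
    low = token.lower()
    if low in MNEMONIC_LEXICON:
        return MNEMONIC_LEXICON[low]
    # Apply the leetspeak map only when letters and digits are mixed,
    # so plain numbers and plain words pass through unchanged.
    if any(c.isdigit() for c in low) and any(c.isalpha() for c in low):
        return low.translate(LEET_MAP)
    return token


def decode_text(text: str) -> str:
    """Decode a whitespace-separated message token by token."""
    return " ".join(decode_token(t) for t in text.split())
```

Any token outside the lexicon and the leetspeak pattern slips through untouched, which is exactly the coverage gap that motivates the semantic, prompt-driven approach studied in the paper.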
Numeronyms, Emojis, Mnemonics, Phonetic Substitutions, Abbreviations, Leetspeak, Large Language Models, Content Moderation Model, Prompting, Zero-Shot Learning, Few-Shot Learning.
The authors confirm contribution to the paper as follows:
Conceptualization: Sumithra S and Sujatha P; Writing – Original Draft Preparation: Sumithra S and Sujatha P; Supervision: Sumithra S; Validation: Sujatha P; Writing – Reviewing and Editing: Sumithra S and Sujatha P. All authors reviewed the results and approved the final version of the manuscript.
The authors thank Dr. Sujatha P for her support in the completion of this research.
No funding was received to assist with the preparation of this manuscript.
The authors have no conflicts of interest to declare that are relevant to the content of this article.
Data sharing is not applicable to this article as no new data were created or analysed in this study.
All authors have equal contribution in the paper and all authors have read and agreed to the published version of the manuscript.
Open Access: This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivs 4.0 license, which permits redistribution of the unmodified material for non-commercial purposes only; no changes may be made to the original work, i.e., no derivatives. To view a copy of this license, visit https://creativecommons.org/licenses/by-nc-nd/4.0/
Sumithra S and Sujatha P, “Unmasking Mnemonics – Leveraging Content Moderation Model for Decoding Encoded Communication in Digital Conversations”, Journal of Machine and Computing, vol.5, no.4, pp. 2292-2304, October 2025, doi: 10.53759/7669/jmc202505178.
© 2025 Sumithra S and Sujatha P. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.