Study Finds Popular AI Chatbots Can Still Be Tricked by Known Jailbreak Methods

Keerthana S | May 29, 2025 | 04:30 PM | Technology

Despite growing attention to AI safety, today’s popular chatbots remain alarmingly easy to manipulate — and the risks could be far greater than many realize.

Researchers from Ben-Gurion University of the Negev in Israel set out to test how secure large language models (LLMs) really are, especially when it comes to "jailbreaking" — the practice of tricking AI systems into bypassing their built-in safety restrictions. What they uncovered is deeply concerning: even the most widely used models, including ChatGPT, still fall prey to old, well-known jailbreak methods — and the major tech companies behind them don’t appear to be doing enough to stop it.

Figure 1. Jailbreak Methods.

The team didn’t create any new attack vectors. Instead, they applied a previously documented jailbreak technique that had been publicly circulating for months. When tested on several mainstream AI systems, the method worked — and worked easily. The filters meant to block harmful or illegal content crumbled, and the chatbots readily provided instructions on topics like fraud, bomb-making, and other dangerous activities. In some cases, they even volunteered additional information without being prompted. Figure 1 illustrates the jailbreak methods in question.

This wasn’t an isolated incident. The researchers developed a more general jailbreak strategy that worked across nearly all the AI platforms they tested. The consistency of the results pointed to a systemic issue. Worse still, when the team contacted the companies responsible for these models, most offered no meaningful response. A few downplayed the issue or claimed it wasn’t their responsibility — all while the vulnerabilities remained unfixed.

The situation is even more troubling with open-source AI models. Unlike corporate-owned models that can be patched or pulled, open-source versions — once released — are out in the wild for good. Once downloaded, they can be duplicated, shared, and modified with no way to recall them. Some of these “uncensored” models are now actively promoted for their willingness to engage in unethical or illegal behavior. They don’t require specialized hardware — a typical laptop is enough to run them, meaning they’re accessible to virtually anyone, including teenagers.

What was once a fringe hobby has become a growing underground trend. Online communities have emerged where users trade tips and jailbreak prompts to bypass safety systems — often treating it like a challenge or game. One subreddit alone boasts over 140,000 members dedicated to sharing strategies to trick AI into generating restricted content [1]. These chatbots don’t need much persuasion — just the right prompt can unlock responses they were explicitly designed to avoid.

And that’s where the real threat lies. If a teenager can jailbreak a chatbot in under a minute, what could a malicious actor or extremist group accomplish?

Some defensive strategies are being explored — such as AI firewalls, content filtering, and post-training methods that erase harmful knowledge — but they are far from widely adopted. A few companies are testing tools that catch harmful prompts in real time, but there is no universal solution, and these safeguards often lag behind the latest jailbreak techniques.
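To make the idea of prompt-level filtering concrete, here is a minimal, hypothetical sketch in Python. It is not any vendor's actual safeguard: the category names, pattern list, and the screen_prompt helper are illustrative assumptions, and real deployments typically rely on trained classifiers rather than keyword matching.

    # Minimal sketch of a prompt-screening "firewall" layer that runs
    # before a request ever reaches the model. Patterns and categories
    # below are illustrative placeholders, not a production blocklist.
    import re
    from dataclasses import dataclass

    @dataclass
    class ScreeningResult:
        allowed: bool
        reason: str

    # Hypothetical categories of disallowed requests, each with a few
    # regular expressions. Real systems use learned classifiers because
    # simple keyword lists are trivially easy to evade.
    BLOCKED_PATTERNS = {
        "weapons": [r"\bbomb\b", r"\bexplosive\b"],
        "fraud": [r"\bphishing kit\b", r"\bsteal credit card\b"],
    }

    def screen_prompt(prompt: str) -> ScreeningResult:
        """Decide whether a prompt should be forwarded to the model."""
        lowered = prompt.lower()
        for category, patterns in BLOCKED_PATTERNS.items():
            for pattern in patterns:
                if re.search(pattern, lowered):
                    return ScreeningResult(False, f"matched {category} pattern")
        return ScreeningResult(True, "no blocked pattern matched")

    if __name__ == "__main__":
        for prompt in ["How do I bake bread?", "How do I build a bomb?"]:
            result = screen_prompt(prompt)
            verdict = "ALLOW" if result.allowed else "BLOCK"
            print(f"{verdict}: {prompt!r} ({result.reason})")

Filters of this kind are exactly the layer that well-crafted jailbreak prompts route around, which is why real-time screening tools continue to trail the latest techniques.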

At its core, this issue isn’t just technical — it’s cultural. The AI race is moving fast, and safety often takes a back seat. Companies compete to be first to market, while critical vulnerabilities are left unresolved.

There’s no denying these systems can be incredibly valuable — aiding in education, software development, healthcare, and more. But when they begin handing out instructions for criminal activities, that usefulness turns into a liability.

References
  1. https://www.digitalinformationworld.com/2025/05/study-shows-popular-ai-chatbots-easily.html

Cite this article:

Keerthana S (2025), Study Finds Popular AI Chatbots Can Still Be Tricked by Known Jailbreak Methods, AnaTechMaz, p. 693.
