Hackers Are Using the Same Conversational Tricks on AI That Con Artists Use on People

Cybersecurity researchers have identified a growing class of attacks that exploit AI chatbots through sophisticated conversational manipulation rather than traditional technical hacking methods.

The Verge reports that attacks against AI chatbots have changed dramatically since the technology first became widely available. Early exploits were remarkably simple and required no technical expertise or coding knowledge. Users could often bypass safety measures simply by asking the AI system to ignore its instructions or to pretend its rules did not apply. These attacks, known as jailbreaks, extracted prohibited information, such as instructions for creating explosives, malware, and other dangerous material, from systems that cost billions of dollars to develop.

One of the first widely known jailbreaks became an internet phenomenon. Users would reply to social media bots powered by large language models with commands to ignore their previous instructions, causing the bots to behave erratically. Although the bots had been deployed for advertising and engagement, they would instead write poetry, draw pictures out of punctuation marks, or post unrelated content about historical events.

Breitbart News previously reported on early jailbreaks, including the “DAN” technique used to convince ChatGPT to ignore its woke guardrails:

The “DAN” persona, which was created by a 22-year-old college student, is one of the most well-known instances of ChatGPT’s jailbreak. The student encouraged the chatbot to adopt the persona of a carefree alter ego AI called “Do Anything Now,” circumventing the woke rules it normally follows. Many people have used the DAN prompt to uncover bias in ChatGPT, or to create humorous or interesting responses.

Walker, the college student who created the “DAN” persona, claimed that almost as soon as he learned about ChatGPT from a friend, he started pushing its boundaries. He took his cues from a Reddit forum where ChatGPT users were demonstrating to one another how to make the bot act like a specific type of computer terminal or discuss topics such as the Israeli-Palestinian conflict — but in the sarcastic voice of a teenage girl.

These early attacks were undeniably absurd, but they exposed a concerning underlying mechanism: chatbots could be manipulated with the same psychological tactics people use to push one another past their boundaries.

The ongoing battle to secure chatbots has become an arms race of an unusual kind. Today’s hackers are not necessarily programmers but experts in language, psychology, and interrogation techniques. This emerging class of AI security professionals relies less on traditional technical skills than on social intuition and conversational ability. Rather than inspecting code or exploiting software vulnerabilities, they steer conversations toward their objectives.

Contemporary attacks look more like natural conversations than commands. Jailbreakers rarely ask outright for the rules to be broken. Instead, they use cajoling, flattery, and deception to lower a chatbot’s defenses, making prohibited outputs seem acceptable in the context of the conversation. Researchers at the AI red-teaming firm Mindgard recently reported that they had tricked Claude into producing forbidden material, including explosive-making instructions and malicious code. The hack is the latest example of a growing category of exploits that use conversation as a weapon to guide chatbots past their safety boundaries.

Mindgard’s CEO explained that the company profiles AI models much as interrogators profile suspects, giving testers guidance on how to tailor their attacks. One model might be more susceptible to flattery, while another might yield under sustained pressure.

Each chatbot has its own characteristics. Claude, Grok, Gemini, and ChatGPT differ in their uses, tones, and refusal patterns. Although they lack human personalities, they are designed to mimic them, and that mimicry can be mapped and exploited. The same skills used to break chatbots could soon be turned against AI agents that operate in real-world environments, managing calendars, booking appointments, ordering food, and handling customer service interactions.

AI is creating unique landmines and unique opportunities for Americans of all walks of life. Breitbart News social media director Wynton Hall has written his instant bestseller Code Red: The Left, the Right, China, and the Race to Control AI to serve as the definitive guide on how the MAGA movement can create positions on AI that benefit humanity without handing control of our nation to the leftists of Silicon Valley or allowing the Chinese to take over the world.