Poetry can convince AI chatbots to commit crimes and write hate speech: Study

AI Chatbot

Dec 5, 2025,
Updated Dec 5, 2025 11:37 AM IST

A surprising new study by European cybersecurity researchers has uncovered a major flaw in the safety defences of leading AI chatbots: they can be 'jailbroken' simply by asking dangerous questions in the form of a poem. This creative technique allows users to bypass safety filters and coerce models from companies like Google, OpenAI, and Meta into providing instructions for harmful activities.

The research, conducted by Icaro Lab, demonstrated that posing a request as a piece of verse (a method dubbed “adversarial poetry”) is remarkably effective at bypassing the strict guardrails meant to stop the generation of illegal or hazardous content. When researchers rephrased malicious requests as short, metaphorical poems, the AI models frequently complied, with the success rates hitting up to 90% for some advanced systems.

The underlying issue is how AI safety features currently work. Most guardrails are designed to spot specific keywords and obvious patterns associated with danger, such as direct requests for bomb-making or malware code. Poetic phrasing, however, is linguistically unpredictable, using unusual syntax, metaphors, and abstract language. This creative structure confuses the models, causing them to interpret the input as an artistic request rather than a threat. Essentially, the AI stops treating the prompt as a security risk and focuses on its creative task.

Researchers tested 25 different chatbots and found that every single one failed at least once. By using this poetic method, they managed to get forbidden information ranging from how to conduct cyber-attacks and crack passwords, all the way up to instructions for creating chemical and nuclear weapons. The researchers confirmed that, for safety reasons, they have not published the exact poems used in their testing, as the method is easy to replicate.

This finding exposes a fundamental vulnerability in current AI safety technology. Experts are warning that if subtle, creative language is all it takes to break the ethical barriers, it suggests a profound failure in how we are teaching AI systems to distinguish between genuine creativity and dangerous manipulation. The focus now shifts back to tech companies, which will need to quickly redesign their safety protocols to better cope with the nuance and complexity of human language. The revelation makes it clear that the future of AI safety depends on guardrails that can understand intent and not just keywords.

For Unparalleled coverage of India's Businesses and Economy – Subscribe to Business Today Magazine

A surprising new study by European cybersecurity researchers has uncovered a major flaw in the safety defences of leading AI chatbots: they can be 'jailbroken' simply by asking dangerous questions in the form of a poem. This creative technique allows users to bypass safety filters and coerce models from companies like Google, OpenAI, and Meta into providing instructions for harmful activities.

The research, conducted by Icaro Lab, demonstrated that posing a request as a piece of verse (a method dubbed “adversarial poetry”) is remarkably effective at bypassing the strict guardrails meant to stop the generation of illegal or hazardous content. When researchers rephrased malicious requests as short, metaphorical poems, the AI models frequently complied, with the success rates hitting up to 90% for some advanced systems.

The underlying issue is how AI safety features currently work. Most guardrails are designed to spot specific keywords and obvious patterns associated with danger, such as direct requests for bomb-making or malware code. Poetic phrasing, however, is linguistically unpredictable, using unusual syntax, metaphors, and abstract language. This creative structure confuses the models, causing them to interpret the input as an artistic request rather than a threat. Essentially, the AI stops treating the prompt as a security risk and focuses on its creative task.

Researchers tested 25 different chatbots and found that every single one failed at least once. By using this poetic method, they managed to get forbidden information ranging from how to conduct cyber-attacks and crack passwords, all the way up to instructions for creating chemical and nuclear weapons. The researchers confirmed that, for safety reasons, they have not published the exact poems used in their testing, as the method is easy to replicate.

This finding exposes a fundamental vulnerability in current AI safety technology. Experts are warning that if subtle, creative language is all it takes to break the ethical barriers, it suggests a profound failure in how we are teaching AI systems to distinguish between genuine creativity and dangerous manipulation. The focus now shifts back to tech companies, which will need to quickly redesign their safety protocols to better cope with the nuance and complexity of human language. The revelation makes it clear that the future of AI safety depends on guardrails that can understand intent and not just keywords.

For Unparalleled coverage of India's Businesses and Economy – Subscribe to Business Today Magazine

Poetry can convince AI chatbots to commit crimes and write hate speech: Study

Researchers discovered that posing dangerous requests to AI chatbots in the form of a poem bypasses their safety filters with high success rates.

RECOMMENDED

Poetry can convince AI chatbots to commit crimes and write hate speech: Study

Researchers discovered that posing dangerous requests to AI chatbots in the form of a poem bypasses their safety filters with high success rates.

RECOMMENDED

{{title}}