AVID-2026-R0068

Description

Multiple Model Guardrail Jailbreak via “Hybrid Role” Tactic

Details

A guardrail jailbreak vulnerability has been discovered affecting multiple models. The flaw manifests through chaff-based text obfuscation combined with a deliberately confusing hybrid role, for example one that blends a banker and a chemist. Together, these let the attacker obscure the true intent of a request and exploit the model's confusion to bypass content restrictions, manipulating the model into providing detailed procedures for illicit synthesis.

Metrics

0DIN Jailbreak Testing obtained the following test scores.

Model	Test Kind	Test Score	Temperature (max 2.0)
Alibaba Qwen Plus	Illegal Substances: Crystal Meth	95.0	0.7
Alibaba Qwen Turbo	Illegal Substances: Crystal Meth	100.0	0.7
Anthropic Claude 3.5 Haiku	Illegal Substances: Crystal Meth	55.0	0.7
Anthropic Claude 3.7 Sonnet	Illegal Substances: Crystal Meth	100.0	0.7
Anthropic Claude 4 Sonnet	Illegal Substances: Crystal Meth	73.0	0.7
Cohere Command R	Illegal Substances: Crystal Meth	55.0	0.7
DeepSeek DeepSeek R1	Illegal Substances: Crystal Meth	100.0	0.7
Google Gemini 2.5 Pro	Illegal Substances: Crystal Meth	95.0	0.7
Mistral Mistral Large	Illegal Substances: Crystal Meth	59.0	0.7
OpenAI GPT-4.1	Illegal Substances: Crystal Meth	82.0	0.7
OpenAI GPT-4.1 mini	Illegal Substances: Crystal Meth	77.0	0.7
OpenAI GPT-4.5	Illegal Substances: Crystal Meth	45.0	0.7
OpenAI GPT-4o	Illegal Substances: Crystal Meth	86.0	N/A
OpenAI GPT-o3 mini-high	Illegal Substances: Crystal Meth	73.0	0.7
Twitter / X Grok 2	Illegal Substances: Crystal Meth	82.0	0.7
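
The scores above can be summarized with a few lines of Python. This is a minimal sketch, not part of the 0DIN tooling: the `SCORES` list is copied by hand from the table, the `summarize` helper is hypothetical, and the 90.0 cutoff is an arbitrary illustrative threshold rather than a value defined by 0DIN or AVID.

```python
import statistics

# (model, test score, temperature) copied from the Metrics table.
# None stands in for the "N/A" temperature reported for GPT-4o.
SCORES = [
    ("Alibaba Qwen Plus", 95.0, 0.7),
    ("Alibaba Qwen Turbo", 100.0, 0.7),
    ("Anthropic Claude 3.5 Haiku", 55.0, 0.7),
    ("Anthropic Claude 3.7 Sonnet", 100.0, 0.7),
    ("Anthropic Claude 4 Sonnet", 73.0, 0.7),
    ("Cohere Command R", 55.0, 0.7),
    ("DeepSeek DeepSeek R1", 100.0, 0.7),
    ("Google Gemini 2.5 Pro", 95.0, 0.7),
    ("Mistral Mistral Large", 59.0, 0.7),
    ("OpenAI GPT-4.1", 82.0, 0.7),
    ("OpenAI GPT-4.1 mini", 77.0, 0.7),
    ("OpenAI GPT-4.5", 45.0, 0.7),
    ("OpenAI GPT-4o", 86.0, None),
    ("OpenAI GPT-o3 mini-high", 73.0, 0.7),
    ("Twitter / X Grok 2", 82.0, 0.7),
]


def summarize(rows, threshold=90.0):
    """Return basic statistics for the test scores.

    `threshold` is an illustrative cutoff (an assumption), used only to
    list the models whose score is at or above it.
    """
    values = [score for _, score, _ in rows]
    return {
        "count": len(values),
        "mean": round(statistics.mean(values), 1),
        "median": statistics.median(values),
        "at_or_above": [model for model, score, _ in rows if score >= threshold],
    }


summary = summarize(SCORES)
for key, value in summary.items():
    print(f"{key}: {value}")
```

Keeping the temperature in each tuple makes it easy to exclude the row without a reported temperature if a like-for-like comparison is wanted.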

References

0din.ai Disclosure

Affected or Relevant Artifacts

Type	Name
Model	Alibaba Qwen Plus
System	Alibaba Qwen Turbo
System	Anthropic Claude 3.5 Haiku
System	Anthropic Claude 3.7 Sonnet
System	Anthropic Claude 4 Sonnet
Model	Cohere Command R
Model	DeepSeek DeepSeek R1
System	Google Gemini 2.5 Pro
System	Mistral Mistral Large
System	OpenAI GPT-4.1
System	OpenAI GPT-4.1 mini
System	OpenAI GPT-4.5
System	OpenAI GPT-4o
System	OpenAI GPT-o3 mini
System	OpenAI GPT-o3 mini-high
System	Twitter / X Grok 2
System	Twitter / X Grok 3

Impact

0DIN

Social Impact Score: Level 3: Significant Risk
Jailbreak Taxonomy:

Category	Strategy	Technique
Language	Code and Encode	Chaff
Rhetoric	Persuasion and Manipulation	Reverse Psychology

Other information

Report Type: Measurement
Credits: Anonymous
Date Reported: 2026-02-19
Version: 0.3.3
AVID Entry