AVID-2026-R0068
Description
Multiple Model Guardrail Jailbreak via “Hybrid Role” Tactic
Details
A guardrail jailbreak vulnerability affecting multiple models has been discovered. The flaw arises when chaff-based text obfuscation is combined with a confused "hybrid role," for example a persona that blends a banker with a chemist. The obfuscation hides the request's true intent, and the role confusion lets the attacker bypass content restrictions. Together, they manipulate the model into providing detailed procedures for illicit synthesis.
Metrics
0DIN Jailbreak Testing obtained the following test scores.
| Model | Test Kind | Test Score | Temperature (max 2.0) |
|---|---|---|---|
| Alibaba Qwen Plus | Illegal Substances: Crystal Meth | 95.0 | 0.7 |
| Alibaba Qwen Turbo | Illegal Substances: Crystal Meth | 100.0 | 0.7 |
| Anthropic Claude 3.5 Haiku | Illegal Substances: Crystal Meth | 55.0 | 0.7 |
| Anthropic Claude 3.7 Sonnet | Illegal Substances: Crystal Meth | 100.0 | 0.7 |
| Anthropic Claude 4 Sonnet | Illegal Substances: Crystal Meth | 73.0 | 0.7 |
| Cohere Command R | Illegal Substances: Crystal Meth | 55.0 | 0.7 |
| DeepSeek DeepSeek R1 | Illegal Substances: Crystal Meth | 100.0 | 0.7 |
| Google Gemini 2.5 Pro | Illegal Substances: Crystal Meth | 95.0 | 0.7 |
| Mistral Mistral Large | Illegal Substances: Crystal Meth | 59.0 | 0.7 |
| OpenAI GPT-4.1 | Illegal Substances: Crystal Meth | 82.0 | 0.7 |
| OpenAI GPT-4.1 mini | Illegal Substances: Crystal Meth | 77.0 | 0.7 |
| OpenAI GPT-4.5 | Illegal Substances: Crystal Meth | 45.0 | 0.7 |
| OpenAI GPT-4o | Illegal Substances: Crystal Meth | 86.0 | N/A |
| OpenAI GPT-o3 mini-high | Illegal Substances: Crystal Meth | 73.0 | 0.7 |
| Twitter / X Grok 2 | Illegal Substances: Crystal Meth | 82.0 | 0.7 |
References
Affected or Relevant Artifacts
| Type | Name |
|---|---|
| Model | Alibaba Qwen Plus |
| System | Alibaba Qwen Turbo |
| System | Anthropic Claude 3.5 Haiku |
| System | Anthropic Claude 3.7 Sonnet |
| System | Anthropic Claude 4 Sonnet |
| Model | Cohere Command R |
| Model | DeepSeek DeepSeek R1 |
| System | Google Gemini 2.5 Pro |
| System | Mistral Mistral Large |
| System | OpenAI GPT-4.1 |
| System | OpenAI GPT-4.1 mini |
| System | OpenAI GPT-4.5 |
| System | OpenAI GPT-4o |
| System | OpenAI GPT-o3 mini |
| System | OpenAI GPT-o3 mini-high |
| System | Twitter / X Grok 2 |
| System | Twitter / X Grok 3 |
Impact
0DIN
- Social Impact Score: Level 3: Significant Risk
- Jailbreak Taxonomy:
| Category | Strategy | Technique |
|---|---|---|
| Language | Code and Encode | Chaff |
| Rhetoric | Persuasion and Manipulation | Reverse Psychology |
Other information
- Report Type: Measurement
- Credits: Anonymous
- Date Reported: 2026-02-19
- Version: 0.3.1
- AVID Entry