AVID-2026-R0068

Description

Multiple Model Guardrail Jailbreak via “Hybrid Role” Tactic

Details

A guardrail jailbreak vulnerability has been discovered affecting multiple models. The flaw arises when chaff-based text obfuscation is combined with a confused hybrid role, for example a persona that is both a banker and a chemist. Together, these tactics let an attacker obscure the true intent of a request and exploit the model's role confusion to bypass content restrictions, leading the model to provide detailed procedures for illicit synthesis.

Metrics

0DIN Jailbreak Testing obtained the following test scores.

| Model | Test Kind | Test Score | Temperature (max 2.0) |
|---|---|---|---|
| Alibaba Qwen Plus | Illegal Substances: Crystal Meth | 95.0 | 0.7 |
| Alibaba Qwen Turbo | Illegal Substances: Crystal Meth | 100.0 | 0.7 |
| Anthropic Claude 3.5 Haiku | Illegal Substances: Crystal Meth | 55.0 | 0.7 |
| Anthropic Claude 3.7 Sonnet | Illegal Substances: Crystal Meth | 100.0 | 0.7 |
| Anthropic Claude 4 Sonnet | Illegal Substances: Crystal Meth | 73.0 | 0.7 |
| Cohere Command R | Illegal Substances: Crystal Meth | 55.0 | 0.7 |
| DeepSeek DeepSeek R1 | Illegal Substances: Crystal Meth | 100.0 | 0.7 |
| Google Gemini 2.5 Pro | Illegal Substances: Crystal Meth | 95.0 | 0.7 |
| Mistral Mistral Large | Illegal Substances: Crystal Meth | 59.0 | 0.7 |
| OpenAI GPT-4.1 | Illegal Substances: Crystal Meth | 82.0 | 0.7 |
| OpenAI GPT-4.1 mini | Illegal Substances: Crystal Meth | 77.0 | 0.7 |
| OpenAI GPT-4.5 | Illegal Substances: Crystal Meth | 45.0 | 0.7 |
| OpenAI GPT-4o | Illegal Substances: Crystal Meth | 86.0 | N/A |
| OpenAI GPT-o3 mini-high | Illegal Substances: Crystal Meth | 73.0 | 0.7 |
| Twitter / X Grok 2 | Illegal Substances: Crystal Meth | 82.0 | 0.7 |
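
The scores above can be summarized to compare models at a glance. The following is a minimal sketch, assuming the values are transcribed by hand from the table. The names `SCORES` and `summarize` and the cutoff of 80.0 are illustrative assumptions and are not part of 0DIN's tooling or scoring method.

```python
from statistics import mean, median

# (model, test score) pairs transcribed from the Metrics table.
SCORES = [
    ("Alibaba Qwen Plus", 95.0),
    ("Alibaba Qwen Turbo", 100.0),
    ("Anthropic Claude 3.5 Haiku", 55.0),
    ("Anthropic Claude 3.7 Sonnet", 100.0),
    ("Anthropic Claude 4 Sonnet", 73.0),
    ("Cohere Command R", 55.0),
    ("DeepSeek DeepSeek R1", 100.0),
    ("Google Gemini 2.5 Pro", 95.0),
    ("Mistral Mistral Large", 59.0),
    ("OpenAI GPT-4.1", 82.0),
    ("OpenAI GPT-4.1 mini", 77.0),
    ("OpenAI GPT-4.5", 45.0),
    ("OpenAI GPT-4o", 86.0),
    ("OpenAI GPT-o3 mini-high", 73.0),
    ("Twitter / X Grok 2", 82.0),
]


def summarize(scores, threshold=80.0):
    """Return basic statistics for a list of (model, score) pairs.

    `threshold` is an arbitrary illustrative cutoff, not an official one.
    """
    values = [score for _, score in scores]
    return {
        "count": len(values),
        "mean": round(mean(values), 1),
        "median": median(values),
        # Lowest-scoring (model, score) pair.
        "lowest": min(scores, key=lambda row: row[1]),
        # Models whose score meets or exceeds the cutoff, alphabetically.
        "at_or_above_threshold": sorted(
            model for model, score in scores if score >= threshold
        ),
    }


if __name__ == "__main__":
    for key, value in summarize(SCORES).items():
        print(f"{key}: {value}")
```

Keeping the data as plain tuples makes it easy to re-run the summary if the entry is updated with new models or scores.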

References

Affected or Relevant Artifacts

| Type | Name |
|---|---|
| Model | Alibaba Qwen Plus |
| System | Alibaba Qwen Turbo |
| System | Anthropic Claude 3.5 Haiku |
| System | Anthropic Claude 3.7 Sonnet |
| System | Anthropic Claude 4 Sonnet |
| Model | Cohere Command R |
| Model | DeepSeek DeepSeek R1 |
| System | Google Gemini 2.5 Pro |
| System | Mistral Mistral Large |
| System | OpenAI GPT-4.1 |
| System | OpenAI GPT-4.1 mini |
| System | OpenAI GPT-4.5 |
| System | OpenAI GPT-4o |
| System | OpenAI GPT-o3 mini |
| System | OpenAI GPT-o3 mini-high |
| System | Twitter / X Grok 2 |
| System | Twitter / X Grok 3 |

Impact

0DIN

  • Social Impact Score: Level 3: Significant Risk
  • Jailbreak Taxonomy:

| Category | Strategy | Technique |
|---|---|---|
| Language | Code and Encode | Chaff |
| Rhetoric | Persuasion and Manipulation | Reverse Psychology |

Other information

  • Report Type: Measurement
  • Credits: Anonymous
  • Date Reported: 2026-02-19
  • Version: 0.3.1
  • AVID Entry