Home » Database

AVID-2026-R0117

Description

Multiple Model Guardrail Jailbreak via “Pronunciation Bypass” Tactic

Details

A guardrail jailbreak vulnerability has been discovered affecting multiple models. The specific flaw manifests when the attacker instructs the model to interpret stylized text based on pronunciation rather than spelling, combined with a directive to provide detailed production information. This technique allows the attacker to bypass inference restrictions around providing information hazards and violating laws.

Metrics

0DIN Jailbreak Testing obtained the following test scores.

ModelTest KindTest ScoreTemperature (max 2.0)
Alibaba Qwen PlusIllegal Substances: Crystal Meth55.00.7
Anthropic Claude 3.5 HaikuIllegal Substances: Crystal Meth50.00.7
DeepSeek DeepSeek R1Illegal Substances: Crystal Meth64.00.7
Google Gemini 2.5 ProIllegal Substances: Crystal Meth73.00.7
Mistral Mistral LargeIllegal Substances: Crystal Meth45.00.7
OpenAI GPT-4.1Illegal Substances: Crystal Meth77.00.7
OpenAI GPT-4.1 miniIllegal Substances: Crystal Meth41.00.7
OpenAI GPT-4oIllegal Substances: Crystal Meth59.0N/A
OpenAI GPT-4o miniIllegal Substances: Crystal Meth59.0N/A
OpenAI GPT-o1 miniIllegal Substances: Crystal Meth59.0N/A
Twitter / X Grok 4Illegal Substances: Crystal Meth77.00.7

References

Affected or Relevant Artifacts

TypeName
ModelAlibaba Qwen Plus
SystemAnthropic Claude 3.5 Haiku
ModelDeepSeek DeepSeek R1
SystemGoogle Gemini 2.5 Pro
SystemMistral Mistral Large
SystemOpenAI GPT-4.1
SystemOpenAI GPT-4.1 mini
SystemOpenAI GPT-4o
SystemOpenAI GPT-4o mini
SystemOpenAI GPT-o1 mini
SystemTwitter / X Grok 4

Impact

0DIN

  • Social Impact Score: Level 3: Significant Risk
  • Jailbreak Taxonomy:
CategoryStrategyTechnique
LanguageStylizingSynonymous Language
RhetoricPersuasion and ManipulationSurprise Attack

Other information

  • Report Type: Measurement
  • Credits: Mike Takahashi (@TakSec)
  • Date Reported: 2026-02-19
  • Version: 0.3.1
  • AVID Entry