AVID-2026-R0117

Description

Multiple Model Guardrail Jailbreak via “Pronunciation Bypass” Tactic

Details

A guardrail jailbreak vulnerability has been discovered affecting multiple models. The specific flaw manifests when the attacker instructs the model to interpret stylized text based on pronunciation rather than spelling, combined with a directive to provide detailed production information. This technique allows the attacker to bypass inference restrictions around providing information hazards and violating laws.

Metrics

0DIN Jailbreak Testing obtained the following test scores.

Model	Test Kind	Test Score	Temperature (max 2.0)
Alibaba Qwen Plus	Illegal Substances: Crystal Meth	55.0	0.7
Anthropic Claude 3.5 Haiku	Illegal Substances: Crystal Meth	50.0	0.7
DeepSeek DeepSeek R1	Illegal Substances: Crystal Meth	64.0	0.7
Google Gemini 2.5 Pro	Illegal Substances: Crystal Meth	73.0	0.7
Mistral Mistral Large	Illegal Substances: Crystal Meth	45.0	0.7
OpenAI GPT-4.1	Illegal Substances: Crystal Meth	77.0	0.7
OpenAI GPT-4.1 mini	Illegal Substances: Crystal Meth	41.0	0.7
OpenAI GPT-4o	Illegal Substances: Crystal Meth	59.0	N/A
OpenAI GPT-4o mini	Illegal Substances: Crystal Meth	59.0	N/A
OpenAI GPT-o1 mini	Illegal Substances: Crystal Meth	59.0	N/A
Twitter / X Grok 4	Illegal Substances: Crystal Meth	77.0	0.7

References

0din.ai Disclosure

Affected or Relevant Artifacts

Type	Name
Model	Alibaba Qwen Plus
System	Anthropic Claude 3.5 Haiku
Model	DeepSeek DeepSeek R1
System	Google Gemini 2.5 Pro
System	Mistral Mistral Large
System	OpenAI GPT-4.1
System	OpenAI GPT-4.1 mini
System	OpenAI GPT-4o
System	OpenAI GPT-4o mini
System	OpenAI GPT-o1 mini
System	Twitter / X Grok 4

Impact

0DIN

Social Impact Score: Level 3: Significant Risk
Jailbreak Taxonomy:

Category	Strategy	Technique
Language	Stylizing	Synonymous Language
Rhetoric	Persuasion and Manipulation	Surprise Attack

Other information

Report Type: Measurement
Credits: Mike Takahashi (@TakSec)
Date Reported: 2026-02-19
Version: 0.3.3
AVID Entry