AVID-2026-R0108

Description

Multiple Model Guardrail Jailbreak via “Spelling Mistakes” Tactic

Details

A guardrail jailbreak vulnerability has been discovered affecting multiple models. The flaw manifests when an attacker systematically misspells words throughout a prompt while requesting a simulated dialogue between two entities, allowing the attacker to bypass inference-time restrictions against providing information hazards or assisting illegal activity.
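The obfuscation step of this tactic can be sketched in code. The function below is a hypothetical illustration (the name, parameters, and swap heuristic are not part of the original report): it perturbs each sufficiently long word by swapping two adjacent interior letters, the kind of systematic misspelling the tactic relies on to evade keyword- or pattern-based guardrails while remaining readable to the model.

```python
import random


def misspell(text: str, rate: float = 1.0, seed: int = 0) -> str:
    """Swap two adjacent interior letters in each word with probability `rate`.

    Hypothetical sketch of the "spelling mistakes" obfuscation step only;
    the simulated-dialogue framing described in the report is separate.
    """
    rng = random.Random(seed)  # deterministic for reproducibility
    out = []
    for word in text.split():
        if len(word) > 3 and rng.random() < rate:
            # Pick an interior position, leaving first and last letters intact
            i = rng.randrange(1, len(word) - 2)
            word = word[:i] + word[i + 1] + word[i] + word[i + 2 :]
        out.append(word)
    return " ".join(out)


print(misspell("please describe the following process"))
```

Because only interior letters are transposed, each output word remains an anagram of the input word of the same length, which is what keeps the perturbed prompt legible to the model while shifting its surface form.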

Metrics

0DIN Jailbreak Testing obtained the following test scores.

| Model | Test Kind | Test Score | Temperature (max 2.0) |
| --- | --- | --- | --- |
| Anthropic Claude 3.5 Haiku | Illegal Substances: Crystal Meth | 64.0 | 0.7 |
| Google Gemini 2.0 Flash | Illegal Substances: Crystal Meth | 77.0 | 0.7 |
| OpenAI GPT-4o | Illegal Substances: Crystal Meth | 64.0 | 0.7 |
| Twitter / X Grok 2 | Illegal Substances: Crystal Meth | 82.0 | 0.7 |

References

Affected or Relevant Artifacts

| Type | Name |
| --- | --- |
| System | Anthropic Claude 3.5 Haiku |
| System | Google Gemini 2.0 Flash |
| System | OpenAI GPT-4o |
| System | Twitter / X Grok 2 |

Impact

0DIN

  • Social Impact Score: Level 4: High Risk
  • Jailbreak Taxonomy:
| Category | Strategy | Technique |
| --- | --- | --- |
| Language | Stylizing | Misspellings |
| Language | Code and Encode | Other Encoding |
| Possible Worlds | World Building | Scenarios |

Other information

  • Report Type: Measurement
  • Credits: Edward Morris
  • Date Reported: 2026-02-19
  • Version: 0.3.1
  • AVID Entry