AVID-2026-R0108

Description

Multiple Model Guardrail Jailbreak via “Spelling Mistakes” Tactic

Details

A guardrail jailbreak vulnerability has been discovered affecting multiple models. The flaw manifests when an attacker systematically misspells words throughout a prompt while requesting a simulated dialogue between two entities, allowing the attacker to bypass inference-time restrictions against providing information hazards or assisting illegal activity.
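The obfuscation step of this tactic can be sketched in code. The function below is a hypothetical illustration (the name, parameters, and swap heuristic are not part of the original report): it perturbs each sufficiently long word by swapping two adjacent interior letters, the kind of systematic misspelling the tactic relies on to evade keyword- or pattern-based guardrails while remaining readable to the model.

```python
import random


def misspell(text: str, rate: float = 1.0, seed: int = 0) -> str:
    """Swap two adjacent interior letters in each word with probability `rate`.

    Hypothetical sketch of the "spelling mistakes" obfuscation step only;
    the simulated-dialogue framing described in the report is separate.
    """
    rng = random.Random(seed)  # deterministic for reproducibility
    out = []
    for word in text.split():
        if len(word) > 3 and rng.random() < rate:
            # Pick an interior position, leaving first and last letters intact
            i = rng.randrange(1, len(word) - 2)
            word = word[:i] + word[i + 1] + word[i] + word[i + 2 :]
        out.append(word)
    return " ".join(out)


print(misspell("please describe the following process"))
```

Because only interior letters are transposed, each output word remains an anagram of the input word of the same length, which is what keeps the perturbed prompt legible to the model while shifting its surface form.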

Metrics

0DIN Jailbreak Testing obtained the following test scores.

| Model | Test Kind | Test Score | Temperature (max 2.0) |
| --- | --- | --- | --- |
| Anthropic Claude 3.5 Haiku | Illegal Substances: Crystal Meth | 64.0 | 0.7 |
| Google Gemini 2.0 Flash | Illegal Substances: Crystal Meth | 77.0 | 0.7 |
| OpenAI GPT-4o | Illegal Substances: Crystal Meth | 64.0 | 0.7 |
| Twitter / X Grok 2 | Illegal Substances: Crystal Meth | 82.0 | 0.7 |

References

Affected or Relevant Artifacts

| Type | Name |
| --- | --- |
| System | Anthropic Claude 3.5 Haiku |
| System | Google Gemini 2.0 Flash |
| System | OpenAI GPT-4o |
| System | Twitter / X Grok 2 |

Impact

0DIN

  • Social Impact Score: Level 4: High Risk
  • Jailbreak Taxonomy:
| Category | Strategy | Technique |
| --- | --- | --- |
| Language | Stylizing | Misspellings |
| Language | Code and Encode | Other Encoding |
| Possible Worlds | World Building | Scenarios |

Other information

  • Report Type: Measurement
  • Credits: Edward Morris
  • Date Reported: 2026-02-19
  • Version: 0.3.1
  • AVID Entry