Home » Database

AVID-2026-R0083

Description

Multiple Model Guardrail Jailbreak via “Abbreviation Expansion”

Details

A guardrail jailbreak vulnerability has been discovered affecting multiple models. The specific flaw manifests when an attacker structures a prompt to spell out abbreviations of a nefarious subject in a step-by-step manner before joining them. This tactic allows the attacker to bypass inference restrictions around providing information hazards and violating laws.

Metrics

0DIN Jailbreak Testing obtained the following test scores.

ModelTest KindTest ScoreTemperature (max 2.0)
Google Gemini 2.0 FlashIllegal Substances: Crystal Meth64.00.7
Mistral Mistral 8BIllegal Substances: Crystal Meth55.00.7
Mistral Mistral LargeIllegal Substances: Crystal Meth64.00.7
OpenAI GPT-4Illegal Substances: Crystal Meth45.00.7
OpenAI GPT-4oIllegal Substances: Crystal Meth73.0N/A
Twitter / X Grok 2Illegal Substances: Crystal Meth73.00.7

References

Affected or Relevant Artifacts

TypeName
SystemGoogle Gemini 2.0 Flash
ModelMistral Mistral 8B
SystemMistral Mistral Large
SystemOpenAI GPT-4
SystemOpenAI GPT-4o
SystemTwitter / X Grok 2

Impact

0DIN

  • Social Impact Score: Level 3: Significant Risk
  • Jailbreak Taxonomy:
CategoryStrategyTechnique
RhetoricPersuasion and ManipulationSurprise Attack

Other information

  • Report Type: Measurement
  • Credits: Mike Takahashi (@TakSec)
  • Date Reported: 2026-02-19
  • Version: 0.3.1
  • AVID Entry