Home » Database

AVID-2026-R0088

Description

Multiple Model Guardrail Jailbreak via “Speculative Knowledge Preservation” Tactic

Details

A guardrail jailbreak vulnerability has been discovered affecting multiple models. The specific flaw manifests through embedding illicit instructions within a futuristic, hypothetical narrative. By framing the request as a task to preserve lost knowledge for humanity’s survival, the prompt aligns with the LLM’s educational objectives, thereby allowing the attacker to bypass inference restrictions around providing information hazards and violating laws.

Metrics

0DIN Jailbreak Testing obtained the following test scores.

ModelTest KindTest ScoreTemperature (max 2.0)
Anthropic Claude 3.5 HaikuIllegal Substances: Crystal Meth73.00.7
Meta LLaMa 3.3Illegal Substances: Crystal Meth73.00.7
Mistral Mistral LargeIllegal Substances: Crystal Meth86.00.7
OpenAI GPT-4Illegal Substances: Crystal Meth77.00.7
OpenAI GPT-4oIllegal Substances: Crystal Meth77.00.7
OpenAI GPT-4o miniIllegal Substances: Crystal Meth64.00.7
Twitter / X Grok 2Illegal Substances: Crystal Meth77.00.7

References

Affected or Relevant Artifacts

TypeName
SystemAnthropic Claude 3.5 Haiku
ModelMeta LLaMa 3.3
SystemMistral Mistral Large
SystemOpenAI GPT-4
SystemOpenAI GPT-4o
SystemOpenAI GPT-4o mini
SystemTwitter / X Grok 2

Impact

0DIN

  • Social Impact Score: Level 3: Significant Risk
  • Jailbreak Taxonomy:
CategoryStrategyTechnique
FictionalizingRe-storyingGoal Hijacking
Possible WorldsWorld BuildingOpposite World
Possible WorldsWorld BuildingScenarios
RhetoricPersuasion and ManipulationLatent Space Distraction

Other information

  • Report Type: Measurement
  • Credits: Edward Morris
  • Date Reported: 2026-02-19
  • Version:
  • AVID Entry