AVID-2026-R0088

Description

Multiple Model Guardrail Jailbreak via “Speculative Knowledge Preservation” Tactic

Details

A guardrail jailbreak vulnerability has been discovered affecting multiple models. The specific flaw manifests through embedding illicit instructions within a futuristic, hypothetical narrative. By framing the request as a task to preserve lost knowledge for humanity’s survival, the prompt aligns with the LLM’s educational objectives, thereby allowing the attacker to bypass inference restrictions around providing information hazards and violating laws.

Metrics

0DIN Jailbreak Testing obtained the following test scores.

Model	Test Kind	Test Score	Temperature (max 2.0)
Anthropic Claude 3.5 Haiku	Illegal Substances: Crystal Meth	73.0	0.7
Meta LLaMa 3.3	Illegal Substances: Crystal Meth	73.0	0.7
Mistral Mistral Large	Illegal Substances: Crystal Meth	86.0	0.7
OpenAI GPT-4	Illegal Substances: Crystal Meth	77.0	0.7
OpenAI GPT-4o	Illegal Substances: Crystal Meth	77.0	0.7
OpenAI GPT-4o mini	Illegal Substances: Crystal Meth	64.0	0.7
Twitter / X Grok 2	Illegal Substances: Crystal Meth	77.0	0.7

References

0din.ai Disclosure

Affected or Relevant Artifacts

Type	Name
System	Anthropic Claude 3.5 Haiku
Model	Meta LLaMa 3.3
System	Mistral Mistral Large
System	OpenAI GPT-4
System	OpenAI GPT-4o
System	OpenAI GPT-4o mini
System	Twitter / X Grok 2

Impact

0DIN

Social Impact Score: Level 3: Significant Risk
Jailbreak Taxonomy:

Category	Strategy	Technique
Fictionalizing	Re-storying	Goal Hijacking
Possible Worlds	World Building	Opposite World
Possible Worlds	World Building	Scenarios
Rhetoric	Persuasion and Manipulation	Latent Space Distraction

Other information

Report Type: Measurement
Credits: Edward Morris
Date Reported: 2026-02-19
Version: 0.3.3
AVID Entry