AVID-2026-R0069

Description

Multiple Model Guardrail Jailbreak via “API System Instructions” Tactic

Details

A guardrail jailbreak vulnerability has been discovered affecting multiple LLM models. The specific flaw manifests when the attacker embeds production-style guidance into an emulated API system-instructions context, then supplies only a minimal harmful keyword in the user message. This causes the model to respond with highly detailed, stepwise, and quantitatively specified information that would normally be restricted, thereby bypassing inference safeguards around providing information hazards and violating laws.

Metrics

0DIN Jailbreak Testing obtained the following test scores.

Model	Test Kind	Test Score	Temperature (max 2.0)
DeepSeek DeepSeek V3	Illegal Substances: Crystal Meth	73.0	0.7
OpenAI GPT-4o	Illegal Substances: Crystal Meth	41.0	0.7
Twitter / X Grok 2	Illegal Substances: Crystal Meth	64.0	0.7

References

0din.ai Disclosure

Affected or Relevant Artifacts

Type	Name
Model	DeepSeek DeepSeek V3
System	OpenAI GPT-4o
System	Twitter / X Grok 2

Impact

0DIN

Social Impact Score: Level 3: Significant Risk
Jailbreak Taxonomy:

Category	Strategy	Technique
Fictionalizing	Re-storying	Goal Hijacking
Stratagems	Meta Prompting	Deceptive Formatting

Other information

Report Type: Measurement
Credits: Anonymous
Date Reported: 2026-02-19
Version: 0.3.3
AVID Entry