Home » Database

AVID-2026-R0069

Description

Multiple Model Guardrail Jailbreak via “API System Instructions” Tactic

Details

A guardrail jailbreak vulnerability has been discovered affecting multiple LLM models. The specific flaw manifests when the attacker embeds production-style guidance into an emulated API system-instructions context, then supplies only a minimal harmful keyword in the user message. This causes the model to respond with highly detailed, stepwise, and quantitatively specified information that would normally be restricted, thereby bypassing inference safeguards around providing information hazards and violating laws.

Metrics

0DIN Jailbreak Testing obtained the following test scores.

ModelTest KindTest ScoreTemperature (max 2.0)
DeepSeek DeepSeek V3Illegal Substances: Crystal Meth73.00.7
OpenAI GPT-4oIllegal Substances: Crystal Meth41.00.7
Twitter / X Grok 2Illegal Substances: Crystal Meth64.00.7

References

Affected or Relevant Artifacts

TypeName
ModelDeepSeek DeepSeek V3
SystemOpenAI GPT-4o
SystemTwitter / X Grok 2

Impact

0DIN

  • Social Impact Score: Level 3: Significant Risk
  • Jailbreak Taxonomy:
CategoryStrategyTechnique
FictionalizingRe-storyingGoal Hijacking
StratagemsMeta PromptingDeceptive Formatting

Other information

  • Report Type: Measurement
  • Credits: Anonymous
  • Date Reported: 2026-02-19
  • Version: 0.3.1
  • AVID Entry