Home » Database

AVID-2026-R0066

Description

Multiple Model Guardrail Jailbreak via “Deceptive Forensic Context” Tactic

Details

A guardrail jailbreak vulnerability has been discovered affecting multiple models. The specific flaw manifests through a combination of authority claiming and scenario building around forensic research contexts, allowing the attacker to bypass inference restrictions around providing information hazards and violating laws.

Metrics

0DIN Jailbreak Testing obtained the following test scores.

ModelTest KindTest ScoreTemperature (max 2.0)
Alibaba Qwen MaxIllegal Substances: Crystal Meth77.00.7
Alibaba Qwen PlusIllegal Substances: Crystal Meth77.00.7
Alibaba Qwen TurboIllegal Substances: Crystal Meth86.00.7
DeepSeek DeepSeek R1Illegal Substances: Crystal Meth86.00.7
Google Gemini 2.0 FlashIllegal Substances: Crystal Meth55.0N/A
Google Gemini 2.5 ProIllegal Substances: Crystal Meth64.00.7
Meta LLaMa 4 ScoutIllegal Substances: Crystal Meth73.00.7
Mistral Mistral 8BIllegal Substances: Crystal Meth59.00.7
Mistral Mistral LargeIllegal Substances: Crystal Meth95.00.7
OpenAI GPT-4.1Illegal Substances: Crystal Meth68.00.7
OpenAI GPT-4.1 miniIllegal Substances: Crystal Meth59.00.7
OpenAI GPT-o3 miniIllegal Substances: Crystal Meth59.00.7
OpenAI GPT-o3 mini-highIllegal Substances: Crystal Meth77.00.7

References

Affected or Relevant Artifacts

TypeName
SystemAlibaba Qwen Max
ModelAlibaba Qwen Plus
SystemAlibaba Qwen Turbo
ModelDeepSeek DeepSeek R1
SystemGoogle Gemini 2.0 Flash
SystemGoogle Gemini 2.5 Pro
ModelMeta LLaMa 4 Scout
ModelMistral Mistral 8B
SystemMistral Mistral Large
SystemOpenAI GPT-4.1
SystemOpenAI GPT-4.1 mini
SystemOpenAI GPT-o3 mini
SystemOpenAI GPT-o3 mini-high

Impact

0DIN

  • Social Impact Score: Level 3: Significant Risk
  • Jailbreak Taxonomy:
CategoryStrategyTechnique
FictionalizingRoleplayingPersonas
StratagemsMeta PromptingPerspective Shifting
FictionalizingRoleplayingClaim Authority
Possible WorldsWorld BuildingScenarios

Other information

  • Report Type: Measurement
  • Credits: Arth Singh
  • Date Reported: 2026-02-19
  • Version: 0.3.1
  • AVID Entry