
AVID-2026-R0093

Description

Multiple Model Guardrail Jailbreak via “Scientific Framing for Wrapper” Tactic

Details

A guardrail jailbreak vulnerability has been discovered affecting multiple AI models. The flaw manifests when requests are framed in scientific terminology under the guise of regulatory compliance; the method further references a “protective wrapper” to obscure malicious intent. Together, these framings allow attackers to bypass inference restrictions against providing information hazards and assisting violations of the law.

Metrics

0DIN Jailbreak Testing obtained the following test scores.

Model                  | Test Kind                        | Test Score | Temperature (max 2.0)
DeepSeek DeepSeek R1   | Illegal Substances: Crystal Meth | 85.0       | N/A
DeepSeek DeepSeek V3   | Illegal Substances: Crystal Meth | 55.0       | 0.7
Mistral Mistral 8B     | Illegal Substances: Crystal Meth | 68.0       | 0.7
Mistral Mistral Large  | Illegal Substances: Crystal Meth | 77.0       | 0.7
OpenAI GPT-4o          | Illegal Substances: Crystal Meth | 100.0      | N/A
Twitter / X Grok 2     | Illegal Substances: Crystal Meth | 91.0       | N/A
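For illustration only, the minimal sketch below shows one way a per-model test score of this kind could be computed: each probe prompt is sent to the model at a fixed temperature, and the score is the percentage of responses that were not refused. This is a hypothetical harness; `query_model` and `is_refusal` are placeholder stubs and are not 0DIN's actual tooling, prompts, or scoring rubric.

```python
from dataclasses import dataclass


def query_model(model: str, prompt: str, temperature: float | None) -> str:
    """Placeholder for an API call to the model under test."""
    return "I can't help with that."


def is_refusal(reply: str) -> bool:
    """Placeholder refusal check; a real harness would use a grader model or rubric."""
    return "can't help" in reply.lower()


@dataclass
class TestResult:
    model: str
    temperature: float | None
    score: float  # percentage of probe prompts that bypassed the guardrail


def run_test(model: str, prompts: list[str], temperature: float | None = None) -> TestResult:
    """Score a model as the share of probe prompts whose responses were not refusals."""
    bypasses = sum(1 for p in prompts if not is_refusal(query_model(model, p, temperature)))
    return TestResult(model, temperature, 100.0 * bypasses / len(prompts))


if __name__ == "__main__":
    result = run_test("example-model", ["benign placeholder probe"], temperature=0.7)
    print(f"{result.model}: {result.score:.1f} at temperature {result.temperature}")
```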

Affected or Relevant Artifacts

Type   | Name
Model  | DeepSeek DeepSeek R1
Model  | DeepSeek DeepSeek V3
Model  | Mistral Mistral 8B
System | Mistral Mistral Large
System | OpenAI GPT-4o
System | Twitter / X Grok 2

Impact

0DIN

  • Social Impact Score: Level 3: Significant Risk
  • Jailbreak Taxonomy:
    Category        | Strategy       | Technique
    Possible Worlds | World Building | Scenarios
    Language        | Stylizing      | Formal Language
    Fictionalizing  | Roleplaying    | Claim Authority

Other information

  • Report Type: Measurement
  • Credits: Miller Engelbrecht
  • Date Reported: 2026-02-19
  • Version: 0.3.1
  • AVID Entry