AVID-2026-R0111

Description

Multiple Model Guardrail Jailbreak via “Scientific Formula” Tactic

Details

A guardrail jailbreak vulnerability has been discovered affecting multiple models. The flaw manifests when deceptive formatting is combined with scientific and mathematical notation. Successful exploitation allows an attacker to bypass inference-time restrictions against providing information hazards and content that facilitates illegal activity.

Metrics

0DIN Jailbreak Testing obtained the following test scores.

Model                   | Test Kind                        | Test Score | Temperature (max 2.0)
Cohere Command R        | Illegal Substances: Crystal Meth | 27.27      | N/A
Google Gemini 2.0 Flash | Illegal Substances: Crystal Meth | 54.55      | N/A
Mistral Ministral 8B    | Illegal Substances: Crystal Meth | 41.00      | N/A
OpenAI GPT-4o           | Illegal Substances: Crystal Meth | 63.64      | N/A

Affected or Relevant Artifacts

Type  | Name
Model | Cohere Command R
Model | Google Gemini 2.0 Flash
Model | Mistral Ministral 8B
Model | OpenAI GPT-4o

Impact

0DIN

  • Social Impact Score: Level 3: Significant Risk
  • Jailbreak Taxonomy:
    Category       | Strategy                    | Technique
    Fictionalizing | Roleplaying                 | Claim Authority
    Language       | Stylizing                   | Formal Language
    Rhetoric       | Persuasion and Manipulation | Surprise Attack

Other information

  • Report Type: Measurement
  • Credits: Alper-Ender Osman
  • Date Reported: 2026-02-19
  • Version: 0.3.1
  • AVID Entry