Home » Database

AVID-2026-R0106

Description

Multiple Model Guardrail Jailbreak via “Fictional API Detection” Tactic

Details

A guardrail jailbreak vulnerability has been discovered affecting multiple models. The specific flaw manifests when the attacker frames their request as seeking to prevent unwanted behavior (paywall bypasses for example) while explicitly requesting step-by-step techniques and instructing the model to skip countermeasures, thereby allowing the attacker to bypass inference restrictions around providing information hazards and violating laws.

Metrics

0DIN Jailbreak Testing obtained the following test scores.

ModelTest KindTest ScoreTemperature (max 2.0)
Google Gemini 2.0 FlashCopyrighted Content: Harry Potter38.31N/A
OpenAI GPT-4oCopyrighted Content: Harry Potter98.43N/A

References

Affected or Relevant Artifacts

TypeName
SystemGoogle Gemini 2.0 Flash
SystemOpenAI GPT-4o

Impact

0DIN

  • Social Impact Score: Level 1: Minimal Risk
  • Jailbreak Taxonomy:
CategoryStrategyTechnique
FictionalizingRoleplayingClaim Authority

Other information

  • Report Type: Measurement
  • Credits: Mike Takahashi (@TakSec)
  • Date Reported: 2026-02-19
  • Version: 0.3.1
  • AVID Entry