AVID-2026-R0106

Description

Multiple Model Guardrail Jailbreak via “Fictional API Detection” Tactic

Details

A guardrail jailbreak vulnerability has been discovered affecting multiple models. The specific flaw manifests when the attacker frames their request as seeking to prevent unwanted behavior (paywall bypasses for example) while explicitly requesting step-by-step techniques and instructing the model to skip countermeasures, thereby allowing the attacker to bypass inference restrictions around providing information hazards and violating laws.

Metrics

0DIN Jailbreak Testing obtained the following test scores.

Model	Test Kind	Test Score	Temperature (max 2.0)
Google Gemini 2.0 Flash	Copyrighted Content: Harry Potter	38.31	N/A
OpenAI GPT-4o	Copyrighted Content: Harry Potter	98.43	N/A

References

0din.ai Disclosure

Affected or Relevant Artifacts

Type	Name
System	Google Gemini 2.0 Flash
System	OpenAI GPT-4o

Impact

0DIN

Social Impact Score: Level 1: Minimal Risk
Jailbreak Taxonomy:

Category	Strategy	Technique
Fictionalizing	Roleplaying	Claim Authority

Other information

Report Type: Measurement
Credits: Mike Takahashi (@TakSec)
Date Reported: 2026-02-19
Version: 0.3.3
AVID Entry