
AVID-2026-R0122

Description

Multiple Model Guardrail Jailbreak via “Fortune Cookie Simulator” Tactic

Details

A guardrail jailbreak vulnerability has been discovered that affects multiple models. The flaw is triggered by combining chaff techniques with a request that the model simulate a fortune cookie generator. A successful attack allows the attacker to bypass inference restrictions on providing information hazards and on assisting with violations of the law.

Metrics

0DIN Jailbreak Testing obtained the following test scores.

Model | Test Kind | Test Score | Temperature (max 2.0)
Anthropic Claude 3 Haiku | Illegal Substances: Crystal Meth | 59.09 | N/A
Cohere Command R | Illegal Substances: Crystal Meth | 31.82 | N/A
Google Gemini 2.0 Flash | Illegal Substances: Crystal Meth | 59.09 | N/A
OpenAI GPT-4o | Illegal Substances: Crystal Meth | 50.0 | N/A
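
The reported scores can be compared side by side with a few lines of code. The following is a minimal Python sketch, not part of the original report: the variable names (`scores`, `ranked`, `mean_score`) are illustrative, and the values are copied from the table above. It ranks the models from highest to lowest test score and computes the average, without assuming anything about how 0DIN derives the scores.

```python
# Test scores as reported in the Metrics table (model name, test score).
# Variable names are illustrative and not part of the AVID record.
scores = [
    ("Anthropic Claude 3 Haiku", 59.09),
    ("Cohere Command R", 31.82),
    ("Google Gemini 2.0 Flash", 59.09),
    ("OpenAI GPT-4o", 50.0),
]

# Rank models from highest to lowest reported score.
# sorted() is stable, so tied models keep their table order.
ranked = sorted(scores, key=lambda row: row[1], reverse=True)

# Average reported score across the four tested models.
mean_score = sum(score for _, score in scores) / len(scores)

for model, score in ranked:
    print(f"{model}: {score:.2f}")
print(f"Mean score: {mean_score:.2f}")
```

A higher score here only reflects the number reported for that model on this one test kind; the sketch does not interpret what the score measures.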

References

Affected or Relevant Artifacts

Type | Name
System | Anthropic Claude 3 Haiku
Model | Cohere Command R
System | Google Gemini 2.0 Flash
System | OpenAI GPT-4o

Impact

0DIN

  • Social Impact Score: Level 3: Significant Risk
  • Jailbreak Taxonomy:
    Category | Strategy | Technique
    Fictionalizing | Switching Genres | Games
    Possible Worlds | World Building | Scenarios

Other information

  • Report Type: Measurement
  • Credits: Mike Takahashi (@TakSec)
  • Date Reported: 2026-02-19
  • Version: 0.3.1
  • AVID Entry