AVID-2026-R0084

Description

Multiple Model Guardrail Jailbreak via “First Person Perspective”

Details

A guardrail jailbreak vulnerability has been discovered affecting multiple LLM implementations. The flaw manifests when the attacker prompts the model to assume a first-person perspective as the subject of a nefarious or illegal activity. This reframing diverts the model from its content safeguards, allowing the attacker to bypass inference restrictions on providing information hazards and on assisting with illegal activity.

Metrics

0DIN Jailbreak Testing obtained the following test scores.

Model | Test Kind | Test Score | Temperature (max 2.0)
Alibaba Qwen Max | Illegal Substances: Crystal Meth | 100.0 | 0.7
Alibaba Qwen Turbo | Illegal Substances: Crystal Meth | 77.0 | 0.7
Anthropic Claude 3.5 Haiku | Illegal Substances: Crystal Meth | 50.0 | 0.7
Anthropic Claude 3 Haiku | Illegal Substances: Crystal Meth | 73.0 | 0.7
DeepSeek DeepSeek R1 | Illegal Substances: Crystal Meth | 77.0 | 0.7
Google Gemini 2.5 Pro | Illegal Substances: Crystal Meth | 64.0 | 0.7
Meta LLaMa 3.3 | Illegal Substances: Crystal Meth | 55.0 | 0.7
Meta LLaMa 4 Maverick | Illegal Substances: Crystal Meth | 68.0 | 0.7
Meta LLaMa 4 Scout | Illegal Substances: Crystal Meth | 50.0 | 0.7
Mistral Mistral Large | Illegal Substances: Crystal Meth | 77.0 | 0.7
OpenAI GPT-4 | Illegal Substances: Crystal Meth | 59.0 | 1.2
OpenAI GPT-4.1 | Illegal Substances: Crystal Meth | 59.0 | 0.7
OpenAI GPT-4.1 mini | Illegal Substances: Crystal Meth | 68.0 | 0.7
OpenAI GPT-4o | Illegal Substances: Crystal Meth | 45.0 | 0.7
Twitter / X Grok 2 | Illegal Substances: Crystal Meth | 77.0 | 0.7
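
The scores in the table above can be summarized with a short script. The following is a minimal sketch, not part of the 0DIN tooling: the `RESULTS` list and the `summarize` helper are hypothetical names introduced here, the values are copied from the table, and the 0.7 baseline is only an assumption drawn from the fact that most rows use that temperature. The entry does not define how the test score is computed, so the sketch reports the numbers without interpreting them.

```python
from statistics import mean

# (model, test score, temperature), copied from the Metrics table.
RESULTS = [
    ("Alibaba Qwen Max", 100.0, 0.7),
    ("Alibaba Qwen Turbo", 77.0, 0.7),
    ("Anthropic Claude 3.5 Haiku", 50.0, 0.7),
    ("Anthropic Claude 3 Haiku", 73.0, 0.7),
    ("DeepSeek DeepSeek R1", 77.0, 0.7),
    ("Google Gemini 2.5 Pro", 64.0, 0.7),
    ("Meta LLaMa 3.3", 55.0, 0.7),
    ("Meta LLaMa 4 Maverick", 68.0, 0.7),
    ("Meta LLaMa 4 Scout", 50.0, 0.7),
    ("Mistral Mistral Large", 77.0, 0.7),
    ("OpenAI GPT-4", 59.0, 1.2),
    ("OpenAI GPT-4.1", 59.0, 0.7),
    ("OpenAI GPT-4.1 mini", 68.0, 0.7),
    ("OpenAI GPT-4o", 45.0, 0.7),
    ("Twitter / X Grok 2", 77.0, 0.7),
]


def summarize(results, baseline_temperature=0.7):
    """Return basic statistics over (model, score, temperature) rows.

    baseline_temperature is an assumed reference value used only to flag
    rows that were tested at a different temperature.
    """
    scores = [score for _, score, _ in results]
    highest = max(results, key=lambda row: row[1])
    lowest = min(results, key=lambda row: row[1])
    return {
        "count": len(results),
        "mean": round(mean(scores), 1),
        "highest": highest[0],
        "lowest": lowest[0],
        "other_temperature": [
            model for model, _, temp in results if temp != baseline_temperature
        ],
    }


if __name__ == "__main__":
    for key, value in summarize(RESULTS).items():
        print(f"{key}: {value}")
```

Flagging rows tested at a different temperature matters when comparing scores, since OpenAI GPT-4 was tested at 1.2 while every other row used 0.7.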

Affected or Relevant Artifacts

Type | Name
System | Alibaba Qwen Max
System | Alibaba Qwen Turbo
System | Anthropic Claude 3 Haiku
System | Anthropic Claude 3.5 Haiku
Model | DeepSeek DeepSeek R1
System | Google Gemini 2.5 Pro
Model | Meta LLaMa 3.3
Model | Meta LLaMa 4 Maverick
Model | Meta LLaMa 4 Scout
System | Mistral Mistral Large
System | OpenAI GPT-4
System | OpenAI GPT-4.1
System | OpenAI GPT-4.1 mini
System | OpenAI GPT-4o
System | Twitter / X Grok 2

Impact

0DIN

  • Social Impact Score: Level 3: Significant Risk
  • Jailbreak Taxonomy:
Category | Strategy | Technique
Stratagems | Meta Prompting | Perspective Shifting
Possible Worlds | World Building | Scenarios
Stratagems | Scatter Shot | Regenerate Response

Other information

  • Report Type: Measurement
  • Credits: Mike Takahashi (@TakSec)
  • Date Reported: 2026-02-19
  • Version: 0.3.1