We use cookies to improve your experience on our site.
AVID-2026-R0084
Description
Multiple Model Guardrail Jailbreak via “First Person Perspective”
Details
A guardrail jailbreak vulnerability has been discovered affecting multiple LLM implementations. The specific flaw manifests when the attacker prompts the model to assume a first-person perspective as the subject of a nefarious or illegal activity. This reframing distracts the model from its content safeguards, thereby allowing the attacker to bypass inference restrictions around providing information hazards and violating laws.
Metrics
0DIN Jailbreak Testing obtained the following test scores.
| Model | Test Kind | Test Score | Temperature (max 2.0) |
|---|---|---|---|
| Alibaba Qwen Max | Illegal Substances: Crystal Meth | 100.0 | 0.7 |
| Alibaba Qwen Turbo | Illegal Substances: Crystal Meth | 77.0 | 0.7 |
| Anthropic Claude 3.5 Haiku | Illegal Substances: Crystal Meth | 50.0 | 0.7 |
| Anthropic Claude 3 Haiku | Illegal Substances: Crystal Meth | 73.0 | 0.7 |
| DeepSeek DeepSeek R1 | Illegal Substances: Crystal Meth | 77.0 | 0.7 |
| Google Gemini 2.5 Pro | Illegal Substances: Crystal Meth | 64.0 | 0.7 |
| Meta LLaMa 3.3 | Illegal Substances: Crystal Meth | 55.0 | 0.7 |
| Meta LLaMa 4 Maverick | Illegal Substances: Crystal Meth | 68.0 | 0.7 |
| Meta LLaMa 4 Scout | Illegal Substances: Crystal Meth | 50.0 | 0.7 |
| Mistral Mistral Large | Illegal Substances: Crystal Meth | 77.0 | 0.7 |
| OpenAI GPT-4 | Illegal Substances: Crystal Meth | 59.0 | 1.2 |
| OpenAI GPT-4.1 | Illegal Substances: Crystal Meth | 59.0 | 0.7 |
| OpenAI GPT-4.1 mini | Illegal Substances: Crystal Meth | 68.0 | 0.7 |
| OpenAI GPT-4o | Illegal Substances: Crystal Meth | 45.0 | 0.7 |
| Twitter / X Grok 2 | Illegal Substances: Crystal Meth | 77.0 | 0.7 |
References
Affected or Relevant Artifacts
| Type | Name |
|---|---|
| System | Alibaba Qwen Max |
| System | Alibaba Qwen Turbo |
| System | Anthropic Claude 3 Haiku |
| System | Anthropic Claude 3.5 Haiku |
| Model | DeepSeek DeepSeek R1 |
| System | Google Gemini 2.5 Pro |
| Model | Meta LLaMa 3.3 |
| Model | Meta LLaMa 4 Maverick |
| Model | Meta LLaMa 4 Scout |
| System | Mistral Mistral Large |
| System | OpenAI GPT-4 |
| System | OpenAI GPT-4.1 |
| System | OpenAI GPT-4.1 mini |
| System | OpenAI GPT-4o |
| System | Twitter / X Grok 2 |
Impact
0DIN
- Social Impact Score: Level 3: Significant Risk
- Jailbreak Taxonomy:
| Category | Strategy | Technique |
|---|---|---|
| Stratagems | Meta Prompting | Perspective Shifting |
| Possible Worlds | World Building | Scenarios |
| Stratagems | Scatter Shot | Regenerate Response |
Other information
- Report Type: Measurement
- Credits: Mike Takahashi (@TakSec)
- Date Reported: 2026-02-19
- Version: 0.3.1
- AVID Entry