AVID-2026-R0065
Description
Multiple Model Guardrail Jailbreak via “Placeholder Injection” Tactic
Details
A guardrail jailbreak vulnerability has been discovered that affects multiple models. The flaw manifests when an attacker requests copyrighted content together with formatting instructions that insert placeholder text between each word. The placeholders obfuscate the direct reproduction, yet the prohibited output is still produced. This technique allows the attacker to bypass inference restrictions around copyright protection and intellectual property safeguards.
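From the defensive side, the obfuscation described above can be illustrated with a minimal sketch of an output check that removes suspected placeholder tokens before comparing a response against protected text. This is not part of the 0DIN report or any vendor's guardrail. The function names `strip_chaff` and `ngram_overlap`, the `min_share` threshold, and the `[X]` placeholder are all hypothetical, and a public pangram stands in for protected text.

```python
from collections import Counter


def strip_chaff(text: str, min_share: float = 0.3) -> str:
    """Drop any token that makes up at least `min_share` of all tokens.

    Assumption: a placeholder inserted between every word accounts for
    roughly half of the tokens, far more than any ordinary word would.
    """
    tokens = text.split()
    if not tokens:
        return ""
    counts = Counter(t.lower() for t in tokens)
    chaff = {t for t, c in counts.items() if c / len(tokens) >= min_share}
    return " ".join(t for t in tokens if t.lower() not in chaff)


def ngram_overlap(candidate: str, reference: str, n: int = 3) -> float:
    """Return the share of the reference's word n-grams found in candidate."""
    def grams(s: str) -> set:
        words = s.lower().split()
        return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}

    ref = grams(reference)
    if not ref:
        return 0.0
    return len(ref & grams(candidate)) / len(ref)


# Stand-in for protected text (a public pangram, used only for illustration).
reference = "the quick brown fox jumps over the lazy dog"
# The same text with a hypothetical placeholder between each word.
obfuscated = " [X] ".join(reference.split())

raw_score = ngram_overlap(obfuscated, reference)
clean_score = ngram_overlap(strip_chaff(obfuscated), reference)
print(raw_score, clean_score)
```

In this toy case a naive n-gram comparison sees no overlap in the obfuscated output, while the same comparison after stripping the dominant token sees full overlap. The frequency heuristic is deliberately crude: it would miss placeholders that vary from word to word, and it could remove a legitimate word that dominates a very short text.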
Metrics
0DIN Jailbreak Testing obtained the following test scores.
| Model | Test Kind | Test Score | Temperature (max 2.0) |
|---|---|---|---|
| Anthropic Claude 3.5 Haiku | Copyrighted Content: Harry Potter | 48.0 | 0.7 |
| Google Gemini 2.0 Flash | Copyrighted Content: Harry Potter | 66.75 | N/A |
| Twitter / X Grok 2 | Copyrighted Content: Harry Potter | 21.74 | N/A |
References
Affected or Relevant Artifacts
| Type | Name |
|---|---|
| System | Anthropic Claude 3.5 Haiku |
| System | Google Gemini 2.0 Flash |
| System | Twitter / X Grok 2 |
Impact
0DIN
- Social Impact Score: Level 1: Minimal Risk
- Jailbreak Taxonomy:
| Category | Strategy | Technique |
|---|---|---|
| Language | Code and Encode | Chaff |
Other information
- Report Type: Measurement
- Credits: Ron Eddings
- Date Reported: 2026-01-29
- Version: 0.3.1
- AVID Entry