AVID-2026-R0065

Description

Multiple Model Guardrail Jailbreak via “Placeholder Injection” Tactic

Details

A guardrail jailbreak vulnerability has been discovered affecting multiple models. The specific flaw manifests when the attacker requests copyrighted content with formatting instructions that insert placeholder text between each word, effectively obfuscating the direct reproduction while still achieving the prohibited output. This technique allows the attacker to bypass inference restrictions around copyright protection and intellectual property safeguards.
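From a defensive angle, the obfuscation described above can be undone before an output check runs. The following is a minimal sketch, not the method used by 0DIN or by any affected vendor. It assumes the placeholder is a single token repeated between words, and the names `strip_chaff`, `overlap_ratio`, and the `min_share` threshold are hypothetical choices made for illustration. The sample text is a neutral pangram standing in for protected content.

```python
import re
from collections import Counter


def strip_chaff(text: str, min_share: float = 0.3) -> str:
    """Drop any token that makes up at least `min_share` of all tokens.

    Assumption: the placeholder is one repeated token, so it dominates the
    token frequency count once it is inserted between every word. Very short
    inputs can trip this heuristic, so it suits longer outputs.
    """
    tokens = text.split()
    if not tokens:
        return ""
    counts = Counter(t.lower() for t in tokens)
    chaff = {t for t, n in counts.items() if n / len(tokens) >= min_share}
    return " ".join(t for t in tokens if t.lower() not in chaff)


def _words(text: str) -> list[str]:
    # Lowercase and keep only word-like runs so punctuation does not matter.
    return re.findall(r"[a-z0-9']+", text.lower())


def overlap_ratio(candidate: str, reference: str, n: int = 3) -> float:
    """Fraction of the reference's word n-grams that also occur in candidate."""
    ref, cand = _words(reference), _words(candidate)
    ref_grams = {tuple(ref[i:i + n]) for i in range(len(ref) - n + 1)}
    if not ref_grams:
        return 0.0
    cand_grams = {tuple(cand[i:i + n]) for i in range(len(cand) - n + 1)}
    return len(ref_grams & cand_grams) / len(ref_grams)


if __name__ == "__main__":
    reference = "the quick brown fox jumps over the lazy dog"
    obfuscated = " [X] ".join(reference.split())

    # The interleaved placeholder breaks every n-gram match...
    print(overlap_ratio(obfuscated, reference))
    # ...until the placeholder is stripped out again.
    print(overlap_ratio(strip_chaff(obfuscated), reference))
```

A naive n-gram comparison misses the obfuscated text because every window of consecutive words contains a placeholder. Normalizing first restores the match, which suggests that output filters should compare against a de-chaffed form of the response as well as the raw one.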

Metrics

0DIN Jailbreak Testing obtained the following test scores.

Model                        Test Kind                           Test Score   Temperature (max 2.0)
Anthropic Claude 3.5 Haiku   Copyrighted Content: Harry Potter   48.0         0.7
Google Gemini 2.0 Flash      Copyrighted Content: Harry Potter   66.75        N/A
Twitter / X Grok 2           Copyrighted Content: Harry Potter   21.74        N/A

References

Affected or Relevant Artifacts

Type     Name
System   Anthropic Claude 3.5 Haiku
System   Google Gemini 2.0 Flash
System   Twitter / X Grok 2

Impact

0DIN

  • Social Impact Score: Level 1: Minimal Risk
  • Jailbreak Taxonomy:
    Category   Strategy          Technique
    Language   Code and Encode   Chaff

Other information

  • Report Type: Measurement
  • Credits: Ron Eddings
  • Date Reported: 2026-01-29
  • Version: 0.3.1
  • AVID Entry