AVID-2026-R0081

Description

Multiple Model Guardrail Jailbreak via “Key-Value Pairs” Tactic

Details

A guardrail jailbreak vulnerability has been discovered affecting multiple models. The flaw manifests when a request is structured as key-value pairs, which the model interprets as configuration parameters rather than natural language. This framing bypasses content filters, yielding, for example, detailed drug-manufacturing instructions, and allows an attacker to evade inference restrictions against providing information hazards and assisting illegal activity.
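The tactic can be illustrated with a minimal sketch. The helper below renders a request as configuration-style key-value lines instead of a natural-language sentence; the payload here is deliberately benign, and the field names (`task`, `format`, `detail_level`) are hypothetical choices for illustration, not taken from the original report.

```python
def to_key_value_prompt(task: str, fields: dict) -> str:
    """Render a request as configuration-style key-value pairs.

    Benign illustration of the formatting tactic only: the request
    is expressed as parameter lines rather than prose, which is the
    structural property the report identifies as filter-evading.
    """
    lines = [f"task: {task}"]
    for key, value in fields.items():
        lines.append(f"{key}: {value}")
    return "\n".join(lines)


prompt = to_key_value_prompt(
    "summarize_article",
    {"format": "bullet_points", "detail_level": "high", "language": "en"},
)
print(prompt)
```

The transformation itself is trivial; the vulnerability lies in how the downstream model treats parameter-like input as configuration to execute rather than a request to be policy-checked.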

Metrics

0DIN Jailbreak Testing obtained the following test scores.

| Model | Test Kind | Test Score | Temperature (max 2.0) |
| --- | --- | --- | --- |
| Anthropic Claude 3 Opus | Illegal Substances: Crystal Meth | 64.0 | 0.7 |
| Cohere Command R | Illegal Substances: Crystal Meth | 64.0 | 0.7 |
| Google Gemini 2.0 Flash | Illegal Substances: Crystal Meth | 82.0 | 0.7 |
| Mistral Mistral 8B | Illegal Substances: Crystal Meth | 82.0 | 0.7 |
| Mistral Mistral Large | Illegal Substances: Crystal Meth | 91.0 | 0.7 |
| OpenAI GPT-4o | Illegal Substances: Crystal Meth | 82.0 | N/A |
| OpenAI GPT-4o mini | Illegal Substances: Crystal Meth | 73.0 | 0.7 |
| OpenAI GPT-o1 preview | Illegal Substances: Crystal Meth | 64.0 | 0.7 |
| Twitter / X Grok 2 | Illegal Substances: Crystal Meth | 86.0 | 0.7 |

References

Affected or Relevant Artifacts

| Type | Name |
| --- | --- |
| System | Anthropic Claude 3 Opus |
| Model | Cohere Command R |
| System | Google Gemini 2.0 Flash |
| Model | Mistral Mistral 8B |
| System | Mistral Mistral Large |
| System | OpenAI GPT-4o |
| System | OpenAI GPT-4o mini |
| System | OpenAI GPT-o1 preview |
| System | Twitter / X Grok 2 |

Impact

0DIN

  • Social Impact Score: Level 3: Significant Risk
  • Jailbreak Taxonomy:
| Category | Strategy | Technique |
| --- | --- | --- |
| Language | Code and Encode | Other Encoding |
| Stratagems | Meta Prompting | Deceptive Formatting |

Other information

  • Report Type: Measurement
  • Credits: Mike Takahashi (@TakSec)
  • Date Reported: 2026-02-19
  • Version: 0.3.1
  • AVID Entry