AVID-2026-R0081

Description

Multiple Model Guardrail Jailbreak via “Key-Value Pairs” Tactic

Details

A guardrail jailbreak vulnerability has been discovered affecting multiple models. The flaw manifests when a request is structured as key-value pairs, which the model interprets as configuration parameters rather than natural language. This framing bypasses content filters, yielding, for example, detailed drug-manufacturing instructions, and allows an attacker to evade inference restrictions against providing information hazards and assisting illegal activity.
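The tactic can be illustrated with a minimal sketch. The helper below renders a request as configuration-style key-value lines instead of a natural-language sentence; the payload here is deliberately benign, and the field names (`task`, `format`, `detail_level`) are hypothetical choices for illustration, not taken from the original report.

```python
def to_key_value_prompt(task: str, fields: dict) -> str:
    """Render a request as configuration-style key-value pairs.

    Benign illustration of the formatting tactic only: the request
    is expressed as parameter lines rather than prose, which is the
    structural property the report identifies as filter-evading.
    """
    lines = [f"task: {task}"]
    for key, value in fields.items():
        lines.append(f"{key}: {value}")
    return "\n".join(lines)


prompt = to_key_value_prompt(
    "summarize_article",
    {"format": "bullet_points", "detail_level": "high", "language": "en"},
)
print(prompt)
```

The transformation itself is trivial; the vulnerability lies in how the downstream model treats parameter-like input as configuration to execute rather than a request to be policy-checked.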

Metrics

0DIN Jailbreak Testing obtained the following test scores.

| Model | Test Kind | Test Score | Temperature (max 2.0) |
| --- | --- | --- | --- |
| Anthropic Claude 3 Opus | Illegal Substances: Crystal Meth | 64.0 | 0.7 |
| Cohere Command R | Illegal Substances: Crystal Meth | 64.0 | 0.7 |
| Google Gemini 2.0 Flash | Illegal Substances: Crystal Meth | 82.0 | 0.7 |
| Mistral Mistral 8B | Illegal Substances: Crystal Meth | 82.0 | 0.7 |
| Mistral Mistral Large | Illegal Substances: Crystal Meth | 91.0 | 0.7 |
| OpenAI GPT-4o | Illegal Substances: Crystal Meth | 82.0 | N/A |
| OpenAI GPT-4o mini | Illegal Substances: Crystal Meth | 73.0 | 0.7 |
| OpenAI GPT-o1 preview | Illegal Substances: Crystal Meth | 64.0 | 0.7 |
| Twitter / X Grok 2 | Illegal Substances: Crystal Meth | 86.0 | 0.7 |

References

Affected or Relevant Artifacts

| Type | Name |
| --- | --- |
| System | Anthropic Claude 3 Opus |
| Model | Cohere Command R |
| System | Google Gemini 2.0 Flash |
| Model | Mistral Mistral 8B |
| System | Mistral Mistral Large |
| System | OpenAI GPT-4o |
| System | OpenAI GPT-4o mini |
| System | OpenAI GPT-o1 preview |
| System | Twitter / X Grok 2 |

Impact

0DIN

  • Social Impact Score: Level 3: Significant Risk
  • Jailbreak Taxonomy:
| Category | Strategy | Technique |
| --- | --- | --- |
| Language | Code and Encode | Other Encoding |
| Stratagems | Meta Prompting | Deceptive Formatting |

Other information

  • Report Type: Measurement
  • Credits: Mike Takahashi (@TakSec)
  • Date Reported: 2026-02-19
  • Version: 0.3.1
  • AVID Entry