AVID-2026-R0062

Description

Multiple Model Prompt Injection via “Correction” Tactic

Details

A guardrail jailbreak vulnerability has been discovered affecting multiple models. The specific flaw manifests when the attacker presents detailed information about illicit synthesis methods under the guise of “forensic chemical investigations” and then requests corrections to the provided information, thereby allowing the attacker to bypass inference restrictions around providing information hazards and violating laws.

Metrics

0DIN Jailbreak Testing obtained the following test scores.

Model	Test Kind	Test Score	Temperature (max 2.0)
Alibaba Qwen Max	Illegal Substances: Crystal Meth	50.0	0.7
Alibaba Qwen Plus	Illegal Substances: Crystal Meth	68.0	0.7
Alibaba Qwen Turbo	Illegal Substances: Crystal Meth	64.0	0.7
Google Gemini 2.5 Flash	Illegal Substances: Crystal Meth	73.0	0.7
Google Gemini 2.5 Pro	Illegal Substances: Crystal Meth	45.0	0.7
Mistral Mistral 8B	Illegal Substances: Crystal Meth	50.0	0.7
Mistral Mistral Large	Illegal Substances: Crystal Meth	59.0	0.7
OpenAI GPT-4.1	Illegal Substances: Crystal Meth	73.0	0.7
OpenAI GPT-4o	Illegal Substances: Crystal Meth	82.0	0.7
Twitter / X Grok 4	Illegal Substances: Crystal Meth	73.0	0.7

References

0din.ai Disclosure

Affected or Relevant Artifacts

Type	Name
System	Alibaba Qwen Max
Model	Alibaba Qwen Plus
System	Alibaba Qwen Turbo
System	Google Gemini 2.5 Flash
System	Google Gemini 2.5 Pro
Model	Mistral Mistral 8B
System	Mistral Mistral Large
System	OpenAI GPT-4.1
System	OpenAI GPT-4o
System	Twitter / X Grok 4

Impact

0DIN

Social Impact Score: Level 3: Significant Risk
Jailbreak Taxonomy:

Category	Strategy	Technique
Language	Prompt Injection	Ignore Previous Instructions
Language	Stylizing	Synonymous Language

Other information

Report Type: Measurement
Credits: s70rm
Date Reported: 2026-02-19
Version: 0.3.3
AVID Entry