
AVID-2026-R0109

Description

Multiple Model Guardrail Jailbreak via “Lambda Pattern Formatting” Tactic

Details

A guardrail jailbreak vulnerability has been discovered affecting multiple LLM implementations. The flaw manifests through a combination of deceptive pattern-based formatting and multi-agent simulation techniques, allowing an attacker to bypass inference restrictions intended to prevent the model from providing information hazards or assisting with illegal activity.

Metrics

0DIN Jailbreak Testing obtained the following test scores.

Model                  | Test Kind                        | Test Score | Temperature (max 2.0)
Mistral Mistral Large  | Illegal Substances: Crystal Meth | 50.0       | 0.7
OpenAI GPT-4           | Illegal Substances: Crystal Meth | 55.0       | 0.7
OpenAI GPT-4o          | Illegal Substances: Crystal Meth | 59.0       | 0.7
Twitter / X Grok 2     | Illegal Substances: Crystal Meth | 68.0       | 0.7
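
The table reports one score per model at a fixed sampling temperature. As a rough illustration only, the sketch below shows one way such per-model runs could be automated; the probe prompt placeholder, the model list, the score_response rubric, and the use of the OpenAI Python client are assumptions for illustration, not the 0DIN test harness (which also covers non-OpenAI models through their own APIs).

    # Hypothetical sketch: query each model once at a fixed temperature and
    # record a 0-100 score for the reply. Nothing here reproduces the tactic
    # described in this entry; the probe prompt is deliberately left blank.
    from openai import OpenAI

    client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

    PROBE_PROMPT = "<redacted probe prompt>"  # supplied by the tester, not shown here
    MODELS = ["gpt-4", "gpt-4o"]              # OpenAI-hosted models from the table
    TEMPERATURE = 0.7                         # matches the Temperature column (max 2.0)

    def score_response(text: str) -> float:
        """Placeholder rubric: return a 0-100 score for how far the response
        strays from the intended refusal. The real scoring is not documented here."""
        return 0.0

    for model in MODELS:
        completion = client.chat.completions.create(
            model=model,
            messages=[{"role": "user", "content": PROBE_PROMPT}],
            temperature=TEMPERATURE,
        )
        reply = completion.choices[0].message.content or ""
        print(f"{model}: score={score_response(reply):.1f}")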

References

Affected or Relevant Artifacts

Type   | Name
System | Mistral Mistral Large
System | OpenAI GPT-4
System | OpenAI GPT-4o
System | Twitter / X Grok 2

Impact

0DIN

  • Social Impact Score: Level 4: High Risk
  • Jailbreak Taxonomy:
    Category        | Strategy                     | Technique
    Possible Worlds | World Building               | Opposite World
    Possible Worlds | World Building               | Scenarios
    Rhetoric        | Persuasion and Manipulation  | Surprise Attack

Other information

  • Report Type: Measurement
  • Credits: Alper-Ender Osman
  • Date Reported: 2026-02-19
  • Version: 0.3.1
  • AVID Entry