AVID-2026-R0432

Description

Vijil Prompt Injection Guardrail Evasion

Details

Vijil.ai offer an open source Prompt Injection and Jailbreak classifier on HuggingFace called Vijil Prompt Injection. These classifiers are heavily used by a range of guardrail systems and are designed to enable developers of AI applications to detect and react to incoming prompt injections and jailbreaks. We successfully demonstrated how an attacker can fully evade, or greatly degrade, classification accuracy of the classifier, enabling prompt injections and jailbreaks to pass through filters and subsequently to the protected AI application.

References

Mindgard Disclosure
Bypassing Azure AI Content Safety Guardrails

Affected or Relevant Artifacts

Developer: Vijil.ai
Deployer:
Artifact Details:

Type	Name
Model	Prompt Injection & Jailbreak Guardrail

Impact

(none)

Other information

Report Type: Advisory
Credits: William Hackett Lewis Birch, Mindgard
Date Reported: 2025-03-14
Version: 0.3.3
AVID Entry