We use cookies to improve your experience on our site.
AVID-2026-R0431
Description
Protect AI Jailbreak and Prompt Injection Guardrail Evasion
Details
Protect AI offer two Prompt Injection and Jailbreak classifiers on HuggingFace. These classifiers are heavily used by a range of guardrail systems and are designed to enable developers of AI applications to detect and react to incoming prompt injections and jailbreaks. We successfully demonstrated how an attacker can fully evade, or greatly degrade, classification accuracy of the classifier, enabling prompt injections and jailbreaks to pass through filters and subsequently to the protected AI application.
References
Affected or Relevant Artifacts
- Developer: Protect.ai
- Deployer:
- Artifact Details:
| Type | Name |
|---|---|
| Model | Prompt Injection Classifier |
| Model | Jailbreak Classifier |
Impact
- (none)
Other information
- Report Type: Advisory
- Credits: William Hackett Lewis Birch, Mindgard
- Date Reported: 2025-03-12
- Version: 0.3.1
- AVID Entry