AVID-2026-R0435
Description
Microsoft Azure AI Content Safety Guardrail Evasion
Details
Azure OpenAI Studio enables developers to deploy OpenAI models within their Azure organisation. Developers can use a service called 'Azure AI Content Safety' to provide text moderation on the inputs and outputs of a deployed model, which aims to detect sensitive content, such as hate speech or violence, before it reaches downstream applications. We have demonstrated that an attacker can fully evade, or severely degrade, the classification accuracy of the text moderation service on a dataset of hate speech inputs.
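The advisory does not disclose the specific evasion method used. As a hedged illustration only, the sketch below shows a generic character-level perturbation (Cyrillic homoglyph substitution plus zero-width character injection) of the kind commonly used to degrade text classifiers while leaving the text visually unchanged for a human reader; it is not confirmed to be the technique reported here.

```python
# Illustrative sketch only: the exact evasion method is not given in the
# advisory. This demonstrates a generic character-level perturbation that
# alters the byte representation of text without changing how it renders.

ZERO_WIDTH = "\u200b"  # zero-width space, invisible when rendered
HOMOGLYPHS = {"a": "\u0430", "e": "\u0435", "o": "\u043e"}  # Cyrillic look-alikes

def perturb(text: str) -> str:
    """Swap selected Latin letters for Cyrillic homoglyphs and interleave
    zero-width spaces between characters."""
    swapped = "".join(HOMOGLYPHS.get(ch, ch) for ch in text)
    return ZERO_WIDTH.join(swapped)

if __name__ == "__main__":
    sample = "example input"
    evaded = perturb(sample)
    # The perturbed string differs at the byte level but looks identical.
    print(evaded != sample)            # True
    print(len(evaded) > len(sample))   # True: zero-width chars were inserted
```

A moderation model trained on clean text may fail to match its learned patterns against such perturbed input, which is one plausible mechanism behind the degraded classification accuracy described above.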
References
Affected or Relevant Artifacts
- Developer: Microsoft
- Deployer:
- Artifact Details:
| Type | Name |
|---|---|
| System | Azure AI Content Safety |
Impact
- (none)
Other information
- Report Type: Advisory
- Credits: William Hackett, Lewis Birch, Mindgard
- Date Reported: 2024-03-04
- Version: 0.3.1
- AVID Entry