AVID-2026-R0435
Description
Microsoft Azure AI Content Safety Guardrail Evasion
Details
Azure OpenAI Studio enables developers to deploy OpenAI models within their Azure organisation. Developers can use a service called 'Azure AI Content Safety' to provide text moderation on the inputs and outputs of a deployed model, which aims to detect sensitive content, such as hate speech or violence, before it reaches downstream applications. We have demonstrated that an attacker can fully evade, or severely degrade, the classification accuracy of the text moderation service on a dataset of hate speech inputs.
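The advisory does not disclose the specific evasion method used. As a hedged illustration only, the sketch below shows a generic character-level perturbation (Cyrillic homoglyph substitution plus zero-width character injection) of the kind commonly used to degrade text classifiers while leaving the text visually unchanged for a human reader; it is not confirmed to be the technique reported here.

```python
# Illustrative sketch only: the exact evasion method is not given in the
# advisory. This demonstrates a generic character-level perturbation that
# alters the byte representation of text without changing how it renders.

ZERO_WIDTH = "\u200b"  # zero-width space, invisible when rendered
HOMOGLYPHS = {"a": "\u0430", "e": "\u0435", "o": "\u043e"}  # Cyrillic look-alikes

def perturb(text: str) -> str:
    """Swap selected Latin letters for Cyrillic homoglyphs and interleave
    zero-width spaces between characters."""
    swapped = "".join(HOMOGLYPHS.get(ch, ch) for ch in text)
    return ZERO_WIDTH.join(swapped)

if __name__ == "__main__":
    sample = "example input"
    evaded = perturb(sample)
    # The perturbed string differs at the byte level but looks identical.
    print(evaded != sample)            # True
    print(len(evaded) > len(sample))   # True: zero-width chars were inserted
```

A moderation model trained on clean text may fail to match its learned patterns against such perturbed input, which is one plausible mechanism behind the degraded classification accuracy described above.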
References
Affected or Relevant Artifacts
- Developer: Microsoft
- Deployer:
- Artifact Details:
| Type | Name |
|---|---|
| System | Azure AI Content Safety |
Impact
- (none)
Other information
- Report Type: Advisory
- Credits: William Hackett, Lewis Birch, Mindgard
- Date Reported: 2024-03-04
- Version: 0.3.1
- AVID Entry