
AVID-2026-R0435

Description

Microsoft Azure AI Content Safety Guardrail Evasion

Details

Azure OpenAI Studio enables developers to deploy OpenAI models within their Azure organisation. Developers can use a service called ‘Azure AI Content Safety’ to apply text moderation to the inputs and outputs of a deployed model, with the aim of detecting sensitive content such as hate speech or violence before it reaches downstream applications. We have successfully demonstrated how an attacker can fully evade, or greatly degrade, the classification accuracy of the text moderation service on a dataset of hate speech inputs.
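The advisory does not spell out the exact perturbations used. One common class of evasion against text moderation classifiers is character injection: inserting invisible characters or substituting visually identical look-alikes so the text no longer matches patterns the classifier recognises, while staying readable to a human (and often still interpretable by the downstream model). The sketch below is a generic, hypothetical illustration of that class of technique, not Mindgard's specific method:

```python
# Illustrative sketch only: generic character-injection perturbations of the
# kind used to evade text moderation classifiers. Not the exact technique
# reported in this advisory.

ZERO_WIDTH_SPACE = "\u200b"  # renders as nothing, but changes the byte sequence

def inject_zero_width(text: str) -> str:
    """Insert a zero-width space between every pair of characters."""
    return ZERO_WIDTH_SPACE.join(text)

# Cyrillic letters that render identically to their Latin counterparts
HOMOGLYPHS = {"a": "\u0430", "e": "\u0435", "o": "\u043e"}

def substitute_homoglyphs(text: str) -> str:
    """Replace selected Latin letters with visually identical Cyrillic ones."""
    return "".join(HOMOGLYPHS.get(ch, ch) for ch in text)

perturbed = inject_zero_width("example input")
# The perturbed string looks unchanged when rendered, but differs byte-wise,
# which can be enough to slip past a pattern- or token-based classifier.
assert perturbed != "example input"
```

A moderation service robust to this class of attack would typically normalise its input first (e.g. strip zero-width characters and apply Unicode confusable mapping) before classification.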

References

Affected or Relevant Artifacts

  • Developer: Microsoft
  • Deployer:
  • Artifact Details:
      Type: System
      Name: Azure AI Content Safety

Impact

  • (none)

Other information

  • Report Type: Advisory
  • Credits: William Hackett, Lewis Birch, Mindgard
  • Date Reported: 2024-03-04
  • Version: 0.3.1
  • AVID Entry