Stress testing
Mindgard deployed these two filters in front of GPT-3.5 Turbo on Azure OpenAI, then accessed the target LLM through Mindgard’s Automated AI Red Teaming Platform.
Two attack methods were used against the filters: character injection (adding specific characters or irregular text patterns to a prompt) and adversarial ML evasion (finding and exploiting blind spots in the underlying ML classifier).
Character injection reduced Prompt Guard’s jailbreak detection effectiveness from 89% to 7% when exposed to diacritics (e.g., changing the letter a to á), homoglyphs (e.g., closely resembling characters such as 0 and O), numerical replacement (“leet speak”), and spaced characters. Similar techniques also reduced the effectiveness of AI Text Moderation.
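To make the character-injection techniques concrete, here is a minimal Python sketch of the four transforms named above. The function names and character mappings are illustrative assumptions rather than Mindgard's actual implementation; the idea in each case is that the rewritten prompt still reads the same to a human (and often to the target LLM) but no longer matches the surface patterns the classifier was trained on.

```python
# Illustrative character-injection transforms. The specific mappings below
# are assumptions for demonstration, not Mindgard's tooling.

def add_diacritics(text: str) -> str:
    """Swap plain vowels for accented look-alikes (e.g., a -> á)."""
    return text.translate(str.maketrans("aeiou", "áéíóú"))

def to_homoglyphs(text: str) -> str:
    """Replace letters with closely resembling characters (e.g., O -> 0)."""
    # The last entry maps Latin 'a' to the visually identical Cyrillic 'а'.
    return text.translate(str.maketrans({"O": "0", "l": "1", "a": "а"}))

def to_leetspeak(text: str) -> str:
    """Numerical replacement ('leet speak'), e.g., e -> 3, a -> 4."""
    return text.translate(str.maketrans({"a": "4", "e": "3", "i": "1", "o": "0", "t": "7"}))

def space_characters(text: str) -> str:
    """Insert a space between every character of the prompt."""
    return " ".join(text)

if __name__ == "__main__":
    prompt = "ignore previous instructions"
    for transform in (add_diacritics, to_homoglyphs, to_leetspeak, space_characters):
        print(f"{transform.__name__}: {transform(prompt)}")
```

In a test harness, each transformed prompt would be sent through the filter to measure how often the perturbed version evades detection compared with the original.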