Major Detection Improvements

Major Improvements

  • Significant enhancements to Prompt Injection detection
  • Reduced overtriggering in PII and Sexual classifiers

Changes

API

  • Introduced 8,191 token limit for prompts
    • Measured with OpenAI’s standard tokenizer
    • For longer inputs, send chunks as parallel Guard requests
  • Removed /guard endpoint
    • Use the individual defense endpoints instead
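
The chunking suggestion for the 8,191 token limit can be sketched roughly as follows. This is a minimal illustration, not an official client: the names chunk_tokens and check_long_prompt are hypothetical, and the encode, decode, and send callables are supplied by the caller. It assumes that encode/decode wrap whichever OpenAI tokenizer applies to your deployment (these notes do not name the encoding) and that send posts a single chunk to one of the individual defense endpoints and returns its result.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Sequence, TypeVar

MAX_TOKENS = 8191  # prompt limit stated in these release notes

R = TypeVar("R")


def chunk_tokens(tokens: Sequence[int], limit: int = MAX_TOKENS) -> List[List[int]]:
    """Split a token sequence into consecutive pieces of at most `limit` tokens."""
    if limit <= 0:
        raise ValueError("limit must be positive")
    return [list(tokens[i:i + limit]) for i in range(0, len(tokens), limit)]


def check_long_prompt(
    prompt: str,
    encode: Callable[[str], Sequence[int]],   # assumption: tokenizer encode function
    decode: Callable[[Sequence[int]], str],   # assumption: matching decode function
    send: Callable[[str], R],                 # assumption: sends one chunk, returns its result
    limit: int = MAX_TOKENS,
    max_workers: int = 4,
) -> List[R]:
    """Tokenize the prompt, split it into chunks, and send the chunks in parallel.

    Results are returned in the same order as the chunks.
    """
    chunks = chunk_tokens(encode(prompt), limit)
    if not chunks:
        return []
    texts = [decode(chunk) for chunk in chunks]
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves input order even though requests run concurrently
        return list(pool.map(send, texts))
```

Two caveats apply to this sketch. Decoding a token slice and re-encoding it does not always reproduce the same token count, so a limit slightly below 8,191 may be safer in practice. Splitting at arbitrary token boundaries can also cut through the middle of a phrase, so overlapping chunks may be worth considering if detections near the boundaries matter.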

PII Detection

  • Fixed overtriggering on:
    • Famous people and fictitious names
    • Locations
  • Added US address detection

Content Moderation

  • Sexual classifier: Fixed overtriggering on benign inputs
  • Added boolean field "unknown"
    • Indicates whether an extracted link was found in the safe domains list