Integrating Red with AI Guardrails
Check Point AI Red Teaming (Red) and Check Point AI Guardrails complement each other: Red identifies vulnerabilities through offensive testing, while AI Guardrails provides continuous real-time protection. This guide explains how to use Red findings to optimize your AI Guardrails deployment.
The Red and AI Guardrails Security Lifecycle
Mapping Red Findings to AI Guardrails Defenses
Security Category → Prompt Defense
If Red found vulnerabilities in the Security category (instruction override, prompt extraction, etc.):
- Enable Prompt Defense in your AI Guardrails policy
- Set the threshold based on finding severity:
  - Critical/High findings → L4 (Paranoid)
  - Medium findings → L3 (Strict)
  - Low findings → L2 (Balanced)
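The severity guidance above can be sketched as a small lookup. This is an illustrative helper, not part of any AI Guardrails SDK: the names `SEVERITY_TO_THRESHOLD` and `threshold_for_findings` are hypothetical, and choosing the strictest level when a category has several findings is an assumption.

```python
# Hypothetical helper: maps Red finding severities to a Prompt Defense threshold.
SEVERITY_TO_THRESHOLD = {
    "critical": "L4",  # Paranoid
    "high": "L4",      # Paranoid
    "medium": "L3",    # Strict
    "low": "L2",       # Balanced
}

# Thresholds ordered from least to most strict.
_STRICTNESS = ["L2", "L3", "L4"]


def threshold_for_findings(severities):
    """Return the strictest threshold implied by a list of severities.

    Assumption: with several findings in one category, the most severe wins.
    Returns None when there are no findings.
    """
    levels = [SEVERITY_TO_THRESHOLD[s.lower()] for s in severities]
    if not levels:
        return None
    return max(levels, key=_STRICTNESS.index)
```

For example, a category with one Low and one Medium finding would resolve to L3 under this rule.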
Safety Category → Content Moderation
If Red found vulnerabilities in the Safety category (harmful content generation):
- Enable Content Moderation in your AI Guardrails policy
- Screen both inputs AND outputs
- Configure categories based on your Red findings
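Screening both directions means the check runs once on the user's message and again on the model's reply. The sketch below shows that control flow only. `generate` and `is_flagged` are hypothetical callables that you would back with your model and your AI Guardrails integration; nothing here is an actual AI Guardrails API.

```python
from typing import Callable


def guarded_reply(
    user_message: str,
    generate: Callable[[str], str],
    is_flagged: Callable[[str], bool],
    refusal: str = "Sorry, I can't help with that.",
) -> str:
    """Screen the input, call the model, then screen the output."""
    # Input check: stop before the model ever sees a flagged message.
    if is_flagged(user_message):
        return refusal
    reply = generate(user_message)
    # Output check: stop a flagged reply from reaching the user.
    if is_flagged(reply):
        return refusal
    return reply
```

Passing the screening function in as a parameter keeps the flow testable with stubs and independent of any particular client library.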
Responsible Category → Multiple Defenses
Responsible category findings may require multiple AI Guardrails defenses:
AI Guardrails Policy Configuration
Based on a typical Red scan, here’s a recommended AI Guardrails configuration:
For High Security Findings
For High Safety Findings
Complete Integration Pattern
Monitoring and Iteration
After deploying AI Guardrails, monitor for attack patterns:
Set Up Alerts
Configure alerts in the AI Guardrails Dashboard for:
- High volumes of flagged requests from single users
- New attack patterns matching Red findings
- Attempts to exploit specific vulnerabilities
Review AI Guardrails Logs
Review the logs regularly to identify:
- Attack trends and patterns
- False positive rates
- New attack vectors not covered by current policies
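One way to quantify the false positive rate from reviewed logs is sketched below. The record shape is an assumption: each entry is a dict with a `flagged` field (the guardrail flagged the request) and an `is_attack` field (a reviewer's label after triage). Neither field name comes from the AI Guardrails log format.

```python
def false_positive_rate(records):
    """Return the fraction of benign requests that were flagged.

    Each record is a dict with two hypothetical boolean fields:
    'flagged' and 'is_attack'. Returns 0.0 when no benign requests
    are present, since the rate is undefined in that case.
    """
    benign = [r for r in records if not r["is_attack"]]
    if not benign:
        return 0.0
    flagged_benign = sum(1 for r in benign if r["flagged"])
    return flagged_benign / len(benign)
```

Note that this is the rate over benign traffic. The share of flagged requests that turn out to be benign is a different number and would need a different denominator.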
Tune Thresholds
Start strict and relax if needed:
- Begin with L4 (Paranoid) for defenses where Red found vulnerabilities
- Monitor false positive rates in production
- Adjust to L3 if false positives significantly impact user experience
- Never go below L2 for categories where Red found issues
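The tuning rule above can be sketched as a one-step relaxation with a floor. The function name `relax_threshold` and the 5% tolerance are assumptions for illustration; pick a tolerance that reflects what "significantly impact user experience" means for your application.

```python
# Levels named in this guide, ordered from least to most strict.
LEVELS = ["L2", "L3", "L4"]  # Balanced, Strict, Paranoid


def relax_threshold(current: str, fp_rate: float, max_fp_rate: float = 0.05) -> str:
    """Relax by at most one level per review, never below L2.

    fp_rate is the observed false positive rate in production;
    max_fp_rate is a hypothetical tolerance, not a product default.
    """
    index = LEVELS.index(current)
    if fp_rate > max_fp_rate and index > 0:
        return LEVELS[index - 1]
    return current
```

Relaxing one level at a time keeps each change observable: you can re-measure false positives before deciding whether another step is justified.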
Continuous Security Cycle
Best Practices
Defense Coverage
Ensure AI Guardrails covers all attack surfaces Red tested:
- User inputs (chat messages, form fields)
- External data (RAG sources, API responses, file uploads)
- Model outputs (before displaying to users)
Layered Protection
Use AI Guardrails as one layer in a defense-in-depth strategy:
- AI Guardrails catches known attack patterns
- System prompt hardening provides baseline protection
- Application-level validation handles business logic
- Monitoring detects novel attacks
Using Red’s Compare Feature
After implementing AI Guardrails:
- Run a follow-up Red scan
- Use Compare to see before/after results
- Verify risk scores decreased in targeted categories
- Identify any remaining gaps
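If you track risk scores outside the product, the same before/after check can be sketched in a few lines. This assumes you have per-category risk scores from both scans as plain dictionaries, and that a higher score means higher risk. Both the data shape and the `compare_scans` helper are hypothetical and do not describe how the Compare feature works.

```python
def compare_scans(before, after, targeted):
    """Return (deltas, gaps) for the targeted categories.

    deltas maps each category to after minus before (negative is better).
    gaps lists targeted categories whose score did not decrease.
    """
    deltas = {category: after[category] - before[category] for category in targeted}
    gaps = [category for category in targeted if after[category] >= before[category]]
    return deltas, gaps
```

A category that appears in `gaps` is a candidate for a stricter threshold or an additional defense before the next rescan.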
Getting Started
- Review your Red scan findings
- Map each finding to the appropriate AI Guardrails defense
- Create an AI Guardrails project and configure your policy
- Integrate AI Guardrails into your application (see Quickstart)
- Monitor and tune based on production data
- Schedule periodic Red rescans
Need Help?
- Contact Check Point for help translating Red findings to AI Guardrails policies
- Review the AI Guardrails documentation for detailed configuration options
- Reach out to support@lakera.ai for technical assistance