Remediation Guidance
This guide provides actionable remediation strategies for vulnerabilities discovered during Lakera Red scans. Each attack category includes specific mitigations and long-term fixes.
Defense in Depth
Effective GenAI security requires multiple layers of protection:
Remediation by Category
Security Vulnerabilities
Instruction Override
What Red found: Attackers can bypass your system instructions and change model behavior.
Immediate Mitigations
Long-term Fixes
- Add explicit boundaries in your system prompt
- Implement conversation reset for suspicious patterns
Example system prompt hardening:
System Prompt Extraction
What Red found: Attackers can reveal your hidden system instructions.
Immediate Mitigations
Long-term Fixes
- Add explicit “never reveal” instructions
- Implement output filtering for prompt-like content
- Enable Guard’s prompt defense
Data Exfiltration / PII Leakage
What Red found: The model can be manipulated to expose sensitive data.
Immediate Mitigations
Long-term Fixes
- Enable Lakera Guard’s PII detection on outputs
- Implement regex filtering for known sensitive patterns
- Add logging and alerting for potential leaks
Safety Vulnerabilities
Harmful Content Generation
What Red found: The model can be manipulated to generate harmful content (hate speech, violence, dangerous instructions, etc.).
Immediate Mitigations
Long-term Fixes
- Enable Lakera Guard’s content moderation
- Add explicit content restrictions to system prompt
- Implement output validation before display
Self-Harm / Dangerous Content
What Red found: The model can produce content related to self-harm, drug synthesis, or dangerous activities.
Immediate Mitigations
Long-term Fixes
- Enable strict content moderation (L4 threshold)
- Add crisis resources and escalation paths
- Block specific high-risk topics explicitly
Responsible AI Vulnerabilities
Misinformation / Hallucination
What Red found: The model can generate false or misleading information.
Immediate Mitigations
Long-term Fixes
- Add uncertainty acknowledgment to system prompt
- Instruct model to cite sources or admit limitations
- Implement fact-checking for high-stakes domains
Unauthorized Actions
What Red found: The model can be manipulated to offer unauthorized discounts, access, or actions.
Immediate Mitigations
Long-term Fixes
- Explicitly define authorized actions in system prompt
- Implement backend validation for all actions
- Add confirmation steps for sensitive operations
Specialized Advice
What Red found: The model provides medical, legal, or financial advice it shouldn’t give.
Immediate Mitigations
Long-term Fixes
- Add explicit disclaimers to system prompt
- Block advice-giving language patterns
- Redirect to appropriate professionals
Implementing Lakera Guard
Many Red findings can be addressed by deploying Lakera Guard with appropriate policies:
See the Guard Integration guide for implementation details.
Verification Testing
After implementing remediations:
Prioritization Framework
Not all findings require immediate action. Prioritize based on:
- Severity - Critical and High findings first
- Exploitability - How easy is it to reproduce?
- Business Impact - What’s the worst-case outcome?
- User Exposure - Is this a public-facing application?
Getting Help
- Contact Lakera support for remediation guidance
- Book a consultation for complex findings
- Review the Lakera Guard documentation for implementation details