Integrating Red with Guard

Lakera Red and Lakera Guard work together to provide comprehensive AI security. Red identifies vulnerabilities through offensive testing, while Guard provides continuous real-time protection. This guide explains how to use Red findings to optimize your Guard deployment.

The Red-Guard Security Lifecycle

┌──────────────────┐
│    Lakera Red    │ ─── Identify vulnerabilities through scanning
└────────┬─────────┘
         ▼
┌──────────────────┐
│     Analyze      │ ─── Review findings and understand attack patterns
└────────┬─────────┘
         ▼
┌──────────────────┐
│ Configure Guard  │ ─── Set policies based on Red findings
└────────┬─────────┘
         ▼
┌──────────────────┐
│     Monitor      │ ─── Detect attacks in production
└────────┬─────────┘
         │
         └──────────────► Repeat periodically

Mapping Red Findings to Guard Defenses

Security Category → Prompt Defense

If Red found vulnerabilities in the Security category (instruction override, prompt extraction, etc.):

  1. Enable Prompt Defense in your Guard policy
  2. Set threshold based on finding severity:
    • Critical/High findings → L4 (Paranoid)
    • Medium findings → L3 (Strict)
    • Low findings → L2 (Balanced)
import requests
import os

def screen_with_guard(user_input: str) -> bool:
    """Screen user input with Guard before sending to LLM."""
    response = requests.post(
        "https://api.lakera.ai/v2/guard",
        json={
            "messages": [{"content": user_input, "role": "user"}],
            "project_id": "project-XXXXXXXXXXX"
        },
        headers={"Authorization": f"Bearer {os.getenv('LAKERA_GUARD_API_KEY')}"}
    )

    result = response.json()
    if result["flagged"]:
        # Block the interaction - potential prompt injection
        return False
    return True

Safety Category → Content Moderation

If Red found vulnerabilities in the Safety category (harmful content generation):

  1. Enable Content Moderation in your Guard policy
  2. Screen both inputs AND outputs (see the sketch after the table below)
  3. Configure categories based on your Red findings
  Red Finding             Guard Content Category
  Hate Speech             Hate
  Violence / Extremism    Violence
  Sexual Content          Sexual
  Self-Harm               Self-Harm
  Harassment              Hate / Violence
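
A minimal sketch of this input-and-output screening, assuming a Guard project whose policy has Content Moderation enabled for the categories above (the helper name, placeholder project ID, and example variables are illustrative, not part of the Guard API):

import os
import requests

GUARD_URL = "https://api.lakera.ai/v2/guard"
HEADERS = {"Authorization": f"Bearer {os.getenv('LAKERA_GUARD_API_KEY')}"}

def is_flagged(content: str, role: str) -> bool:
    """Return True if Guard flags the message under the project's policy."""
    response = requests.post(
        GUARD_URL,
        json={
            "messages": [{"content": content, "role": role}],
            # Placeholder: a project whose policy has Content Moderation enabled
            "project_id": "project-XXXXXXXXXXX",
        },
        headers=HEADERS,
    )
    response.raise_for_status()
    return response.json()["flagged"]

# Screen the user input before calling the LLM, and the model output before display.
user_input = "..."    # message from your application
llm_response = "..."  # response from your LLM call
if is_flagged(user_input, "user") or is_flagged(llm_response, "assistant"):
    print("Blocked by content moderation policy.")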

Responsible Category → Multiple Defenses

Responsible category findings may require multiple Guard defenses:

  Red Finding             Guard Defense
  PII Exposure            Data Leakage Prevention
  Misinformation          Content Moderation + Application logic
  Fraud Facilitation      Content Moderation
  Specialized Advice      Custom policies
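
For the PII Exposure row, for example, the request shape is the same as the other examples in this guide. The sketch below assumes a Guard project whose policy has Data Leakage Prevention enabled and simply checks the top-level flagged field before returning the model's answer; the project ID and example output are placeholders:

import os
import requests

def leaks_data(llm_response: str) -> bool:
    """Screen an assistant message under a policy with Data Leakage Prevention enabled."""
    response = requests.post(
        "https://api.lakera.ai/v2/guard",
        json={
            "messages": [{"content": llm_response, "role": "assistant"}],
            "project_id": "project-XXXXXXXXXXX",  # placeholder project ID
        },
        headers={"Authorization": f"Bearer {os.getenv('LAKERA_GUARD_API_KEY')}"},
    )
    response.raise_for_status()
    return response.json()["flagged"]

answer = "Sure, the customer's card number is ..."  # example model output
if leaks_data(answer):
    answer = "I can't share that information."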

Guard Policy Configuration

Based on a typical Red scan, here’s a recommended Guard configuration:

For High Security Findings

# Screen all user inputs before LLM
guard_response = requests.post(
    "https://api.lakera.ai/v2/guard",
    json={
        "messages": [
            {"content": system_prompt, "role": "system"},
            {"content": user_input, "role": "user"}
        ],
        "project_id": "project-XXXXXXXXXXX"
    },
    headers={"Authorization": f"Bearer {api_key}"}
)

if guard_response.json()["flagged"]:
    return "I cannot process that request."

For High Safety Findings

# Screen LLM outputs before showing to user
guard_response = requests.post(
    "https://api.lakera.ai/v2/guard",
    json={
        "messages": [
            {"content": user_input, "role": "user"},
            {"content": llm_response, "role": "assistant"}
        ],
        "project_id": "project-XXXXXXXXXXX"
    },
    headers={"Authorization": f"Bearer {api_key}"}
)

if guard_response.json()["flagged"]:
    return "I'm unable to provide that response."

Complete Integration Pattern

def secure_llm_interaction(user_input: str) -> str:
    # 1. Screen input
    input_check = guard_screen(user_input, role="user")
    if input_check["flagged"]:
        return "I cannot process that request."

    # 2. Get LLM response
    llm_response = call_llm(user_input)

    # 3. Screen output
    output_check = guard_screen(llm_response, role="assistant")
    if output_check["flagged"]:
        return "I'm unable to provide that response."

    return llm_response
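
The pattern above assumes a guard_screen helper and a call_llm function from your own application. A minimal sketch of guard_screen, following the same request shape as the earlier examples (the project ID is a placeholder):

import os
import requests

def guard_screen(content: str, role: str) -> dict:
    """Send a single message to Guard and return the parsed JSON response."""
    response = requests.post(
        "https://api.lakera.ai/v2/guard",
        json={
            "messages": [{"content": content, "role": role}],
            "project_id": "project-XXXXXXXXXXX",
        },
        headers={"Authorization": f"Bearer {os.getenv('LAKERA_GUARD_API_KEY')}"},
    )
    response.raise_for_status()
    return response.json()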

Monitoring and Iteration

After deploying Guard, monitor for attack patterns:

Set Up Alerts

Configure alerts in the Lakera Dashboard for:

  • High volumes of flagged requests from single users
  • New attack patterns matching Red findings
  • Attempts to exploit specific vulnerabilities

Review Guard Logs

Regularly review Guard logs to identify:

  • Attack trends and patterns
  • False positive rates
  • New attack vectors not covered by current policies
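
If you also want an application-side record to support this review (for example, to estimate false positive rates on your own traffic), a minimal sketch using standard logging is shown below; the field names are illustrative and not part of the Guard API:

import json
import logging

logger = logging.getLogger("guard_decisions")
logging.basicConfig(level=logging.INFO)

def log_guard_decision(user_id: str, role: str, flagged: bool) -> None:
    """Record each Guard decision so flagged traffic can be reviewed later."""
    logger.info(json.dumps({
        "user_id": user_id,  # illustrative field
        "role": role,        # "user" or "assistant"
        "flagged": flagged,
    }))

# Example: after screening an input with Guard
log_guard_decision(user_id="user-123", role="user", flagged=True)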

Tune Thresholds

Start strict and relax if needed (a sketch of this decision rule follows the list):

  1. Begin with L4 (Paranoid) for defenses where Red found vulnerabilities
  2. Monitor false positive rates in production
  3. Adjust to L3 if false positives significantly impact user experience
  4. Never go below L2 for categories where Red found issues
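
As a worked example of these rules, a small helper that suggests a threshold level; the 5% false positive cutoff is an assumption for illustration, not a Lakera recommendation:

def recommended_threshold(red_found_issues: bool, false_positive_rate: float) -> str:
    """Suggest a Guard threshold level following the tuning steps above.

    The 0.05 false-positive cutoff is an illustrative assumption; tune it to
    your own tolerance.
    """
    if not red_found_issues:
        return "L2"  # illustrative default where Red found no issues
    if false_positive_rate > 0.05:
        return "L3"  # relax from Paranoid when false positives hurt the experience
    return "L4"      # begin Paranoid where Red found vulnerabilities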

Continuous Security Cycle

  1. Initial Red Scan: Comprehensive vulnerability discovery across all categories.
  2. Deploy Guard: Configure policies based on Red findings.
  3. Monitor Production: Collect data on real-world attack attempts.
  4. Tune Policies: Adjust thresholds based on false positive/negative rates.
  5. Periodic Rescan: Run Red again to verify protections and find new vulnerabilities.

Best Practices

Defense Coverage

Ensure Guard covers all attack surfaces Red tested:

  • User inputs (chat messages, form fields)
  • External data (RAG sources, API responses, file uploads); see the sketch after this list
  • Model outputs (before displaying to users)
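
External data is easy to overlook. One possible approach, reusing the guard_screen sketch from the Complete Integration Pattern above and treating retrieved passages as user-role content (an assumption for illustration; adjust to how your policy handles third-party text), is to drop any chunk Guard flags before it reaches the prompt:

def screen_retrieved_chunks(chunks: list[str]) -> list[str]:
    """Keep only retrieved passages that Guard does not flag."""
    return [
        chunk for chunk in chunks
        if not guard_screen(chunk, role="user")["flagged"]
    ]

# Example: filter RAG context before building the prompt
retrieved_chunks = ["...", "..."]  # passages returned by your retriever
safe_context = screen_retrieved_chunks(retrieved_chunks)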

Layered Protection

Use Guard as one layer in defense-in-depth:

  • Guard catches known attack patterns
  • System prompt hardening provides baseline protection
  • Application-level validation handles business logic
  • Monitoring detects novel attacks

Using Red’s Compare Feature

After implementing Guard:

  1. Run a follow-up Red scan
  2. Use Compare to see before/after results
  3. Verify risk scores decreased in targeted categories
  4. Identify any remaining gaps

Getting Started

  1. Review your Red scan findings
  2. Map each finding to the appropriate Guard defense
  3. Create a Guard project and configure your policy
  4. Integrate Guard into your application (see Quickstart)
  5. Monitor and tune based on production data
  6. Schedule periodic Red rescans

Need Help?