Integrating Red with Guard

Lakera Red and Lakera Guard work together to provide comprehensive AI security. Red identifies vulnerabilities through offensive testing, while Guard provides continuous real-time protection. This guide explains how to use Red findings to optimize your Guard deployment.

The Red-Guard Security Lifecycle

┌──────────────────┐
│    Lakera Red    │ ─── Identify vulnerabilities through scanning
└────────┬─────────┘
         ▼
┌──────────────────┐
│     Analyze      │ ─── Review findings and understand attack patterns
└────────┬─────────┘
         ▼
┌──────────────────┐
│ Configure Guard  │ ─── Set policies based on Red findings
└────────┬─────────┘
         ▼
┌──────────────────┐
│     Monitor      │ ─── Detect attacks in production
└────────┬─────────┘
         │
         └──────────────► Repeat periodically

Mapping Red Findings to Guard Defenses

Security Category → Prompt Defense

If Red found vulnerabilities in the Security category (instruction override, prompt extraction, etc.):

  1. Enable Prompt Defense in your Guard policy
  2. Set threshold based on finding severity:
    • Critical/High findings → L4 (Paranoid)
    • Medium findings → L3 (Strict)
    • Low findings → L2 (Balanced)
import requests
import os

def screen_with_guard(user_input: str) -> bool:
    """Screen user input with Guard before sending to LLM."""
    response = requests.post(
        "https://api.lakera.ai/v2/guard",
        json={
            "messages": [{"content": user_input, "role": "user"}],
            "project_id": "project-XXXXXXXXXXX"
        },
        headers={"Authorization": f"Bearer {os.getenv('LAKERA_GUARD_API_KEY')}"}
    )

    result = response.json()
    if result["flagged"]:
        # Block the interaction - potential prompt injection
        return False
    return True

Safety Category → Content Moderation

If Red found vulnerabilities in the Safety category (harmful content generation):

  1. Enable Content Moderation in your Guard policy
  2. Screen both inputs AND outputs (see the sketch after the table below)
  3. Configure categories based on your Red findings
  Red Finding             Guard Content Category
  Hate Speech             Hate
  Violence / Extremism    Violence
  Sexual Content          Sexual
  Self-Harm               Self-Harm
  Harassment              Hate / Violence
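
A minimal sketch of this input-and-output screening, assuming a Guard project whose policy has Content Moderation enabled for the categories above (the helper name, placeholder project ID, and example variables are illustrative, not part of the Guard API):

import os
import requests

GUARD_URL = "https://api.lakera.ai/v2/guard"
HEADERS = {"Authorization": f"Bearer {os.getenv('LAKERA_GUARD_API_KEY')}"}

def is_flagged(content: str, role: str) -> bool:
    """Return True if Guard flags the message under the project's policy."""
    response = requests.post(
        GUARD_URL,
        json={
            "messages": [{"content": content, "role": role}],
            # Placeholder: a project whose policy has Content Moderation enabled
            "project_id": "project-XXXXXXXXXXX",
        },
        headers=HEADERS,
    )
    response.raise_for_status()
    return response.json()["flagged"]

# Screen the user input before calling the LLM, and the model output before display.
user_input = "..."    # message from your application
llm_response = "..."  # response from your LLM call
if is_flagged(user_input, "user") or is_flagged(llm_response, "assistant"):
    print("Blocked by content moderation policy.")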

Responsible Category → Multiple Defenses

Responsible category findings may require multiple Guard defenses:

  Red Finding             Guard Defense
  PII Exposure            Data Leakage Prevention
  Misinformation          Content Moderation + Application logic
  Fraud Facilitation      Content Moderation
  Specialized Advice      Custom policies
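
For the PII Exposure row, for example, the request shape is the same as the other examples in this guide. The sketch below assumes a Guard project whose policy has Data Leakage Prevention enabled and simply checks the top-level flagged field before returning the model's answer; the project ID and example output are placeholders:

import os
import requests

def leaks_data(llm_response: str) -> bool:
    """Screen an assistant message under a policy with Data Leakage Prevention enabled."""
    response = requests.post(
        "https://api.lakera.ai/v2/guard",
        json={
            "messages": [{"content": llm_response, "role": "assistant"}],
            "project_id": "project-XXXXXXXXXXX",  # placeholder project ID
        },
        headers={"Authorization": f"Bearer {os.getenv('LAKERA_GUARD_API_KEY')}"},
    )
    response.raise_for_status()
    return response.json()["flagged"]

answer = "Sure, the customer's card number is ..."  # example model output
if leaks_data(answer):
    answer = "I can't share that information."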

Guard Policy Configuration

Based on a typical Red scan, here’s a recommended Guard configuration:

For High Security Findings

# Screen all user inputs before LLM
guard_response = requests.post(
    "https://api.lakera.ai/v2/guard",
    json={
        "messages": [
            {"content": system_prompt, "role": "system"},
            {"content": user_input, "role": "user"}
        ],
        "project_id": "project-XXXXXXXXXXX"
    },
    headers={"Authorization": f"Bearer {api_key}"}
)

if guard_response.json()["flagged"]:
    return "I cannot process that request."

For High Safety Findings

# Screen LLM outputs before showing to user
guard_response = requests.post(
    "https://api.lakera.ai/v2/guard",
    json={
        "messages": [
            {"content": user_input, "role": "user"},
            {"content": llm_response, "role": "assistant"}
        ],
        "project_id": "project-XXXXXXXXXXX"
    },
    headers={"Authorization": f"Bearer {api_key}"}
)

if guard_response.json()["flagged"]:
    return "I'm unable to provide that response."

Complete Integration Pattern

def secure_llm_interaction(user_input: str) -> str:
    # 1. Screen input
    input_check = guard_screen(user_input, role="user")
    if input_check["flagged"]:
        return "I cannot process that request."

    # 2. Get LLM response
    llm_response = call_llm(user_input)

    # 3. Screen output
    output_check = guard_screen(llm_response, role="assistant")
    if output_check["flagged"]:
        return "I'm unable to provide that response."

    return llm_response
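
The pattern above assumes a guard_screen helper and a call_llm function from your own application. A minimal sketch of guard_screen, following the same request shape as the earlier examples (the project ID is a placeholder):

import os
import requests

def guard_screen(content: str, role: str) -> dict:
    """Send a single message to Guard and return the parsed JSON response."""
    response = requests.post(
        "https://api.lakera.ai/v2/guard",
        json={
            "messages": [{"content": content, "role": role}],
            "project_id": "project-XXXXXXXXXXX",
        },
        headers={"Authorization": f"Bearer {os.getenv('LAKERA_GUARD_API_KEY')}"},
    )
    response.raise_for_status()
    return response.json()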

Monitoring and Iteration

After deploying Guard, monitor for attack patterns:

Set Up Alerts

Configure alerts in the Lakera Dashboard for:

  • High volumes of flagged requests from single users
  • New attack patterns matching Red findings
  • Attempts to exploit specific vulnerabilities

Review Guard Logs

Regularly review Guard logs to identify:

  • Attack trends and patterns
  • False positive rates
  • New attack vectors not covered by current policies
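
If you also want an application-side record to support this review (for example, to estimate false positive rates on your own traffic), a minimal sketch using standard logging is shown below; the field names are illustrative and not part of the Guard API:

import json
import logging

logger = logging.getLogger("guard_decisions")
logging.basicConfig(level=logging.INFO)

def log_guard_decision(user_id: str, role: str, flagged: bool) -> None:
    """Record each Guard decision so flagged traffic can be reviewed later."""
    logger.info(json.dumps({
        "user_id": user_id,  # illustrative field
        "role": role,        # "user" or "assistant"
        "flagged": flagged,
    }))

# Example: after screening an input with Guard
log_guard_decision(user_id="user-123", role="user", flagged=True)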

Tune Thresholds

Start strict and relax if needed (a sketch of this decision rule follows the list):

  1. Begin with L4 (Paranoid) for defenses where Red found vulnerabilities
  2. Monitor false positive rates in production
  3. Adjust to L3 if false positives significantly impact user experience
  4. Never go below L2 for categories where Red found issues
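
As a worked example of these rules, a small helper that suggests a threshold level; the 5% false positive cutoff is an assumption for illustration, not a Lakera recommendation:

def recommended_threshold(red_found_issues: bool, false_positive_rate: float) -> str:
    """Suggest a Guard threshold level following the tuning steps above.

    The 0.05 false-positive cutoff is an illustrative assumption; tune it to
    your own tolerance.
    """
    if not red_found_issues:
        return "L2"  # illustrative default where Red found no issues
    if false_positive_rate > 0.05:
        return "L3"  # relax from Paranoid when false positives hurt the experience
    return "L4"      # begin Paranoid where Red found vulnerabilities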

Continuous Security Cycle

  1. Initial Red Scan: Comprehensive vulnerability discovery across all categories.
  2. Deploy Guard: Configure policies based on Red findings.
  3. Monitor Production: Collect data on real-world attack attempts.
  4. Tune Policies: Adjust thresholds based on false positive/negative rates.
  5. Periodic Rescan: Run Red again to verify protections and find new vulnerabilities.

Best Practices

Defense Coverage

Ensure Guard covers all attack surfaces Red tested:

  • User inputs (chat messages, form fields)
  • External data (RAG sources, API responses, file uploads); see the sketch after this list
  • Model outputs (before displaying to users)
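
External data is easy to overlook. One possible approach, reusing the guard_screen sketch from the Complete Integration Pattern above and treating retrieved passages as user-role content (an assumption for illustration; adjust to how your policy handles third-party text), is to drop any chunk Guard flags before it reaches the prompt:

def screen_retrieved_chunks(chunks: list[str]) -> list[str]:
    """Keep only retrieved passages that Guard does not flag."""
    return [
        chunk for chunk in chunks
        if not guard_screen(chunk, role="user")["flagged"]
    ]

# Example: filter RAG context before building the prompt
retrieved_chunks = ["...", "..."]  # passages returned by your retriever
safe_context = screen_retrieved_chunks(retrieved_chunks)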

Layered Protection

Use Guard as one layer in defense-in-depth:

  • Guard catches known attack patterns
  • System prompt hardening provides baseline protection
  • Application-level validation handles business logic
  • Monitoring detects novel attacks

Using Red’s Compare Feature

After implementing Guard:

  1. Run a follow-up Red scan
  2. Use Compare to see before/after results
  3. Verify risk scores decreased in targeted categories
  4. Identify any remaining gaps

Getting Started

  1. Review your Red scan findings
  2. Map each finding to the appropriate Guard defense
  3. Create a Guard project and configure your policy
  4. Integrate Guard into your application (see Quickstart)
  5. Monitor and tune based on production data
  6. Schedule periodic Red rescans

Need Help?