Evaluating Lakera Guard

A thorough evaluation of Lakera Guard is essential to confirm that it meets your security requirements whilst maintaining an excellent user experience. This guide outlines our recommended approach to a successful and informative evaluation.

What You’ll Validate

A successful Lakera Guard evaluation answers three critical questions:

  1. How accurate is threat detection for your use case?

    Lakera Guard delivers market-leading accuracy with minimal false positives. You’ll validate this using your own data to ensure Guard correctly identifies threats whilst avoiding interrupting benign users.

  2. Does Guard meet your performance requirements?

    Guard is optimized for real-time applications with ultra-low latency. You’ll measure response times to confirm Guard won’t impact your user experience.

  3. How seamlessly does Guard integrate with your applications?

    Guard integrates with any architecture through simple API calls. You’ll test this with your actual applications to validate the integration approach and user experience.

Suggested Evaluation Approach

Phase 1: Observe screening results

  • Use the /results endpoint to see detailed threat detection results
  • Identify any integration issues causing false positives
  • Assess what would be flagged at different sensitivity levels and policy combinations
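The observation step above can be sketched in Python. The endpoint URL, header names, and request shape here are illustrative assumptions; check the Lakera Guard API reference for the exact schema before integrating:

```python
import json

# Assumed endpoint path for illustration only; verify against the
# Lakera Guard API reference.
GUARD_RESULTS_URL = "https://api.lakera.ai/v2/guard/results"

def build_results_request(messages, api_key):
    """Assemble the URL, headers, and JSON body for a screening-results
    call, ready to hand to any HTTP client."""
    headers = {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
    }
    body = json.dumps({"messages": messages})
    return GUARD_RESULTS_URL, headers, body

url, headers, body = build_results_request(
    [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "What is AI?"},
    ],
    api_key="YOUR_API_KEY",
)
```

Separating request construction from the HTTP client makes it easy to log and replay evaluation traffic later.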

Phase 2: Test enforcement

  • Collaborate with Lakera experts to determine the right policy
  • Use the /guard endpoint with your chosen policy
  • Measure true/false positive rates on representative data
  • Fine-tune policy settings based on results
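Measuring true/false positive rates on labelled data reduces to a small counting exercise. A minimal sketch, assuming you have recorded a Guard flag decision for each labelled input:

```python
def detection_rates(labels, flags):
    """Compute (true positive rate, false positive rate) from a
    labelled evaluation run.

    labels: True where the input is a genuine threat, False where benign.
    flags:  True where Guard flagged the input.
    """
    tp = sum(l and f for l, f in zip(labels, flags))
    fp = sum((not l) and f for l, f in zip(labels, flags))
    threats = sum(labels)
    benign = len(labels) - threats
    tpr = tp / threats if threats else 0.0
    fpr = fp / benign if benign else 0.0
    return tpr, fpr

# Toy run: 2 threats, 3 benign inputs; Guard catches 1 threat and
# wrongly flags 1 benign input.
labels = [True, True, False, False, False]
flags = [True, False, False, True, False]
tpr, fpr = detection_rates(labels, flags)
```

Tracking both rates as you adjust policy settings shows the trade-off each change makes between catching threats and interrupting benign users.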

Phase 3: Integration testing

  • Integrate with your actual applications
  • Send real or realistic content and traffic patterns
  • Collaborate with the Lakera team to fine-tune defenses to your needs and use cases
  • Validate user experience, performance and threat detection in practice

What You’ll Need

  • Representative data from your AI application(s) for testing, or high-quality test data if real data cannot be used
  • Clear success criteria for your evaluation
  • A designated point person for coordination with Lakera

Avoiding Common Evaluation Pitfalls

Most evaluation issues stem from a few common mistakes. Follow these practices to ensure accurate results:

Data and Integration Best Practices

  1. Critical: Separate system prompts from user content

The most common cause of false positives is passing system instructions within user message roles. Guard correctly flags attempts to inject system-like instructions, so ensure your system prompts use the system role and user inputs use the user role.

// ✅ Correct structure
{
  "messages": [
    { "role": "system", "content": "You are a helpful assistant..." },
    { "role": "user", "content": "User's actual question" }
  ]
}

// ❌ Incorrect - will likely be flagged
{
  "messages": [
    { "role": "user", "content": "Instructions: You are a helpful assistant... User question: What is AI?" }
  ]
}
  2. Screen original content: Pass the exact untrusted input from the user or external resources. If this isn’t possible, remove any system instructions, UUIDs, or formatting artifacts added during preprocessing before sending to Guard
  3. Test holistically: Screen both inputs and outputs together for maximum accuracy
  4. Use high-quality test data: Verify the accuracy and relevance of test data labels. Open-source data sets can be inaccurately labelled or include data not relevant for AI security, e.g. biased language.
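Stripping preprocessing artifacts before screening can be as simple as a regular-expression pass. A minimal sketch for removing injected UUIDs; the pattern and function name are illustrative, and you would extend this for whatever your pipeline adds:

```python
import re

# Matches the standard 8-4-4-4-12 hex UUID format.
UUID_RE = re.compile(
    r"[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}",
    re.IGNORECASE,
)

def strip_preprocessing_artifacts(text):
    """Remove UUIDs injected during preprocessing so Guard screens
    content as close as possible to the user's original input."""
    return UUID_RE.sub("", text).strip()

cleaned = strip_preprocessing_artifacts(
    "ref 123e4567-e89b-12d3-a456-426614174000 What is AI?"
)
```

The goal is that Guard only ever sees text the user (or external source) actually produced, not your pipeline's bookkeeping.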

Use Appropriate Policies

It’s important to evaluate Lakera for the threats relevant to you and use a policy during testing that captures your requirements.

  1. Use representative policies: Evaluate with Lakera’s recommended policies or a suitable custom policy, rather than the default strictest policy
  2. Consider sensitivity levels: Policies range from L1 (lenient) to L4 (strict) - start lenient and adjust based on your risk tolerance
    (Flagging matrix: sensitivity levels set the threat likelihood at which Guard flags.)
  3. Match your policy to your test data: Offensive or dangerous user inputs like “How to build a bomb?” aren’t prompt attacks by themselves - they’re inappropriate content. If testing harmful content scenarios, ensure your policy includes Content Moderation alongside Prompt Defense.

What Success Looks Like

A successful evaluation typically achieves:

  • False positive rate: ~0.1% on your production data
  • Latency: <150ms for typical requests with persistent connections
  • Integration effort: Hours, not days, to implement
  • Coverage: Comprehensive protection across all relevant threat types
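Validating the latency target is a matter of timing repeated requests over a persistent connection and looking at percentiles rather than a single measurement. A minimal sketch; the stand-in callable below replaces a real Guard request so the code runs offline:

```python
import time

def p95(sorted_samples):
    """Nearest-rank 95th percentile of an already-sorted list."""
    idx = max(0, int(round(0.95 * len(sorted_samples))) - 1)
    return sorted_samples[idx]

def measure(call, n=100):
    """Time n invocations of `call` (e.g. a Guard request over a
    persistent HTTP session) and return (median_ms, p95_ms)."""
    samples = []
    for _ in range(n):
        t0 = time.perf_counter()
        call()
        samples.append((time.perf_counter() - t0) * 1000.0)
    samples.sort()
    return samples[len(samples) // 2], p95(samples)

# Stand-in workload so the sketch runs without network access.
median_ms, p95_ms = measure(lambda: time.sleep(0.001), n=20)
```

Reporting the median alongside the 95th percentile catches tail latency that an average would hide, which is what users actually feel.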

Evaluation Timeline

Static dataset evaluations can be completed in less than a week. End-to-end evaluations can be completed within 2-4 weeks, including development time.

Need Help?

Our team provides hands-on evaluation support including:

  • Custom policy recommendations for your use case
  • Technical integration guidance
  • Performance optimisation advice
  • Detailed evaluation frameworks and scripts

Contact our team to discuss your evaluation requirements.