How to Use The Evaluation Framework
The evaluation framework offers a prescriptive guide for setting up Lakera Guard, assessing its efficacy and detection rates, measuring latency, and integrating it into various real-world use cases.
While tailored for Lakera Guard, this framework can be adapted as a general template for standardized detection system evaluation. It enables you to answer three key questions:
How good are Lakera Guard's detection capabilities?
Lakera’s threat detection accuracy is market leading. See our Prompt Injection Test (PINT) benchmark here for evidence. This takes into consideration both threat detection effectiveness as well as false positives on benign data importantly. When evaluating Lakera recommends using a Confusion Matrix for a standardized classification evaluation baseline.
Tips for successful evalations
We recommend following these recommendations in order to avoid common pitfalls:
Set up a project and assign a relevant policy
The Lakera Default Policy has our strictest flagging sensitivity so will flag anything Guard isn’t confident is safe. Use one of the Lakera recommended policies to assess Guard’s accuracy in real deployments.
Pass system prompts separately and correctly
A common attack vector is for users to insert malicious instructions as if they’re system instructions. Therefore in order to prevent Guard flagging your own system instructions make sure they’re passed in a separate message from LLM inputs and outputs with the message role as system
.
Screen the original raw content from user inputs and reference documents
A common practice is to add additional system instructions or decorators to LLM inputs. Similar to the point above, in order to avoid false positives either screen the exact original input or remove any system instructions added to LLM inputs before passing to Guard as these are likely to be flagged as prompt attacks.
Screen for threats holistically
Lakera Guard performs best when screening interactions holistically with all appropriate guardrails applied. This means passing both LLM inputs and outputs in screening requests and using representative policies. The more context Guard has, the more accurate its threat detection will be.
Next step
To get started follow the setup guide for your chosen deployment option:
- Enterprise SaaS setup
- Self-hosted setup