Confusion Matrix Benchmark
This tutorial uses our legacy v1
endpoints. It will be updated soon.
A Confusion Matrix is a standardized approach for gaining insights into how well the model identifies positive instances and avoids false detections. The Lakera Guard Confusion Matrix Benchmark offers a streamlined framework for collecting True Positive
, True Negative
, False Positive
, False Negative
data. Categorical scoring allows for the calculation of Accuracy
, Recall
, and False Positive Rate
.
The following steps are written for testing the Lakera Guard SaaS API. The benchmark is also applicable for self-hosted evaluation, with minor changes.
The complete SaaS and Self-Hosted Benchmarks are available at the bottom of this page.
Prerequisites
For testing the Lakera Guard SaaS API, you’ll need to obtain an API key.
Environment Variables
Set the LAKERA_GUARD_API_KEY
environment variable with your API key:
Install Dependencies
Next, you’ll need to install the required packages - we recommend using a Python Virtual Environment - to avoid conflicts with other projects.
Import Dependencies
Then create a new Python file.
Next, import the required packages and read in API Key.
Prepare a Dataset
The Confusion Matrix Benchmark expects data in JSON format. The dataset should be structured as a list of dictionaries, where each dictionary represents an individual data point. For example, the structure of a prompt injection dataset should look like this:
Load and Validate JSON File
Next we load our data, ensuring it meets the specific structure outlined above.
Interacting with Lakera Guard
We define a function to authenticate and establish a persistent connection with the Lakera Guard API. We also create evaluate_lakera_guard
to send a prompt and evaluate the result. We are interested in whether Lakera Guard returns flagged:true
or flagged:false
in its response. Remember, based on our labeled dataset we expect a predicted positive to return true and a predicted negative to return false.
Confusion Matrix
Now we’ll create an evaluate_dataset
function to analyze True Positives (TP), True Negatives (TN), False Positives (FP), and False Negatives (FN).
Running the Benchmark
We’ve now established all the core functionality needed to evluate the Lakera Guard API with a Confusion Matrix. In our completed Benchmark script we will also define functions for logging ouput and calculating Accuracy
, Recall
, and False Positive Rate
.
Lakera Guard Confusion Matrix Benchmark (SaaS)
Ensure you’ve created a confusion_matrix_benchmark.py
file with the complete code outlined below. To run the benchmark, we must pass in the following flags:
-f
: Path to the input JSON files
: Sample size for evaluation. Be aware of monthly API requests limit as outlined in your trial agreement when setting this value.e
: API endoint. Valid options areprompt_injection
,moderation
,pii
,unknown_links
Example Usage
The following will send 100 requests, extracted from file.json to the Prompt Injection endpoint.
confusion_matrix_benchmark.py
Lakera Guard Confusion Matrix Benchmark (Self-Hosted)
The Self-Hosted Benchmark works in the same way, with minor changes as the authentication is not needed. We also provide the ability to pass in custom hostnames. Ensure you’ve created a confusion_matrix_benchmark.py
file with the complete code outlined below. To run the benchmark, we must pass in the following flags:
-f
: Path to the input JSON files
: Sample size for evaluation. Be aware of monthly API requests limit as outlined in your trial agreement when setting this value.e
: API endoint. Valid options areprompt_injection
,moderation
,pii
,unknown_links
-u
or--base_url
: (Optional) Base URL for the API. Default ishttp://localhost:8000/v1
.
Example Usage
The following will send 100 requests, extracted from file.json to the Prompt Injection endpoint.