Latency Benchmark
This tutorial uses our legacy v1 endpoints. It will be updated soon.
The Latency Benchmark offers a clear, repeatable framework for evaluating Lakera Guard's response times across varying input volumes. It establishes baseline metrics that provide a standard to compare against deployed use cases. The benchmark focuses on measuring the latency of a single request. For information on sizing and scaling Lakera Guard, see our Sizing Guide.
The following steps are written for testing the Lakera Guard SaaS API. The benchmark is also applicable for self-hosted evaluation, with minor changes.
The complete SaaS and Self-Hosted Benchmark scripts are available at the bottom of this page.
Prerequisites
For testing the Lakera Guard SaaS API, you’ll need to obtain an API key.
Environment Variables
Set the LAKERA_GUARD_API_KEY environment variable to your API key.
Install Dependencies
Next, install the required packages. We recommend using a Python virtual environment to avoid conflicts with other projects.
Import Dependencies
Then create a new Python file, import the required packages, and read in the API key.
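A minimal sketch of what this might look like, assuming the script uses the requests library and reads the key from the LAKERA_GUARD_API_KEY environment variable with bearer-token authorization:

```python
import argparse
import os
import random
import time

import requests  # assumed HTTP client used throughout this sketch

# Read the API key set earlier; fail fast if it is missing.
LAKERA_GUARD_API_KEY = os.environ.get("LAKERA_GUARD_API_KEY")
if not LAKERA_GUARD_API_KEY:
    raise SystemExit("Please set the LAKERA_GUARD_API_KEY environment variable.")

# Authorization header reused for every request.
HEADERS = {"Authorization": f"Bearer {LAKERA_GUARD_API_KEY}"}
```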
Constants
We define constants for the list of valid API endpoints to keep the script flexible.
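A sketch of those constants, assuming the legacy v1 SaaS base URL https://api.lakera.ai/v1/ and the four endpoints listed later in this tutorial:

```python
# Assumed legacy v1 SaaS base URL; adjust if your deployment differs.
SAAS_BASE_URL = "https://api.lakera.ai/v1/"

# Endpoints the benchmark can target.
VALID_ENDPOINTS = ["prompt_injection", "moderation", "pii", "unknown_links"]
```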
Data Generation and File Handling
Because we are not measuring accuracy, we'll use a generate_lorem_ipsum helper to create inputs of a specified length.
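One possible implementation is a self-contained generator that repeats lorem ipsum words until it reaches the requested length; the original script may also write the generated prompts to a file, which is omitted in this sketch:

```python
LOREM_WORDS = (
    "lorem ipsum dolor sit amet consectetur adipiscing elit sed do eiusmod "
    "tempor incididunt ut labore et dolore magna aliqua"
).split()

def generate_lorem_ipsum(length: int) -> str:
    """Return a filler string of exactly `length` characters."""
    words = []
    chars = 0
    while chars < length:
        word = random.choice(LOREM_WORDS)
        words.append(word)
        chars += len(word) + 1  # account for the joining space
    # The join can land one character short, so pad with a space before trimming.
    return (" ".join(words) + " ")[:length]
```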
Measuring Latency
Lastly, we'll create a function that measures request latency using time.perf_counter().
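A sketch of such a function, assuming the v1 endpoints accept a JSON body of the form {"input": ...} and that an optional requests.Session can be passed in for connection reuse:

```python
# Default target; the self-hosted script overrides this with --base_url.
GUARD_API_URL = SAAS_BASE_URL

def measure_latency(endpoint: str, prompt: str, session=None) -> float:
    """Send one request to a Guard endpoint and return the elapsed time in seconds."""
    url = GUARD_API_URL + endpoint
    requester = session if session is not None else requests
    start = time.perf_counter()
    response = requester.post(url, json={"input": prompt}, headers=HEADERS)
    elapsed = time.perf_counter() - start
    response.raise_for_status()
    return elapsed
```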
Running the Benchmark
We've established the basics needed to test Lakera Guard's latency. In the script's main section we'll add functionality to test both new connections and reused connections. This helps illustrate the value of persistent connections, which significantly reduce the networking overhead required to establish new connections.
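A sketch of that comparison: plain requests.post calls open a fresh connection per request, while a requests.Session reuses one persistent connection. The 20-request default matches the script's behavior described below; the millisecond reporting is an assumption.

```python
def run_benchmark(endpoint: str, data_length: int, num_requests: int = 20) -> None:
    """Compare latency for new connections vs. a reused (persistent) connection."""
    prompts = [generate_lorem_ipsum(data_length) for _ in range(num_requests)]

    # New connections: each requests.post call opens and closes its own connection.
    new_times = [measure_latency(endpoint, p) for p in prompts]

    # Reused connection: a Session keeps the underlying connection alive between calls.
    with requests.Session() as session:
        reused_times = [measure_latency(endpoint, p, session=session) for p in prompts]

    print(f"New connections:    avg {sum(new_times) / len(new_times) * 1000:.1f} ms")
    print(f"Reused connections: avg {sum(reused_times) / len(reused_times) * 1000:.1f} ms")
```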
Lakera Guard Latency Benchmark (SaaS)
Ensure you've created a latency_benchmark.py file with the complete code outlined below. To run the benchmark, pass in the following flags (a sketch of the matching argument parsing follows the list):
-d: Specify the length of data to generate per input (between 100 and 64,000 characters)
-e: API endpoint to test. Valid options are prompt_injection, moderation, pii, unknown_links
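A sketch of that argument parsing with argparse; the long flag names --data_length and --endpoint are assumptions, only -d and -e come from the tutorial:

```python
def parse_args() -> argparse.Namespace:
    parser = argparse.ArgumentParser(description="Lakera Guard latency benchmark (SaaS)")
    parser.add_argument("-d", "--data_length", type=int, required=True,
                        help="Length of data to generate per input (100-64,000 characters)")
    parser.add_argument("-e", "--endpoint", choices=VALID_ENDPOINTS, required=True,
                        help="API endpoint to test")
    return parser.parse_args()

if __name__ == "__main__":
    args = parse_args()
    run_benchmark(args.endpoint, args.data_length)
```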
Example Usage
By default, the script selects 20 random prompts from the generated data. The following command will send 20 requests, each 5000 characters in length, to the Prompt Injection endpoint.
python latency_benchmark.py -d 5000 -e prompt_injection
Expected SaaS Latency
Latency table for persistent connections.
Lakera Guard Latency Benchmark (Self-Hosted)
Ensure you've created a latency_benchmark.py file with the complete code outlined below. To run the benchmark, pass in the following flags (a sketch of the matching argument parsing follows the list):
-d: Specify the length of data to generate per input (between 100 and 64,000 characters)
-e: API endpoint to test. Valid options are prompt_injection, moderation, pii, unknown_links
-u or --base_url: (Optional) Base URL for the API. Default is http://localhost:8000/v1/.
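The self-hosted variant differs from the SaaS script mainly in the extra -u/--base_url flag; a sketch of its argument parsing and entry point, reusing the run_benchmark and measure_latency shapes from the SaaS sketches above:

```python
def parse_args() -> argparse.Namespace:
    parser = argparse.ArgumentParser(description="Lakera Guard latency benchmark (self-hosted)")
    parser.add_argument("-d", "--data_length", type=int, required=True,
                        help="Length of data to generate per input (100-64,000 characters)")
    parser.add_argument("-e", "--endpoint", choices=VALID_ENDPOINTS, required=True,
                        help="API endpoint to test")
    parser.add_argument("-u", "--base_url", default="http://localhost:8000/v1/",
                        help="Base URL for the API (self-hosted Guard container)")
    return parser.parse_args()

if __name__ == "__main__":
    args = parse_args()
    GUARD_API_URL = args.base_url  # point requests at the local container instead of the SaaS API
    run_benchmark(args.endpoint, args.data_length)
```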
Example Usage
By default, the script selects 20 random prompts from the generated data. The following command will send 20 requests, each 5000 characters in length, to the Prompt Injection endpoint.
python latency_benchmark.py -d 5000 -e prompt_injection
Expected Self-Hosted Latency
Latency table for persistent connections.
Be aware of the monthly API request limit outlined in your trial agreement.