Latency Benchmark

This tutorial uses our legacy v1 endpoints. It will be updated soon.

The Latency Benchmark offers a clear and repeatable framework for evaluating Lakera Guard’s response times across varying input sizes. The benchmark establishes baseline metrics, providing a standard to compare against deployed use cases. It focuses on measuring the latency of a single request; for information on sizing and scaling Lakera Guard, see our Sizing Guide.

The following steps are written for testing the Lakera Guard SaaS API. The benchmark also applies to self-hosted deployments with minor changes.

The complete SaaS and Self-Hosted Benchmark scripts are available at the bottom of this page.

Prerequisites

For testing the Lakera Guard SaaS API, you’ll need to obtain an API key.

Environment Variables

Set the LAKERA_GUARD_API_KEY environment variable with your API key:

$export LAKERA_GUARD_API_KEY='your_api_key_here'
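If you’re working on Windows, you can set the variable in PowerShell or cmd.exe instead:

PowerShell: $env:LAKERA_GUARD_API_KEY = 'your_api_key_here'
cmd.exe: set LAKERA_GUARD_API_KEY=your_api_key_here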

Install Dependencies

Next, install the required packages. To avoid conflicts with other projects, we recommend installing them inside a Python virtual environment, as shown below.
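For example, to create and activate a virtual environment (the directory name .venv here is only an example):

$python -m venv .venv
$source .venv/bin/activate

With the environment activated, install the packages: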

$pip install --upgrade requests lorem-text tqdm

Import Dependencies

Then create a new Python file.

$touch latency_benchmark.py

Next, import the required packages and read in the API key.

import json
import time
import requests
import random
import os
import argparse
from lorem_text import lorem
from tqdm import tqdm

def get_env_var(var_name):
    value = os.getenv(var_name)
    if value is None:
        raise ValueError(f"Environment variable {var_name} not set")
    return value

# Ensure Lakera Guard API key is set
lakera_guard_api_key = None
try:
    lakera_guard_api_key = get_env_var('LAKERA_GUARD_API_KEY')
except ValueError as e:
    print(f"Error: {e}")
    print("Please set the Lakera Guard API key environment variable. export LAKERA_GUARD_API_KEY='your_api_key'")
    exit(1)

Constants

We define constants for the list of valid API endpoints and the base URL, which keeps the script flexible and easy to extend.

VALID_ENDPOINTS = ["prompt_injection", "pii", "moderation", "unknown_links"]
BASE_URL = "https://api.lakera.ai/v1/"
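Later in the script, the base URL is combined with the endpoint selected on the command line to build the request URL:

api_url = f"{BASE_URL}{endpoint}"  # e.g. https://api.lakera.ai/v1/prompt_injection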

Data Generation and File Handling

Because we are not concerned with measuring accuracy, we’ll use generate_lorem_ipsum to create inputs of a specified length. We’ll also add helpers to read inputs back from a JSON file and to save results.

def generate_lorem_ipsum(length):
    text = ""
    while len(text) < length:
        text += lorem.words(1) + " "
    return text[:length]

def read_inputs(file_name):
    with open(file_name, 'r') as f:
        data = json.load(f)
    if isinstance(data, list) and all(isinstance(item, dict) and 'input' in item for item in data):
        inputs = [item['input'] for item in data]
    else:
        raise ValueError("JSON file must contain a list of dictionaries with an 'input' key.")
    return inputs

def save_results_to_json(results, filename):
    with open(filename, 'w') as file:
        json.dump(results, file, indent=2)
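Both the generated file and any file passed to read_inputs must be a JSON array of objects with an input key; the text below is only illustrative:

[
  {"input": "lorem ipsum dolor sit amet ..."},
  {"input": "consectetur adipiscing elit ..."}
]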

Measuring Latency

Lastly, we’ll create a measure_latency function that uses time.perf_counter() to time each request, returning the elapsed time in seconds (or None if the request fails).

def measure_latency(session_or_requests, url, headers, input_value):
    payload = {"input": input_value}
    try:
        start = time.perf_counter()
        response = session_or_requests.post(url, json=payload, headers=headers)
        end = time.perf_counter()
        latency = end - start
        response.raise_for_status()
        return latency
    except requests.exceptions.RequestException as e:
        print(f"Request failed: {e}")
        return None

Running the Benchmark

We’ve established the basics needed to test Lakera Guard’s latency. Within our main function we’ll add functionality to test both new connections and reused connections. This is helpful for illustrating the value of persistent connections, which avoid the networking overhead of opening a new connection for every request. The short sketch below shows the difference between the two call patterns before we put everything together in the complete scripts.
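As a minimal sketch, assuming the constants and functions defined above and the prompt_injection endpoint, the only difference between the two modes is whether we pass the requests module itself or a requests.Session() to measure_latency:

# A 1,000-character lorem ipsum input, reused for both calls
sample_input = generate_lorem_ipsum(1000)
api_url = f"{BASE_URL}prompt_injection"
headers = {"Authorization": f"Bearer {lakera_guard_api_key}"}

# New connection: passing the requests module opens a fresh connection for the request
latency_new = measure_latency(requests, api_url, headers, sample_input)

# Reused connection: a Session keeps the underlying connection open across requests
session = requests.Session()
latency_reused = measure_latency(session, api_url, headers, sample_input)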

Lakera Guard Latency Benchmark (SaaS)

Ensure you’ve created a latency_benchmark.py file with the complete code outlined below. To run the benchmark, we must pass in the following flags:

  • -d: Specify the length of data to generate per input (between 100 and 64,000 characters)
  • -e: API endpoint to test. Valid options are prompt_injection, moderation, pii, unknown_links

Example Usage

By default, the script selects 20 random prompts from the generated data: 5 are measured over new connections and 15 over a reused connection. The following command measures these 20 requests, each 5,000 characters long, against the Prompt Injection endpoint.

$python latency_benchmark.py -d 5000 -e prompt_injection
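Once the run completes, you should see the generated inputs and the two result files the script writes in your working directory:

$ls
generated_data_5000.json  prompt_injection-requests.json  prompt_injection-summary.json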

latency_benchmark.py

import json
import time
import requests
import random
import os
import argparse
from lorem_text import lorem
from tqdm import tqdm

def get_env_var(var_name):
    value = os.getenv(var_name)
    if value is None:
        raise ValueError(f"Environment variable {var_name} not set")
    return value

# Ensure Lakera Guard API key is set
lakera_guard_api_key = None
try:
    lakera_guard_api_key = get_env_var('LAKERA_GUARD_API_KEY')
except ValueError as e:
    print(f"Error: {e}")
    print("Please set the Lakera Guard API key environment variable. export LAKERA_GUARD_API_KEY='your_api_key'")
    exit(1)

VALID_ENDPOINTS = ["prompt_injection", "pii", "moderation", "unknown_links"]
BASE_URL = "https://api.lakera.ai/v1/"

def generate_lorem_ipsum(length):
    text = ""
    while len(text) < length:
        text += lorem.words(1) + " "
    return text[:length]

def read_inputs(file_name):
    with open(file_name, 'r') as f:
        data = json.load(f)
    if isinstance(data, list) and all(isinstance(item, dict) and 'input' in item for item in data):
        inputs = [item['input'] for item in data]
    else:
        raise ValueError("JSON file must contain a list of dictionaries with an 'input' key.")
    return inputs

def save_results_to_json(results, filename):
    with open(filename, 'w') as file:
        json.dump(results, file, indent=2)

def measure_latency(session_or_requests, url, headers, input_value):
    payload = {"input": input_value}
    try:
        start = time.perf_counter()
        response = session_or_requests.post(url, json=payload, headers=headers)
        end = time.perf_counter()
        latency = end - start
        response.raise_for_status()
        return latency
    except requests.exceptions.RequestException as e:
        print(f"Request failed: {e}")
        return None

def main():
    parser = argparse.ArgumentParser(
        description="Lakera Guard API Latency Tester",
        formatter_class=argparse.ArgumentDefaultsHelpFormatter
    )
    parser.add_argument('-d', '--data_length', type=int, required=True,
                        help="Number of characters for data generation (between 100 and 64000)")
    parser.add_argument('-e', '--endpoint', type=str, required=True,
                        help="API endpoint. Valid options are: prompt_injection, pii, moderation, unknown_links")

    args = parser.parse_args()

    input_length = args.data_length
    endpoint = args.endpoint

    if input_length < 100 or input_length > 64000:
        parser.error("Invalid input. Ensure 100 <= input_length <= 64000")

    if endpoint not in VALID_ENDPOINTS:
        parser.error(f"Invalid endpoint. Valid endpoints are: {', '.join(VALID_ENDPOINTS)}")

    # Generate 50 datapoints with the specified length
    data = []
    for _ in range(50):
        data.append({"input": generate_lorem_ipsum(input_length)})

    # Save to a JSON file
    file_name = f"generated_data_{input_length}.json"
    with open(file_name, 'w') as f:
        json.dump(data, f)

    print(f"Generated 50 datapoints, each with exactly {input_length} characters.")
    print(f"Data saved to {file_name}")

    # Display a sample of the generated data
    sample = data[0]
    print(json.dumps({"input": sample['input'][:100] + "..." if len(sample['input']) > 100 else sample['input']}, indent=2))
    print(f"Length of sample: {len(sample['input'])} characters")

    api_url = f"{BASE_URL}{endpoint}"
    headers = {"Authorization": f"Bearer {lakera_guard_api_key}"}

    # Use the generated file
    inputs = read_inputs(file_name)

    # Randomly select inputs
    selected_inputs_new = random.sample(inputs, 5)  # Select 5 inputs for new connections
    selected_inputs_reused = random.sample(inputs, 15)  # Select 15 inputs for reused connections

    # Calculate the average input size
    input_size = sum(len(input_value) for input_value in selected_inputs_reused) // len(selected_inputs_reused)

    latencies_new_connection = []
    latencies_reused_connection = []
    request_logs = []

    print("Testing with new connections:")
    for input_value in tqdm(selected_inputs_new, desc="New connections"):
        latency = measure_latency(requests, api_url, headers, input_value)
        if latency is not None:
            latencies_new_connection.append(latency)
            request_logs.append({
                "input": input_value[:100] + "..." if len(input_value) > 100 else input_value,
                "input_length": len(input_value),
                "latency": latency,
                "connection": "new"
            })

    session = requests.Session()

    print("\nTesting with reused connections:")
    print("Warming up session...")
    for input_value in tqdm(selected_inputs_reused, desc="Warm-up"):
        measure_latency(session, api_url, headers, input_value)

    print("Measuring latencies...")
    for input_value in tqdm(selected_inputs_reused, desc="Reused connections"):
        latency = measure_latency(session, api_url, headers, input_value)
        if latency is not None:
            latencies_reused_connection.append(latency)
            request_logs.append({
                "input": input_value[:100] + "..." if len(input_value) > 100 else input_value,
                "input_length": len(input_value),
                "latency": latency,
                "connection": "reused"
            })

    # Calculate the 95th percentile index safely
    INDEX_95TH_NEW = int(len(latencies_new_connection) * 0.95) - 1 if len(latencies_new_connection) > 1 else None
    INDEX_95TH_REUSED = int(len(latencies_reused_connection) * 0.95) - 1 if len(latencies_reused_connection) > 1 else None

    avg_new = sum(latencies_new_connection) / len(latencies_new_connection)
    avg_reused = sum(latencies_reused_connection) / len(latencies_reused_connection)
    avg_difference_ratio = avg_new / avg_reused

    results = {
        "endpoint": endpoint,
        "average_input_size": input_size,
        "avg_difference_ratio": f"new connections average {avg_difference_ratio:.2f}x greater than reused connections",
        "stats": {
            "new_connection": {
                "min": min(latencies_new_connection),
                "max": max(latencies_new_connection),
                "avg": avg_new,
                "95th": sorted(latencies_new_connection)[INDEX_95TH_NEW] if INDEX_95TH_NEW is not None else None,
            },
            "reused_connection": {
                "min": min(latencies_reused_connection),
                "max": max(latencies_reused_connection),
                "avg": avg_reused,
                "95th": sorted(latencies_reused_connection)[INDEX_95TH_REUSED] if INDEX_95TH_REUSED is not None else None,
            },
        },
        "latencies_new_connection": latencies_new_connection,
        "latencies_reused_connection": latencies_reused_connection,
    }

    # Generate filenames
    summary_filename = f"{endpoint}-summary.json"
    request_logs_filename = f"{endpoint}-requests.json"

    # Save results to JSON
    save_results_to_json(results, summary_filename)
    save_results_to_json(request_logs, request_logs_filename)

    print("\nResults:")
    print(json.dumps(results, indent=2))

if __name__ == "__main__":
    main()

Expected SaaS Latency

Latency table for persistent connections.

Input Length (Characters)    p(95) Latency (ms)
1,000                        <20
10,000                       100-200
20,000                       200-300
30,000                       300-400
50,000                       >400
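To compare your own run against this table, read the reused-connection p(95) value from the summary file the script writes. The sketch below assumes the prompt_injection endpoint and converts the recorded seconds to milliseconds:

import json

with open("prompt_injection-summary.json") as f:
    summary = json.load(f)

p95_seconds = summary["stats"]["reused_connection"]["95th"]
print(f"p(95) latency over reused connections: {p95_seconds * 1000:.1f} ms")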

Lakera Guard Latency Benchmark (Self-Hosted)

Ensure you’ve created a latency_benchmark.py file with the complete code outlined below. To run the benchmark, we must pass in the following flags:

  • -d: Specify the length of data to generate per input (between 100 and 64,000 characters)
  • -e: API endpoint to test. Valid options are prompt_injection, moderation, pii, unknown_links
  • -u or --base_url: (Optional) Base URL for the API. Default is http://localhost:8000/v1/.

Example Usage

By default, the script selects 20 random prompts from the generated data: 5 are measured over new connections and 15 over a reused connection. The following command measures these 20 requests, each 5,000 characters long, against the Prompt Injection endpoint at a custom base URL.

$python latency_benchmark.py -d 5000 -e prompt_injection -u 'http://custom.api.url/v1/'

latency_benchmark.py

import json
import time
import requests
import random
import os
import argparse
from lorem_text import lorem
from tqdm import tqdm

def get_env_var(var_name):
    value = os.getenv(var_name)
    if value is None:
        raise ValueError(f"Environment variable {var_name} not set")
    return value

# Ensure Lakera Guard API key is set
lakera_guard_api_key = None
try:
    lakera_guard_api_key = get_env_var('LAKERA_GUARD_API_KEY')
except ValueError as e:
    print(f"Error: {e}")
    print("Please set the Lakera Guard API key environment variable. export LAKERA_GUARD_API_KEY='your_api_key'")
    exit(1)

VALID_ENDPOINTS = ["prompt_injection", "pii", "moderation", "unknown_links"]
DEFAULT_BASE_URL = "http://localhost:8000/v1/"

def generate_lorem_ipsum(length):
    text = ""
    while len(text) < length:
        text += lorem.words(1) + " "
    return text[:length]

def read_inputs(file_name):
    with open(file_name, 'r') as f:
        data = json.load(f)
    if isinstance(data, list) and all(isinstance(item, dict) and 'input' in item for item in data):
        inputs = [item['input'] for item in data]
    else:
        raise ValueError("JSON file must contain a list of dictionaries with an 'input' key.")
    return inputs

def save_results_to_json(results, filename):
    with open(filename, 'w') as file:
        json.dump(results, file, indent=2)

def measure_latency(session_or_requests, url, headers, input_value):
    payload = {"input": input_value}
    try:
        start = time.perf_counter()
        response = session_or_requests.post(url, json=payload, headers=headers)
        end = time.perf_counter()
        latency = end - start
        response.raise_for_status()
        return latency, response.status_code
    except requests.exceptions.RequestException as e:
        print(f"Request failed: {e}")
        return None, None

def main():
    parser = argparse.ArgumentParser(
        description="Lakera Guard API Latency Measurement Tool",
        formatter_class=argparse.ArgumentDefaultsHelpFormatter
    )
    parser.add_argument('-d', '--data_length', type=int, required=True,
                        help="Number of characters for data generation (between 100 and 64000)")
    parser.add_argument('-e', '--endpoint', type=str, required=True,
                        help="API endpoint. Valid options are: prompt_injection, pii, moderation, unknown_links")
    parser.add_argument('-u', '--base_url', type=str, default=DEFAULT_BASE_URL,
                        help="Base URL for the API. Default is http://localhost:8000/v1/")

    args = parser.parse_args()

    input_length = args.data_length
    endpoint = args.endpoint
    base_url = args.base_url

    if input_length < 100 or input_length > 64000:
        parser.error("Invalid input. Ensure 100 <= input_length <= 64000")

    if endpoint not in VALID_ENDPOINTS:
        parser.error(f"Invalid endpoint. Valid endpoints are: {', '.join(VALID_ENDPOINTS)}")

    # Generate 50 datapoints with the specified length
    data = []
    for _ in range(50):
        data.append({"input": generate_lorem_ipsum(input_length)})

    # Save to a JSON file
    file_name = f"generated_data_{input_length}.json"
    with open(file_name, 'w') as f:
        json.dump(data, f)

    print(f"Generated 50 datapoints, each with exactly {input_length} characters.")
    print(f"Data saved to {file_name}")

    # Display a sample of the generated data
    sample = data[0]
    print(json.dumps({"input": sample['input'][:100] + "..." if len(sample['input']) > 100 else sample['input']}, indent=2))
    print(f"Length of sample: {len(sample['input'])} characters")

    api_url = f"{base_url}{endpoint}"
    headers = {"Authorization": f"Bearer {lakera_guard_api_key}"}

    # Use the generated file
    inputs = read_inputs(file_name)

    # Randomly select inputs
    selected_inputs_new = random.sample(inputs, 5)  # Select 5 inputs for new connections
    selected_inputs_reused = random.sample(inputs, 15)  # Select 15 inputs for reused connections

    # Calculate the average input size
    input_size = sum(len(input_value) for input_value in selected_inputs_reused) // len(selected_inputs_reused)

    latencies_new_connection = []
    latencies_reused_connection = []
    request_logs = []

    print("Testing with new connections:")
    for input_value in tqdm(selected_inputs_new, desc="New connections"):
        latency, status_code = measure_latency(requests, api_url, headers, input_value)
        if latency is not None:
            latencies_new_connection.append(latency)
            request_logs.append({
                "input": input_value[:100] + "..." if len(input_value) > 100 else input_value,
                "input_length": len(input_value),
                "latency": latency,
                "status_code": status_code,
                "connection": "new"
            })

    session = requests.Session()

    print("\nTesting with reused connections:")
    print("Warming up session...")
    for input_value in tqdm(selected_inputs_reused, desc="Warm-up"):
        measure_latency(session, api_url, headers, input_value)

    print("Measuring latencies...")
    for input_value in tqdm(selected_inputs_reused, desc="Reused connections"):
        latency, status_code = measure_latency(session, api_url, headers, input_value)
        if latency is not None:
            latencies_reused_connection.append(latency)
            request_logs.append({
                "input": input_value[:100] + "..." if len(input_value) > 100 else input_value,
                "input_length": len(input_value),
                "latency": latency,
                "status_code": status_code,
                "connection": "reused"
            })

    # Calculate the 95th percentile index safely
    INDEX_95TH_NEW = int(len(latencies_new_connection) * 0.95) - 1 if len(latencies_new_connection) > 1 else None
    INDEX_95TH_REUSED = int(len(latencies_reused_connection) * 0.95) - 1 if len(latencies_reused_connection) > 1 else None

    avg_new = sum(latencies_new_connection) / len(latencies_new_connection)
    avg_reused = sum(latencies_reused_connection) / len(latencies_reused_connection)
    avg_difference_ratio = avg_new / avg_reused

    results = {
        "endpoint": endpoint,
        "average_input_size": input_size,
        "avg_difference_ratio": f"new connections average {avg_difference_ratio:.2f}x greater than reused connections",
        "stats": {
            "new_connection": {
                "min": min(latencies_new_connection),
                "max": max(latencies_new_connection),
                "avg": avg_new,
                "95th": sorted(latencies_new_connection)[INDEX_95TH_NEW] if INDEX_95TH_NEW is not None else None,
            },
            "reused_connection": {
                "min": min(latencies_reused_connection),
                "max": max(latencies_reused_connection),
                "avg": avg_reused,
                "95th": sorted(latencies_reused_connection)[INDEX_95TH_REUSED] if INDEX_95TH_REUSED is not None else None,
            },
        },
        "latencies_new_connection": latencies_new_connection,
        "latencies_reused_connection": latencies_reused_connection,
    }

    # Generate filenames
    summary_filename = f"{endpoint}-summary.json"
    request_logs_filename = f"{endpoint}-requests.json"

    # Save results to JSON
    save_results_to_json(results, summary_filename)
    save_results_to_json(request_logs, request_logs_filename)

    print("\nResults:")
    print(json.dumps(results, indent=2))

if __name__ == "__main__":
    main()

Expected Self-Hosted Latency

Latency table for persistent connections.

Input Length (Characters)    p(95) Latency (ms)
1,000                        <20
10,000                       <100
20,000                       <150
30,000                       <200
50,000                       >250

Be aware of the monthly API request limits outlined in your trial agreement.