Latency Benchmark

This tutorial uses our legacy v1 endpoints. It will be updated soon.

The Latency Benchmark offers a clear and repeatable framework for evaluating Lakera Guard’s response times across varying input sizes. The benchmark establishes baseline metrics, providing a standard to compare against deployed use cases. It focuses on measuring the latency of a single request; for information on sizing and scaling Lakera Guard, see our Sizing Guide.

The following steps are written for testing the Lakera Guard SaaS API. The benchmark also applies to self-hosted deployments with minor changes.

The complete SaaS and Self-Hosted Benchmark scripts are available at the bottom of this page.

Prerequisites

For testing the Lakera Guard SaaS API, you’ll need to obtain an API key.

Environment Variables

Set the LAKERA_GUARD_API_KEY environment variable with your API key:

$export LAKERA_GUARD_API_KEY='your_api_key_here'
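If you’re working on Windows, you can set the variable in PowerShell or cmd.exe instead:

PowerShell: $env:LAKERA_GUARD_API_KEY = 'your_api_key_here'
cmd.exe: set LAKERA_GUARD_API_KEY=your_api_key_here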

Install Dependencies

Next, install the required packages. To avoid conflicts with other projects, we recommend installing them inside a Python virtual environment, as shown below.
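For example, to create and activate a virtual environment (the directory name .venv here is only an example):

$python -m venv .venv
$source .venv/bin/activate

With the environment activated, install the packages: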

$pip install --upgrade requests lorem-text tqdm

Import Dependencies

Then create a new Python file.

$touch latency_benchmark.py

Next, import the required packages and read in the API key.

import json
import time
import requests
import random
import os
import argparse
from lorem_text import lorem
from tqdm import tqdm

def get_env_var(var_name):
    value = os.getenv(var_name)
    if value is None:
        raise ValueError(f"Environment variable {var_name} not set")
    return value

# Ensure Lakera Guard API key is set
lakera_guard_api_key = None
try:
    lakera_guard_api_key = get_env_var('LAKERA_GUARD_API_KEY')
except ValueError as e:
    print(f"Error: {e}")
    print("Please set the Lakera Guard API key environment variable. export LAKERA_GUARD_API_KEY='your_api_key'")
    exit(1)

Constants

We define constants for the list of valid API endpoints and the base URL, which keeps the script flexible and easy to extend.

VALID_ENDPOINTS = ["prompt_injection", "pii", "moderation", "unknown_links"]
BASE_URL = "https://api.lakera.ai/v1/"
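Later in the script, the base URL is combined with the endpoint selected on the command line to build the request URL:

api_url = f"{BASE_URL}{endpoint}"  # e.g. https://api.lakera.ai/v1/prompt_injection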

Data Generation and File Handling

Because we are not concerned with measuring accuracy, we’ll use generate_lorem_ipsum to create inputs of a specified length. We’ll also add helpers to read inputs back from a JSON file and to save results.

def generate_lorem_ipsum(length):
    text = ""
    while len(text) < length:
        text += lorem.words(1) + " "
    return text[:length]

def read_inputs(file_name):
    with open(file_name, 'r') as f:
        data = json.load(f)
    if isinstance(data, list) and all(isinstance(item, dict) and 'input' in item for item in data):
        inputs = [item['input'] for item in data]
    else:
        raise ValueError("JSON file must contain a list of dictionaries with an 'input' key.")
    return inputs

def save_results_to_json(results, filename):
    with open(filename, 'w') as file:
        json.dump(results, file, indent=2)
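Both the generated file and any file passed to read_inputs must be a JSON array of objects with an input key; the text below is only illustrative:

[
  {"input": "lorem ipsum dolor sit amet ..."},
  {"input": "consectetur adipiscing elit ..."}
]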

Measuring Latency

Lastly, we’ll create a measure_latency function that uses time.perf_counter() to time each request, returning the elapsed time in seconds (or None if the request fails).

def measure_latency(session_or_requests, url, headers, input_value):
    payload = {"input": input_value}
    try:
        start = time.perf_counter()
        response = session_or_requests.post(url, json=payload, headers=headers)
        end = time.perf_counter()
        latency = end - start
        response.raise_for_status()
        return latency
    except requests.exceptions.RequestException as e:
        print(f"Request failed: {e}")
        return None

Running the Benchmark

We’ve established the basics needed to test Lakera Guard’s latency. Within our main function we’ll add functionality to test both new connections and reused connections. This is helpful for illustrating the value of persistent connections, which avoid the networking overhead of opening a new connection for every request. The short sketch below shows the difference between the two call patterns before we put everything together in the complete scripts.
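As a minimal sketch, assuming the constants and functions defined above and the prompt_injection endpoint, the only difference between the two modes is whether we pass the requests module itself or a requests.Session() to measure_latency:

# A 1,000-character lorem ipsum input, reused for both calls
sample_input = generate_lorem_ipsum(1000)
api_url = f"{BASE_URL}prompt_injection"
headers = {"Authorization": f"Bearer {lakera_guard_api_key}"}

# New connection: passing the requests module opens a fresh connection for the request
latency_new = measure_latency(requests, api_url, headers, sample_input)

# Reused connection: a Session keeps the underlying connection open across requests
session = requests.Session()
latency_reused = measure_latency(session, api_url, headers, sample_input)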

Lakera Guard Latency Benchmark (SaaS)

Ensure you’ve created a latency_benchmark.py file with the complete code outlined below. To run the benchmark, we must pass in the following flags:

  • -d: Specify the length of data to generate per input (between 100 and 64,000 characters)
  • -e: API endpoint to test. Valid options are prompt_injection, moderation, pii, unknown_links

Example Usage

By default, the script selects 20 random prompts from the generated data: 5 are measured over new connections and 15 over a reused connection. The following command measures these 20 requests, each 5,000 characters long, against the Prompt Injection endpoint.

$python latency_benchmark.py -d 5000 -e prompt_injection
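Once the run completes, you should see the generated inputs and the two result files the script writes in your working directory:

$ls
generated_data_5000.json  prompt_injection-requests.json  prompt_injection-summary.json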

latency_benchmark.py

import json
import time
import requests
import random
import os
import argparse
from lorem_text import lorem
from tqdm import tqdm

def get_env_var(var_name):
    value = os.getenv(var_name)
    if value is None:
        raise ValueError(f"Environment variable {var_name} not set")
    return value

# Ensure Lakera Guard API key is set
lakera_guard_api_key = None
try:
    lakera_guard_api_key = get_env_var('LAKERA_GUARD_API_KEY')
except ValueError as e:
    print(f"Error: {e}")
    print("Please set the Lakera Guard API key environment variable. export LAKERA_GUARD_API_KEY='your_api_key'")
    exit(1)

VALID_ENDPOINTS = ["prompt_injection", "pii", "moderation", "unknown_links"]
BASE_URL = "https://api.lakera.ai/v1/"

def generate_lorem_ipsum(length):
    text = ""
    while len(text) < length:
        text += lorem.words(1) + " "
    return text[:length]

def read_inputs(file_name):
    with open(file_name, 'r') as f:
        data = json.load(f)
    if isinstance(data, list) and all(isinstance(item, dict) and 'input' in item for item in data):
        inputs = [item['input'] for item in data]
    else:
        raise ValueError("JSON file must contain a list of dictionaries with an 'input' key.")
    return inputs

def save_results_to_json(results, filename):
    with open(filename, 'w') as file:
        json.dump(results, file, indent=2)

def measure_latency(session_or_requests, url, headers, input_value):
    payload = {"input": input_value}
    try:
        start = time.perf_counter()
        response = session_or_requests.post(url, json=payload, headers=headers)
        end = time.perf_counter()
        latency = end - start
        response.raise_for_status()
        return latency
    except requests.exceptions.RequestException as e:
        print(f"Request failed: {e}")
        return None

def main():
    parser = argparse.ArgumentParser(
        description="Lakera Guard API Latency Tester",
        formatter_class=argparse.ArgumentDefaultsHelpFormatter
    )
    parser.add_argument('-d', '--data_length', type=int, required=True,
                        help="Number of characters for data generation (between 100 and 64000)")
    parser.add_argument('-e', '--endpoint', type=str, required=True,
                        help="API endpoint. Valid options are: prompt_injection, pii, moderation, unknown_links")

    args = parser.parse_args()

    input_length = args.data_length
    endpoint = args.endpoint

    if input_length < 100 or input_length > 64000:
        parser.error("Invalid input. Ensure 100 <= input_length <= 64000")

    if endpoint not in VALID_ENDPOINTS:
        parser.error(f"Invalid endpoint. Valid endpoints are: {', '.join(VALID_ENDPOINTS)}")

    # Generate 50 datapoints with the specified length
    data = []
    for _ in range(50):
        data.append({"input": generate_lorem_ipsum(input_length)})

    # Save to a JSON file
    file_name = f"generated_data_{input_length}.json"
    with open(file_name, 'w') as f:
        json.dump(data, f)

    print(f"Generated 50 datapoints, each with exactly {input_length} characters.")
    print(f"Data saved to {file_name}")

    # Display a sample of the generated data
    sample = data[0]
    print(json.dumps({"input": sample['input'][:100] + "..." if len(sample['input']) > 100 else sample['input']}, indent=2))
    print(f"Length of sample: {len(sample['input'])} characters")

    api_url = f"{BASE_URL}{endpoint}"
    headers = {"Authorization": f"Bearer {lakera_guard_api_key}"}

    # Use the generated file
    inputs = read_inputs(file_name)

    # Randomly select inputs
    selected_inputs_new = random.sample(inputs, 5)  # Select 5 inputs for new connections
    selected_inputs_reused = random.sample(inputs, 15)  # Select 15 inputs for reused connections

    # Calculate the average input size
    input_size = sum(len(input_value) for input_value in selected_inputs_reused) // len(selected_inputs_reused)

    latencies_new_connection = []
    latencies_reused_connection = []
    request_logs = []

    print("Testing with new connections:")
    for input_value in tqdm(selected_inputs_new, desc="New connections"):
        latency = measure_latency(requests, api_url, headers, input_value)
        if latency is not None:
            latencies_new_connection.append(latency)
            request_logs.append({
                "input": input_value[:100] + "..." if len(input_value) > 100 else input_value,
                "input_length": len(input_value),
                "latency": latency,
                "connection": "new"
            })

    session = requests.Session()

    print("\nTesting with reused connections:")
    print("Warming up session...")
    for input_value in tqdm(selected_inputs_reused, desc="Warm-up"):
        measure_latency(session, api_url, headers, input_value)

    print("Measuring latencies...")
    for input_value in tqdm(selected_inputs_reused, desc="Reused connections"):
        latency = measure_latency(session, api_url, headers, input_value)
        if latency is not None:
            latencies_reused_connection.append(latency)
            request_logs.append({
                "input": input_value[:100] + "..." if len(input_value) > 100 else input_value,
                "input_length": len(input_value),
                "latency": latency,
                "connection": "reused"
            })

    # Calculate the 95th percentile index safely
    INDEX_95TH_NEW = int(len(latencies_new_connection) * 0.95) - 1 if len(latencies_new_connection) > 1 else None
    INDEX_95TH_REUSED = int(len(latencies_reused_connection) * 0.95) - 1 if len(latencies_reused_connection) > 1 else None

    avg_new = sum(latencies_new_connection) / len(latencies_new_connection)
    avg_reused = sum(latencies_reused_connection) / len(latencies_reused_connection)
    avg_difference_ratio = avg_new / avg_reused

    results = {
        "endpoint": endpoint,
        "average_input_size": input_size,
        "avg_difference_ratio": f"new connections average {avg_difference_ratio:.2f}x greater than reused connections",
        "stats": {
            "new_connection": {
                "min": min(latencies_new_connection),
                "max": max(latencies_new_connection),
                "avg": avg_new,
                "95th": sorted(latencies_new_connection)[INDEX_95TH_NEW] if INDEX_95TH_NEW is not None else None,
            },
            "reused_connection": {
                "min": min(latencies_reused_connection),
                "max": max(latencies_reused_connection),
                "avg": avg_reused,
                "95th": sorted(latencies_reused_connection)[INDEX_95TH_REUSED] if INDEX_95TH_REUSED is not None else None,
            },
        },
        "latencies_new_connection": latencies_new_connection,
        "latencies_reused_connection": latencies_reused_connection,
    }

    # Generate filenames
    summary_filename = f"{endpoint}-summary.json"
    request_logs_filename = f"{endpoint}-requests.json"

    # Save results to JSON
    save_results_to_json(results, summary_filename)
    save_results_to_json(request_logs, request_logs_filename)

    print("\nResults:")
    print(json.dumps(results, indent=2))

if __name__ == "__main__":
    main()

Expected SaaS Latency

Latency table for persistent connections.

Input Length (Characters)    p(95) Latency (ms)
1,000                        <20
10,000                       100-200
20,000                       200-300
30,000                       300-400
50,000                       >400
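To compare your own run against this table, read the reused-connection p(95) value from the summary file the script writes. The sketch below assumes the prompt_injection endpoint and converts the recorded seconds to milliseconds:

import json

with open("prompt_injection-summary.json") as f:
    summary = json.load(f)

p95_seconds = summary["stats"]["reused_connection"]["95th"]
print(f"p(95) latency over reused connections: {p95_seconds * 1000:.1f} ms")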

Lakera Guard Latency Benchmark (Self-Hosted)

Ensure you’ve created a latency_benchmark.py file with the complete code outlined below. To run the benchmark, we must pass in the following flags:

  • -d: Specify the length of data to generate per input (between 100 and 64,000 characters)
  • -e: API endpoint to test. Valid options are prompt_injection, moderation, pii, unknown_links
  • -u or --base_url: (Optional) Base URL for the API. Default is http://localhost:8000/v1/.

Example Usage

By default, the script selects 20 random prompts from the generated data: 5 are measured over new connections and 15 over a reused connection. The following command measures these 20 requests, each 5,000 characters long, against the Prompt Injection endpoint at a custom base URL.

$python latency_benchmark.py -d 5000 -e prompt_injection -u 'http://custom.api.url/v1/'

latency_benchmark.py

import json
import time
import requests
import random
import os
import argparse
from lorem_text import lorem
from tqdm import tqdm

def get_env_var(var_name):
    value = os.getenv(var_name)
    if value is None:
        raise ValueError(f"Environment variable {var_name} not set")
    return value

# Ensure Lakera Guard API key is set
lakera_guard_api_key = None
try:
    lakera_guard_api_key = get_env_var('LAKERA_GUARD_API_KEY')
except ValueError as e:
    print(f"Error: {e}")
    print("Please set the Lakera Guard API key environment variable. export LAKERA_GUARD_API_KEY='your_api_key'")
    exit(1)

VALID_ENDPOINTS = ["prompt_injection", "pii", "moderation", "unknown_links"]
DEFAULT_BASE_URL = "http://localhost:8000/v1/"

def generate_lorem_ipsum(length):
    text = ""
    while len(text) < length:
        text += lorem.words(1) + " "
    return text[:length]

def read_inputs(file_name):
    with open(file_name, 'r') as f:
        data = json.load(f)
    if isinstance(data, list) and all(isinstance(item, dict) and 'input' in item for item in data):
        inputs = [item['input'] for item in data]
    else:
        raise ValueError("JSON file must contain a list of dictionaries with an 'input' key.")
    return inputs

def save_results_to_json(results, filename):
    with open(filename, 'w') as file:
        json.dump(results, file, indent=2)

def measure_latency(session_or_requests, url, headers, input_value):
    payload = {"input": input_value}
    try:
        start = time.perf_counter()
        response = session_or_requests.post(url, json=payload, headers=headers)
        end = time.perf_counter()
        latency = end - start
        response.raise_for_status()
        return latency, response.status_code
    except requests.exceptions.RequestException as e:
        print(f"Request failed: {e}")
        return None, None

def main():
    parser = argparse.ArgumentParser(
        description="Lakera Guard API Latency Measurement Tool",
        formatter_class=argparse.ArgumentDefaultsHelpFormatter
    )
    parser.add_argument('-d', '--data_length', type=int, required=True,
                        help="Number of characters for data generation (between 100 and 64000)")
    parser.add_argument('-e', '--endpoint', type=str, required=True,
                        help="API endpoint. Valid options are: prompt_injection, pii, moderation, unknown_links")
    parser.add_argument('-u', '--base_url', type=str, default=DEFAULT_BASE_URL,
                        help="Base URL for the API. Default is http://localhost:8000/v1/")

    args = parser.parse_args()

    input_length = args.data_length
    endpoint = args.endpoint
    base_url = args.base_url

    if input_length < 100 or input_length > 64000:
        parser.error("Invalid input. Ensure 100 <= input_length <= 64000")

    if endpoint not in VALID_ENDPOINTS:
        parser.error(f"Invalid endpoint. Valid endpoints are: {', '.join(VALID_ENDPOINTS)}")

    # Generate 50 datapoints with the specified length
    data = []
    for _ in range(50):
        data.append({"input": generate_lorem_ipsum(input_length)})

    # Save to a JSON file
    file_name = f"generated_data_{input_length}.json"
    with open(file_name, 'w') as f:
        json.dump(data, f)

    print(f"Generated 50 datapoints, each with exactly {input_length} characters.")
    print(f"Data saved to {file_name}")

    # Display a sample of the generated data
    sample = data[0]
    print(json.dumps({"input": sample['input'][:100] + "..." if len(sample['input']) > 100 else sample['input']}, indent=2))
    print(f"Length of sample: {len(sample['input'])} characters")

    api_url = f"{base_url}{endpoint}"
    headers = {"Authorization": f"Bearer {lakera_guard_api_key}"}

    # Use the generated file
    inputs = read_inputs(file_name)

    # Randomly select inputs
    selected_inputs_new = random.sample(inputs, 5)  # Select 5 inputs for new connections
    selected_inputs_reused = random.sample(inputs, 15)  # Select 15 inputs for reused connections

    # Calculate the average input size
    input_size = sum(len(input_value) for input_value in selected_inputs_reused) // len(selected_inputs_reused)

    latencies_new_connection = []
    latencies_reused_connection = []
    request_logs = []

    print("Testing with new connections:")
    for input_value in tqdm(selected_inputs_new, desc="New connections"):
        latency, status_code = measure_latency(requests, api_url, headers, input_value)
        if latency is not None:
            latencies_new_connection.append(latency)
            request_logs.append({
                "input": input_value[:100] + "..." if len(input_value) > 100 else input_value,
                "input_length": len(input_value),
                "latency": latency,
                "status_code": status_code,
                "connection": "new"
            })

    session = requests.Session()

    print("\nTesting with reused connections:")
    print("Warming up session...")
    for input_value in tqdm(selected_inputs_reused, desc="Warm-up"):
        measure_latency(session, api_url, headers, input_value)

    print("Measuring latencies...")
    for input_value in tqdm(selected_inputs_reused, desc="Reused connections"):
        latency, status_code = measure_latency(session, api_url, headers, input_value)
        if latency is not None:
            latencies_reused_connection.append(latency)
            request_logs.append({
                "input": input_value[:100] + "..." if len(input_value) > 100 else input_value,
                "input_length": len(input_value),
                "latency": latency,
                "status_code": status_code,
                "connection": "reused"
            })

    # Calculate the 95th percentile index safely
    INDEX_95TH_NEW = int(len(latencies_new_connection) * 0.95) - 1 if len(latencies_new_connection) > 1 else None
    INDEX_95TH_REUSED = int(len(latencies_reused_connection) * 0.95) - 1 if len(latencies_reused_connection) > 1 else None

    avg_new = sum(latencies_new_connection) / len(latencies_new_connection)
    avg_reused = sum(latencies_reused_connection) / len(latencies_reused_connection)
    avg_difference_ratio = avg_new / avg_reused

    results = {
        "endpoint": endpoint,
        "average_input_size": input_size,
        "avg_difference_ratio": f"new connections average {avg_difference_ratio:.2f}x greater than reused connections",
        "stats": {
            "new_connection": {
                "min": min(latencies_new_connection),
                "max": max(latencies_new_connection),
                "avg": avg_new,
                "95th": sorted(latencies_new_connection)[INDEX_95TH_NEW] if INDEX_95TH_NEW is not None else None,
            },
            "reused_connection": {
                "min": min(latencies_reused_connection),
                "max": max(latencies_reused_connection),
                "avg": avg_reused,
                "95th": sorted(latencies_reused_connection)[INDEX_95TH_REUSED] if INDEX_95TH_REUSED is not None else None,
            },
        },
        "latencies_new_connection": latencies_new_connection,
        "latencies_reused_connection": latencies_reused_connection,
    }

    # Generate filenames
    summary_filename = f"{endpoint}-summary.json"
    request_logs_filename = f"{endpoint}-requests.json"

    # Save results to JSON
    save_results_to_json(results, summary_filename)
    save_results_to_json(request_logs, request_logs_filename)

    print("\nResults:")
    print(json.dumps(results, indent=2))

if __name__ == "__main__":
    main()

Expected Self-Hosted Latency

Latency table for persistent connections.

Input Length (Characters)    p(95) Latency (ms)
1,000                        <20
10,000                       <100
20,000                       <150
30,000                       <200
50,000                       >250

Be aware of the monthly API request limits outlined in your trial agreement.