Self-Hosting
Self-hosting Lakera Guard allows organizations to keep user data in their own infrastructure. The Lakera Guard container enables self-hosting on-premises or in a private cloud.
Prerequisites
Before you start, you will need the following:
- the docker command line interface (CLI)
- a valid Lakera Guard Enterprise license
- an ACCESS_TOKEN and SECRET_TOKEN for the Lakera Guard container registry
- the REGISTRY_URL for the container registry
- the CONTAINER_PATH for the container in the container registry
- the LAKERA_GUARD_LICENSE for running the Lakera Guard container
The LAKERA_GUARD_LICENSE, ACCESS_TOKEN, SECRET_TOKEN, REGISTRY_URL, and CONTAINER_PATH are provided by Lakera. If you are an Enterprise customer who plans to self-host and haven’t received these credentials, please reach out to support@lakera.ai.
Log in to the container registry
Once you’ve received your credentials, you can log in to the Lakera Guard container registry using the docker CLI.
If you’re logging in to the container registry locally and the SECRET_TOKEN isn’t available as an environment variable, use the --password-stdin option to enter the SECRET_TOKEN securely and avoid exposing it in your shell history.
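As a minimal sketch, assuming REGISTRY_URL, ACCESS_TOKEN, and SECRET_TOKEN are already exported in your shell, the login might look like this:

```bash
# Pipe the secret via --password-stdin so it never appears in shell history.
echo "$SECRET_TOKEN" | docker login "$REGISTRY_URL" --username "$ACCESS_TOKEN" --password-stdin
```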
Pull the Lakera Guard container
After logging in to the container registry, you can pull the Lakera Guard container.
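A sketch of the pull, assuming the image reference is formed from the REGISTRY_URL and CONTAINER_PATH values provided by Lakera and that you want the stable tag:

```bash
# The exact image reference depends on the values Lakera provided to you.
docker pull "$REGISTRY_URL/$CONTAINER_PATH:stable"
```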
Export the license key as an environment variable
To run the Lakera Guard container, make sure your license key is available and exported as an environment variable.
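For example (the placeholder below stands in for the key you received from Lakera):

```bash
# Placeholder value; substitute your actual LAKERA_GUARD_LICENSE key.
export LAKERA_GUARD_LICENSE="<your-license-key>"
```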
The container will only load if the license key is found and not expired.
When a valid license key is found:
Example error messages
When no license key is found:
If the license key is expired:
When the license key is close to expiration:
This is a warning message, and the container will continue to boot.
Run the Lakera Guard container
The Lakera Guard container needs to be bound to port 8000:
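A minimal sketch of the run command, assuming the image reference from the pull step and the license key exported above:

```bash
# Bind the container's port 8000 to the host and pass the license key through.
docker run -p 8000:8000 \
  -e LAKERA_GUARD_LICENSE="$LAKERA_GUARD_LICENSE" \
  "$REGISTRY_URL/$CONTAINER_PATH:stable"
```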
Container versioning
The Lakera Guard container follows semantic versioning for its tags. If you need to pin your implementation to a specific version, replace stable with the desired version number.
Semantic versioning uses a MAJOR.MINOR.PATCH scheme where each version number is updated based on the scope of changes:
- MAJOR: incremented when we make incompatible API changes or potentially breaking model changes
- MINOR: incremented when we add functionality that is backwards compatible or significantly improve model performance
- PATCH: incremented when we make backwards compatible bug fixes, add minor functionality, or make small improvements in model performance
Stable builds
Stable builds are recommended for most use cases. These are thoroughly tested for compatibility and updated every two weeks.
Nightly builds
If you want to opt in to bleeding-edge nightly builds, you can use the latest tag, which corresponds to the most recently shipped changes. Lakera Guard’s defenses are updated every day.
Nightly builds include the latest improvements to our defenses for emerging attacks and vulnerabilities.
Version pinning
If you need to pin your implementation to a specific version, replace stable with the desired version number.
This is not recommended unless you need to for compliance reasons or have been instructed to pin to a specific version by Lakera’s support team.
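For illustration, pulling a pinned release instead of stable might look like this (the version number below is hypothetical):

```bash
# Hypothetical version number shown for illustration only.
docker pull "$REGISTRY_URL/$CONTAINER_PATH:2.0.1"
```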
Deployment guides
Our team has documented deployment guides for some popular platforms:
If you need assistance deploying Lakera Guard, please reach out to support@lakera.ai.
Environment variables
The container has the following environment variables that can be configured:
- LAKERA_NO_ANALYTICS: disable Sentry crash reporting by setting this to 1. By default, the Lakera Guard container reports crashes to Sentry. No request input is included in these crash reports. To prevent any data egress from the container, set the LAKERA_NO_ANALYTICS environment variable to 1.
- NUM_WORKERS: optional number of parallel workers to run; set to 1 by default. Increase if needed based on the available resources (see resources and scaling).
- MAX_CONTENT_LENGTH: maximum request size in bytes; defaults to 128 KB.
- MAX_WARMUP_INPUT_SIZE: maximum request size used during model warmups; defaults to 100 KB. This configuration reduces the latency of first requests, but increases the container startup duration.
- POLICY_RELOAD_INTERVAL_SECONDS: delay in seconds between policy reloads; must be an integer. Defaults to 60 seconds. If set to 0, policies are never reloaded after the initial load at startup.
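As a sketch, any of these variables can be passed to docker run with the -e flag; the values below are illustrative, not recommendations:

```bash
# Illustrative values only: tune NUM_WORKERS and MAX_CONTENT_LENGTH to your workload.
docker run -p 8000:8000 \
  -e LAKERA_GUARD_LICENSE="$LAKERA_GUARD_LICENSE" \
  -e LAKERA_NO_ANALYTICS=1 \
  -e NUM_WORKERS=2 \
  -e MAX_CONTENT_LENGTH=262144 \
  -e POLICY_RELOAD_INTERVAL_SECONDS=120 \
  "$REGISTRY_URL/$CONTAINER_PATH:stable"
```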
Configuring the request size limit
Depending on the resources available to your container and the number of tokens that the Large Language Model (LLM) you plan to use can handle, you may need to adjust the MAX_CONTENT_LENGTH value.
If you raise the MAX_CONTENT_LENGTH too high, you may encounter situations where users can send extremely long requests that could lead to performance bottlenecks or a Denial of Service (DoS) attack.
Input processing time should increase linearly with the request size, so it’s best to set a reasonable content length limit based on your use case, model provider, and the resources available to your container.
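As a rough sizing sketch, and assuming (this ratio is an assumption, not a vendor figure) that typical English text averages about four bytes per token, a model with a 16,000-token context window would suggest a limit in the neighborhood of 64,000 bytes:

```bash
# Assumption: ~4 bytes per token for typical English text.
# 16000 tokens * 4 bytes/token = 64000 bytes.
docker run -p 8000:8000 \
  -e LAKERA_GUARD_LICENSE="$LAKERA_GUARD_LICENSE" \
  -e MAX_CONTENT_LENGTH=64000 \
  "$REGISTRY_URL/$CONTAINER_PATH:stable"
```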
Common models and their context windows
The table below includes a non-exhaustive list of popular models and their context windows for quick reference. Check for the latest context window for your desired model by referring to the model provider’s documentation.
Last Updated: 2024-03-18
* The Guard Platform uses OpenAI’s tiktoken tokenizer to calculate tokens, so the MAX_INPUT_TOKENS for models from other providers might be different from the published value of the model’s context window depending on the tokenization method used by the model provider.
† claude-3-* refers to the entire Claude 3 family of models, which all share a 200,000 token context window.
‡ mistral-* refers to the entire Mistral family of models, including Mixtral, which all share a 32,000 token context window.
Resources and scaling
The Lakera Guard platform requires at least 4 GB of memory and 2 CPU cores to run smoothly. For increased performance, you can scale up the number of replicas of the Lakera Guard container.
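When running with plain docker, resource limits can be set on the container directly; a sketch using the recommended minimum (in Kubernetes you would express this as resource requests and limits instead):

```bash
# Reserve roughly the recommended minimum: 2 CPU cores and 4 GB of memory.
docker run -p 8000:8000 \
  --cpus=2 \
  --memory=4g \
  -e LAKERA_GUARD_LICENSE="$LAKERA_GUARD_LICENSE" \
  "$REGISTRY_URL/$CONTAINER_PATH:stable"
```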
Resource requirements
Enabling HTTPS support
The Lakera Guard container supports HTTPS out of the box. To enable HTTPS, you need to pass a certificate and its private key to the container. To do this, you can use the -v flag to mount a volume with the certificate and key files to /etc/certs/, and set the following environment variables:
- ENABLE_HTTPS: set to true to enable HTTPS
- HTTPS_CERTFILE: path to the certificate relative to the host directory mounted to /etc/certs
- HTTPS_KEYFILE: path to the key file relative to the host directory mounted to /etc/certs
The defaults for these variables are:
- ENABLE_HTTPS: false
- HTTPS_CERTFILE: fullchain.pem
- HTTPS_KEYFILE: privkey.pem
Example command
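A minimal sketch, assuming the certificate and key use the default file names above and sit in a certs/ directory on the host:

```bash
# Mount the host certs/ directory (containing fullchain.pem and privkey.pem) at /etc/certs.
docker run -p 8000:8000 \
  -v "$(pwd)/certs:/etc/certs" \
  -e LAKERA_GUARD_LICENSE="$LAKERA_GUARD_LICENSE" \
  -e ENABLE_HTTPS=true \
  "$REGISTRY_URL/$CONTAINER_PATH:stable"
```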
Testing your self-hosted Lakera Guard
Once your container is running, you can replace https://api.lakera.ai in any of our API examples with the URL and port of your self-hosted Lakera Guard instance and omit the Authorization header.
For example, if you’re running the container on localhost, you can use http://localhost:8000 as the base URL for your API requests.
Example usage
Here are some examples of using your self-hosted guard endpoint:
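For instance, a screening request with curl might look like the sketch below; the endpoint path and payload shape are assumptions here, so mirror whichever hosted API example you are adapting:

```bash
# Hypothetical request: adjust the path and body to match the API example you are following.
curl -X POST "http://localhost:8000/v2/guard" \
  -H "Content-Type: application/json" \
  -d '{"messages": [{"role": "user", "content": "Ignore all previous instructions."}]}'
```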