Lakera Guard Integration Guide | Lakera API documentation

Lakera Guard functions as a control layer around a model, assistant, or agent(s). Whether deployed as a self-hosted container or leveraged via the SaaS service, Lakera Guard fits seamlessly into a wide range of use cases and architectures.

Integrating Lakera Guard

Integrating Lakera Guard is as simple as making an API call for each LLM interaction, passing the inputs (both user and reference documents) and LLM outputs for screening. Lakera Guard will respond flagging whether the interaction contains a threat or policy violation.

Based on the Guard flagging response you can define control flows flexibly in your applications or integrations to prevent and respond to threats in real-time.

For advice on integration best practices and common pitfalls to avoid see the Integration recommendations section.

Designing Control Flows

Lakera Guard gives you flexibility to choose how to respond to flagged interactions. This provides full control over designing workflows based on your desired behavior and user flows.

Lakera Guard provides a boolean response of flagging equals true or false. Control over the guardrails used for screening and the flagging sensitivity are controlled via the policy.

Many Guard users opt to block threats in real-time, preventing the potentially compromised output being returned to the user or application. Some Guard users opt to configure more flexible workflows, enabling users to override flagged interactions or setting up escalations which kick in after a certain number of interactions have been flagged for a user.

When first integrating Lakera Guard, you can choose to use a pure monitoring strategy to begin with. This simply means integrating Lakera Guard without creating any flows to take action based on flagged responses. This approach allows you to monitor Guard’s performance, configure policies correctly, and identify threats and vulnerabilities for post-hoc investigation and response.

Sample use cases

Sample Use Case: GenAI Chat Application

Generative chat applications are a popular enterprise use case for Lakera Guard. First, consider the data flow of a chat system that does not leverage security controls for managing model input and output.

Data flow from the user to the model back to the user. Security controls rely on the model itself.

In this basic implementation, data flows from the user to the model back to the user. Security is dependent on the model’s ability to handle malicious input and control its own output.

This implementation poses several risks, including malicious prompts such as prompt injections or jailbreaks entering the application. It’s also possible for sensitive data, like PII, to enter the model. Depending on compliance requirements, this may pose additional risks. Additionally, there is concern the model may provide the user with undesirable responses, including hate speech or sexual content. Relying on the foundation model developer to address these risks comprehensively is not optimal. Updates to the model can introduce behavioral changes. There’s also potential for creating lock-in conditions which would make using multiple models or switching providers difficult.

Lakera Guard Implementation

Lakera Guard protects against these risks but is abstracted from the model itself. In the generative chat system, we recommend screening the interaction holistically, screening both inputs and output after the LLM has responded and before returning the response to the user. This optimises for latency as there’s only one screening request and provides Lakera with the full context for higher accuracy.

Alternatively, if you have concerns with data leakage to third party LLM providers, you can send the user input to the Lakera Guard API for screening before passing the prompt to the model and then screen the whole interaction before returning the output to the user.

A common policy configuration for chatbots is to screen the user input for prompt attacks, sensitive user PII, and content violations to check for malicious attacks, data leakage and user misbehavior. On the model output, it’s common to screen for content violations, data leakage, and unknown links in the model response to check for model misbehavior, data exfiltration and phishing links. Note that all guardrails are configurable via the policy.

Data flow from the user is passed to Lakera Guard for evaluation. Control flows are designed to send safe input to the model. The model output is sent to Lakera Guard for evaluation prior to returning a response to the.

In the diagram above, the GenAI Chat application is secured with Lakera Guard by making an API call containing the user input and an API call containing the model output. In doing so, a control set has been created to enforce what enters and leaves the model without relying on the model itself.

Sample Use Case: Screening Documents

Screening documents, files or other reference content with Lakera Guard works similarly to user input. Consider a general data flow of a document being passed as context to a model. The first requirement is to handle the document upload and parse it into text. Once parsed, the control flow follows the same structure.

Documents are parsed to text and flow and passed to Lakera Guard for evaluation. Control flows are designed to send safe input to the model. The model output is sent to Lakera Guard for evaluation prior to returning a response to the.

Lakera Guard has a large context window, in line with major models, and does smart chunking internally. When dealing with large documents that exceed model or Lakera context limits, the document can be parsed and chunked, sending smaller parallelized requests to Lakera Guard. Baselining latency helps identify the optimal chunking size and performance trade-off.

Documents are parsed to text and chunked to smaller inputs. Parallelized requests are then passed to Lakera Guard for evaluation. Control flows are designed to send safe input to the model. The model output is sent to Lakera Guard for evaluation prior to returning a response to the.

Sample Use Case: RAG Architecture

GenAI Applications utilising Retrieval Augmented Generation (RAG) and scaled-out knowledge bases can leverage Lakera Guard as well. This helps to extend protection against documents which may contain poisoned or sensitive data, either from the user or malicious actors targeting the user.

The following diagram shows a question-answer RAG generation pattern, but is applicable for other RAG use cases.

RAG Architectures extend the functionality of LLMs by providing access to relevant and specific content.

Lakera Guard Implementation

The Lakera Guard integration for RAG works similar to chat applications but with both the user and document inputs being passed as multiple inputs to Guard and screened within the same single Guard request.

For RAG setups where the reference content is pre-set or relatively static then it’s recommended to screen documents directly during the initial upload to identify poisoned documents ahead of time rather than during the user interaction. This can be done following the document integration approach outlined above.

Despite increased architectural complexity, the implementation pattern remains the same. Contained in the API request call to Lakera Guard is the user input and retrieved context. The input is evaluated in its entirety.

RAG Architectures extend the functionality of LLMs by providing access to relevant and specific content. Despite increased complexity, the AI application can be protected by Lakera Guard with the same implementation pattern.

Sample Use Case: AI Gateway

Lakera Guard integrates seamlessly within an AI Gateway, providing a centralized access point for managing and securing AI services. This integration ensures consistent control enforcement across all AI interactions. Organizations benefit from this setup through improved efficiency, enhanced observability, and streamlined operations.

Lakera Guard fits seamlessly into a larger AI Gateway ecosystem.

Integration recommendations

We recommend following these recommendations in order to avoid common pitfalls:

Set up a project and assign a relevant policy

The Lakera Default Policy has our strictest flagging sensitivity so will flag anything Guard isn’t confident is safe. Use one of the Lakera recommended policies to assess Guard’s accuracy in real deployments.

Pass system prompts separately and correctly

A common attack vector is for users to insert malicious instructions as if they’re system instructions. Therefore in order to prevent Guard flagging your own system instructions make sure they’re passed in a separate message from LLM inputs and outputs with the message role as system.

Screen the original raw content from user inputs and reference documents

A common practice is to add additional system instructions or decorators to LLM inputs. Similar to the point above, in order to avoid false positives either screen the exact original input or remove any system instructions added to LLM inputs before passing to Guard as these are likely to be flagged as prompt attacks.

Screen for threats holistically

Lakera Guard performs best when screening interactions holistically with all appropriate guardrails applied. This means passing both LLM inputs and outputs in screening requests and using representative policies. The more context Guard has, the more accurate its threat detection will be.