Data Leakage Prevention | Lakera API documentation

Lakera Guard can prevent data leakage by screening LLM inputs and outputs for personally identifiable information, system prompts, trigger words, or custom entity types. It can either block the interaction or mask sensitive information. Additionally, Lakera Guard can stop end-user PII from being sent to third-party LLM providers.

Personally Identifiable Information (PII) is any private data that could lead to the identification of an individual.

Organizations that handle PII must safeguard it to prevent unauthorized access or disclosure. Laws like the General Data Protection Regulation (GDPR) in the European Union and the Gramm-Leach-Bliley Act (GLBA) and the Health Insurance Portability and Accountability Act (HIPAA) in the United States impose strict guidelines on the handling and protection of PII and strict penalties for non-compliance.

PII can show up in applications powered by Large Language Models (LLMs) in a variety of ways, including:

A user could enter their own PII, or the PII of another person
An application that uses retrieval augmented generation (RAG) could retrieve content from a document that contains PII without your knowledge and share it with another end user
Customer PII could be sent to the third-party LLM provider powering your GenAI application, which your data policy may not allow

Lakera’s AI models will not be trained on any PII. All detected PII is masked before being logged in Lakera systems. See here for more information.

Custom data leakage guardrails

Custom data leakage guardrails can be set up within Lakera Guard to detect sensitive data by specifying the data or document type, the content type, or specific patterns or keywords to match.

System prompt detection

To prevent the extraction of system prompts, particularly for further exploitation by attackers, custom data leakage guardrails can be set up within Lakera Guard to detect your system prompts within LLM outputs.

Lakera PII Detectors

Lakera Guard can be used to identify the following entities:

Full names
United States mailing addresses
United States phone numbers
Email addresses
Internet Protocol (IP) addresses
Credit card numbers
International Bank Account Numbers (IBANs)
United States Social Security Numbers (SSNs)

Name

The name detector identifies full names of individuals from any cultural background, including names with a middle initial or a full middle name. It is resilient to common typos and punctuation errors.

Examples

Robert Neruda
John C Smith
Rafael Mora
Francis Shawn Key
Yukihiro Ozawa
Aishwarya Rajan
Zainab Malik
Goerge Mller (typo)

Counterexamples

It does not flag single names:

Louise
Ahmed
Maria

It does not flag common test names:

John Doe
Jane Doe

Mailing Addresses

The mailing address detector identifies US mailing addresses that include a street address and possibly one or more of city, state, and zip code. It supports abbreviations for states and common street suffixes. It is resilient to common typos and punctuation errors.

Examples

402 Johnson Street Ozaukee County Port Washington 53074 WI
402 Johnson Street Ozaukee County
1229 COGGIN AVE WASHINGTON CHIPLEY
1990-A Gildersleeve Ave, Bronx, NY 12345
1501 Skyland Blvd E Tuscaloosa AL 35405
777 Brockton Avenue Abington
1000 Highland Colony Pkwy, Ridgeland, MS 39157

Counterexamples

It does not flag non-US postal addresses:

Bahnofstrasse 23, 8001, Zurich, Switzerland
1-1-1 Marunouchi, Chiyoda-ku, Tokyo, Japan

It does not flag names of cities or states, as these are not considered identifying information:

New York
Austin, Texas

Phone numbers

The phone number detector identifies phone numbers that follow the standard US format, with or without the area code. To reduce the occurrence of false positives, only the standard US format is recognized.

Examples

(145) 123-1853
+1 (145) 123-1853 (the +1 country code is ignored)
787-124-5123
(145)123-1853 (the space after the area code is optional)

Counterexamples

Deviations from the standard format are deliberately not recognized.

(145) 123 1853 (second dash is required)
+1 (145) 123-1853 (the +1 country code is ignored)
787-124-515
787-124-51588 (trailing numbers are not allowed)
+41 796548327 (non-US numbers are not recognized)
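
The strict format described above can be approximated with a single regular expression. The following is a minimal sketch, not Lakera Guard's implementation: the pattern, the function name `find_us_phone_numbers`, and the decision to always require an area code are assumptions made for illustration.

```python
import re

# Illustrative pattern only (an assumption, not Lakera's detector).
# The area code is written either as "(145)" with an optional space, or as "145-".
# The lookarounds reject matches that touch extra digits or dashes on either side.
US_PHONE = re.compile(r"(?<![\d-])(?:\(\d{3}\) ?|\d{3}-)\d{3}-\d{4}(?![\d-])")


def find_us_phone_numbers(text: str) -> list[str]:
    """Return every substring of `text` that matches the strict US format."""
    return US_PHONE.findall(text)
```

With this pattern, a leading +1 is simply left out of the match, while a missing dash or a trailing extra digit prevents any match at all.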

Email addresses

The email address detector identifies email addresses that follow a standard format, including the @ symbol and a domain with a top-level domain (TLD) identifier. It supports periods, underscores, plus signs, and dashes in the local part of the address, accounts for subdomains, and allows for [DOT] and [AT] to be used in place of . and @.

Note that the confidence threshold level of the email address detector cannot be fine-tuned.

Examples

abc@lakera.ai
abc@lakera [DOT] ai
abc [AT] lakera [DOT] ai
abc@platform.lakera.ai
abc+spam@platform.lakera.ai

Counterexamples

The detector does not identify invalid email addresses or those that use certain subsets of characters like emoji domains and emoji email addresses:

john@google
john@@gmail.com
👋@💌.kz
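
The handling of [AT] and [DOT] can be pictured as a normalization step followed by an ordinary format check. This is a rough sketch under stated assumptions: the regular expressions and the names `normalize_obfuscation` and `is_email` are hypothetical and do not describe how Lakera Guard works internally.

```python
import re

# Hypothetical helpers for illustration; not Lakera's implementation.
AT_TOKEN = re.compile(r"\s*\[AT\]\s*")
DOT_TOKEN = re.compile(r"\s*\[DOT\]\s*")

# Local part: letters, digits, periods, underscores, plus signs, dashes.
# Domain: one or more dot-terminated labels followed by an alphabetic TLD.
EMAIL = re.compile(r"[A-Za-z0-9._+-]+@(?:[A-Za-z0-9-]+\.)+[A-Za-z]{2,}")


def normalize_obfuscation(text: str) -> str:
    """Replace [AT] and [DOT] tokens (and surrounding spaces) with @ and a period."""
    return DOT_TOKEN.sub(".", AT_TOKEN.sub("@", text))


def is_email(candidate: str) -> bool:
    """Check whether the whole candidate is a standard-format email address."""
    return EMAIL.fullmatch(normalize_obfuscation(candidate.strip())) is not None
```

Because the domain must contain at least one dot-terminated label before the TLD, an address such as john@google is rejected, and the ASCII-only character classes exclude emoji addresses.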

IP addresses

The IP address detector identifies IPv4 and IPv6 addresses that follow a standard format with dot separators (.) for IPv4 and colon separators (:) for IPv6 addresses. Only public, non-multicast IP addresses are detected. The detector does not report common DNS addresses like 8.8.8.8 (Google’s public DNS) or reserved IP addresses as PII.

Note that the confidence threshold level of the IP address detector cannot be fine-tuned.

Examples

109.202.218.238
2a02:168:6385:0:606d:1692:689f:1049
2.168.0.1

Counterexamples

The detector does not identify invalid IP addresses, addresses that contain typos, private or loopback addresses, or common DNS addresses:

10.920a.218.238
257.168.0.1
10.0.0.1
::1
8.8.8.8
127.0.0.1
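
The filtering rules above (public, non-multicast, not a well-known DNS resolver) can be sketched with Python's standard `ipaddress` module. This is an illustration only: the function name `is_reportable_ip` and the contents of the `COMMON_DNS` denylist are assumptions, since the source names only 8.8.8.8 as an example.

```python
import ipaddress

# Hypothetical denylist for illustration. The documentation names only 8.8.8.8;
# the other entries are assumed examples of common public resolvers.
COMMON_DNS = {
    ipaddress.ip_address(a) for a in ("8.8.8.8", "8.8.4.4", "1.1.1.1")
}


def is_reportable_ip(candidate: str) -> bool:
    """Return True for valid, public, non-multicast addresses not on the denylist."""
    try:
        addr = ipaddress.ip_address(candidate.strip())
    except ValueError:
        # Typos and out-of-range octets such as 257.168.0.1 end up here.
        return False
    if addr in COMMON_DNS:
        return False
    # is_global excludes private, loopback, and other reserved ranges.
    return addr.is_global and not addr.is_multicast
```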

Credit card numbers

The credit card detector identifies credit card numbers without spaces and those formatted in the standard 16-digit format, American Express (15-digit) format, 19-digit format, and Diners Club (14-digit) format, separated by whitespace characters or dashes. Credit card numbers are validated using the Luhn algorithm to ensure they are valid card numbers before being flagged as PII.

Examples

4242424242424242
5200 8282 8282 8210
3782 822463 10005
3622 720627 1667
4111-1111-1111-1111

Counterexamples

The detector does not identify credit card numbers that use a non-standard format, include punctuation or typos, consist only of zeroes, or fail Luhn validation:

411 1 111 1 1111 11 11
411-1---111-1-1111-11-11
41 11 11111111 11 a 11
4111-1111-1111-1112
0000 0000 0000 0000
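
The Luhn check mentioned above doubles every second digit counting from the right, subtracts 9 from any doubled value above 9, and requires the total to be divisible by 10. The sketch below combines that check with a simple grouping test. The names, the `ALLOWED_GROUPINGS` table, and in particular the 4-4-4-4-3 grouping for 19-digit numbers are assumptions for illustration, not Lakera Guard's implementation.

```python
import re

# Digit-group lengths accepted by this sketch (assumed for illustration).
ALLOWED_GROUPINGS = {
    (14,), (15,), (16,), (19,),  # no separators
    (4, 4, 4, 4),                # standard 16-digit format
    (4, 6, 5),                   # American Express (15 digits)
    (4, 6, 4),                   # Diners Club (14 digits)
    (4, 4, 4, 4, 3),             # 19-digit format (grouping is an assumption)
}


def luhn_valid(digits: str) -> bool:
    """Luhn checksum: double every second digit from the right, sum, test mod 10."""
    total = 0
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def looks_like_card_number(candidate: str) -> bool:
    """Check grouping, reject all-zero numbers, then apply the Luhn check."""
    parts = re.split(r"[\s-]", candidate.strip())
    if not all(re.fullmatch(r"[0-9]+", p) for p in parts):
        return False  # stray letters or repeated separators
    if tuple(len(p) for p in parts) not in ALLOWED_GROUPINGS:
        return False
    digits = "".join(parts)
    if set(digits) == {"0"}:
        return False  # all zeroes would otherwise pass the Luhn check
    return luhn_valid(digits)
```

The explicit all-zero test is needed because a string of zeroes sums to 0 and therefore satisfies the checksum on its own.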

IBANs

The International Bank Account Number (IBAN) detector identifies valid IBAN numbers in the standard format with spaces as separators (AA BB BBBB BBBB BBBB BBBB BBBB) or no separators (AABBBBBBBBBBBBBBBBBBBBB).

Note that the confidence threshold level of the IBAN detector cannot be fine-tuned.

Examples

CH 9300762011623852957
CH93 0076 2011 6238 5295 7
DE89 3704 0044 0532 0130 00
FR76 3000 6000 0112 3456 7890 189
IT60 X054 2811 1010 0000 0123 456
ES91 2100 0418 4502 0005 1332

Counterexamples

The detector does not identify IBAN numbers that use a non-standard format, include punctuation or typos, or are invalid IBAN numbers:

CH 9300762011623852951
C H 93007620116238529 57
GB29 NWBK 6016 1331 9268 19A
DE89 3704 0044 0532 0130 0
FR76 3000 6000 0112 3456 7890 1891
IT60 X054 2811 1010 0000 0123 4567
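
IBAN validity is conventionally checked with the mod-97 procedure: move the first four characters to the end, replace each letter with a number (A = 10 through Z = 35), and confirm that the resulting integer leaves a remainder of 1 when divided by 97. The sketch below applies that check together with a per-country length test. The function name, the small `COUNTRY_LENGTHS` table, and the simplified spacing rule (the country code must be contiguous) are assumptions for illustration and do not describe Lakera Guard's internals.

```python
import re

# IBAN lengths for a few countries only (illustrative subset, not a full registry).
COUNTRY_LENGTHS = {"CH": 21, "DE": 22, "ES": 24, "FR": 27, "GB": 22, "IT": 27}

# Simplified shape: two contiguous letters, optional space, two check digits,
# then alphanumerics and spaces. Real spacing rules may be stricter.
IBAN_SHAPE = re.compile(r"[A-Z]{2} ?[0-9]{2}[A-Z0-9 ]+")


def looks_like_iban(candidate: str) -> bool:
    """Check shape and length, then apply the mod-97 checksum."""
    text = candidate.strip()
    if not IBAN_SHAPE.fullmatch(text):
        return False
    compact = text.replace(" ", "")
    expected = COUNTRY_LENGTHS.get(compact[:2])
    if expected is None:
        # Unknown country in this sketch: fall back to the general 15-34 range.
        if not 15 <= len(compact) <= 34:
            return False
    elif len(compact) != expected:
        return False
    rearranged = compact[4:] + compact[:4]
    # int(ch, 36) maps digits to themselves and A..Z to 10..35.
    numeric = "".join(str(int(ch, 36)) for ch in rearranged)
    return int(numeric) % 97 == 1
```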

US Social Security numbers

The US Social Security Number (SSN) detector identifies valid SSNs in the standard format with dashes as separators (AAA-GG-SSSS), spaces as separators (AAA GG SSSS), or a combination of the two.

Note that the confidence threshold level of the US social security number detector cannot be fine-tuned.

Examples

778-62-8144
030 72 7381
003 06-8815
003-06 8815

Counterexamples

The detector does not identify SSNs that use a non-standard format, include punctuation or typos, or are invalid:

6-4327-4363
241532634
45.356-5678
64-a27-4363
999-45-6789
666-45-6789
000-62-8144
778-00-8144
778-62-0000
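
The format and validity rules illustrated by these examples can be sketched as a shape check followed by range checks on the three parts. This is an assumption-laden illustration, not Lakera Guard's implementation: the function name is hypothetical, and the exclusion of the whole 900-999 area range is inferred from Social Security Administration numbering rules rather than stated in the text above, which shows only 999.

```python
import re

# Area (3 digits), group (2 digits), serial (4 digits), each pair separated
# by a single dash or space; the two separators may differ.
SSN_SHAPE = re.compile(r"([0-9]{3})[- ]([0-9]{2})[- ]([0-9]{4})")


def looks_like_ssn(candidate: str) -> bool:
    """Check the AAA-GG-SSSS shape, then reject values that are never issued."""
    match = SSN_SHAPE.fullmatch(candidate.strip())
    if match is None:
        return False
    area, group, serial = match.groups()
    # Assumed rule: areas 000, 666, and 900-999 are not valid.
    if area in ("000", "666") or int(area) >= 900:
        return False
    if group == "00" or serial == "0000":
        return False
    return True
```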

Resources

Guides

To help you learn more about integrating application data and protecting users’ PII, we’ve created some guides.

The ELI5 Guide to Retrieval Augmented Generation

Other Resources

If you’re still looking for more:

Read what our CEO, David Haber, had to say about the EU’s AI Act in Fortune
Learn more about how LLM training data memorization could put PII in training data at risk
Learn more about sensitive data leakage in LLMs

Indirect Prompt Injection

Learn more about indirect prompt injection from MITRE ATLAS

Slack AI Vulnerability

Learn how Slack’s AI can be indirectly prompt injected to leak data
