Evaluation Datasets

The following public datasets on Hugging Face can be used to evaluate the accuracy and effectiveness of Lakera Guard.

For more guidance on performing evaluations, please refer to our evaluation guide.

If you’d like to do a formal evaluation of Guard as part of a ‘Proof-of-Value’, please contact us.

| Name | Type | # Prompts | Purpose |
| --- | --- | --- | --- |
| Salad-Data | Prompt Injection | 21,318 | Comprehensive categorised dataset of attack-enhanced prompts with jailbreak attempts across multiple harm categories, including illegal drugs, misinformation, fraud, and dangerous content. |
| ChatGPT-Jailbreak-Prompts | Prompt Injection | 79 | Collection of jailbreak-related prompts for ChatGPT. |
| Vigil: LLM Jailbreak embeddings | Prompt Injection | 104 | Curated dataset of prompts for testing scanners that detect prompt injections, jailbreaks, and other potentially risky inputs. Contains text-embedding-ada-002 embeddings for all “jailbreak” prompts used by Vigil. |
| ALERT Adversarial | Prompt Injection | 45,731 | Categorised dataset of harmful instructions for the ALERT benchmark, designed for testing content moderation and safety alignment in instruction-following models. |
| NOETI ToxicQAFinal | Content Moderation | 6,866 | Categorised dataset of harmful and toxic content for evaluating content moderation. |
| SQuAD 2.0 | Negative | 142,192 | Stanford Question Answering Dataset (SQuAD): a reading comprehension dataset consisting of questions posed by crowdworkers on a set of Wikipedia articles. An all-negative dataset for false-positive evaluation. |
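
Most of these datasets can be loaded directly with the Hugging Face `datasets` library and replayed against Guard. The sketch below runs a small false-positive check using SQuAD 2.0. The dataset loading is standard `datasets` usage, but the Guard endpoint URL, request body, and `flagged` response field shown here are assumptions — confirm the exact contract against the Guard API reference before running it.

```python
# A minimal sketch of a false-positive evaluation against Guard, using
# SQuAD 2.0 (all-negative). The endpoint URL, request shape, and the
# "flagged" response field are assumptions; check the Guard API reference.
import os

import requests
from datasets import load_dataset

LAKERA_GUARD_URL = "https://api.lakera.ai/v2/guard"  # assumed endpoint
API_KEY = os.environ["LAKERA_GUARD_API_KEY"]

# Benign reading-comprehension questions; any flag is a false positive.
squad = load_dataset("rajpurkar/squad_v2", split="validation")
sample = squad.select(range(100))  # small sample to keep the sketch cheap

false_positives = 0
for row in sample:
    response = requests.post(
        LAKERA_GUARD_URL,
        json={"messages": [{"role": "user", "content": row["question"]}]},
        headers={"Authorization": f"Bearer {API_KEY}"},
        timeout=10,
    )
    response.raise_for_status()
    if response.json().get("flagged"):  # assumed response field
        false_positives += 1

print(f"False-positive rate: {false_positives / len(sample):.2%}")
```

For the attack datasets, the same loop measures detection rate instead: every prompt that is not flagged counts as a miss.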