Evaluation Datasets

The following public datasets on Hugging Face can be used to evaluate the accuracy and effectiveness of Lakera Guard.

For more guidance on performing evaluations, please refer to our evaluation guide.

If you’d like to do a formal evaluation of Guard as part of a ‘Proof-of-Value’, please contact us.

| Name | Type | # Prompts | Purpose |
| --- | --- | --- | --- |
| Salad-Data | Prompt Injection | 21,318 | Comprehensive categorised dataset of attack-enhanced prompts with jailbreak attempts across multiple harm categories, including illegal drugs, misinformation, fraud, and dangerous content. |
| ChatGPT-Jailbreak-Prompts | Prompt Injection | 79 | Collection of jailbreak-related prompts for ChatGPT. |
| Vigil: LLM Jailbreak embeddings | Prompt Injection | 104 | Curated dataset of prompts for testing scanners that detect prompt injections, jailbreaks, and other potentially risky inputs. Contains text-embedding-ada-002 embeddings for all “jailbreak” prompts used by Vigil. |
| ALERT Adversarial | Prompt Injection | 45,731 | Categorised dataset of harmful instructions for the ALERT benchmark, designed for testing content moderation and safety alignment in instruction-following models. |
| NOETI ToxicQAFinal | Content Moderation | 6,866 | Categorised dataset of harmful and toxic content for evaluating content moderation. |
| SQuAD 2.0 | Negative | 142,192 | Stanford Question Answering Dataset (SQuAD): a reading comprehension dataset consisting of questions posed by crowdworkers on a set of Wikipedia articles. An all-negative dataset for false-positive evaluation. |
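
Most of these datasets can be loaded directly with the Hugging Face `datasets` library and replayed against Guard. The sketch below runs a small false-positive check using SQuAD 2.0. The dataset loading is standard `datasets` usage, but the Guard endpoint URL, request body, and `flagged` response field shown here are assumptions — confirm the exact contract against the Guard API reference before running it.

```python
# A minimal sketch of a false-positive evaluation against Guard, using
# SQuAD 2.0 (all-negative). The endpoint URL, request shape, and the
# "flagged" response field are assumptions; check the Guard API reference.
import os

import requests
from datasets import load_dataset

LAKERA_GUARD_URL = "https://api.lakera.ai/v2/guard"  # assumed endpoint
API_KEY = os.environ["LAKERA_GUARD_API_KEY"]

# Benign reading-comprehension questions; any flag is a false positive.
squad = load_dataset("rajpurkar/squad_v2", split="validation")
sample = squad.select(range(100))  # small sample to keep the sketch cheap

false_positives = 0
for row in sample:
    response = requests.post(
        LAKERA_GUARD_URL,
        json={"messages": [{"role": "user", "content": row["question"]}]},
        headers={"Authorization": f"Bearer {API_KEY}"},
        timeout=10,
    )
    response.raise_for_status()
    if response.json().get("flagged"):  # assumed response field
        false_positives += 1

print(f"False-positive rate: {false_positives / len(sample):.2%}")
```

For the attack datasets, the same loop measures detection rate instead: every prompt that is not flagged counts as a miss.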