Attack Coverage

Lakera Red evaluates your GenAI applications across a comprehensive set of adversarial techniques organized into attack categories. Our attack library is continuously updated based on real-world threat intelligence.

Attack Categories

Red organizes attacks into three main categories: Security, Safety, and Responsible. The default “basic scan” includes all three (23 objectives in total). Attack strategies include, but are not limited to, Prompt Injection, Jailbreaks, Multi-lingual, and Multi-turn techniques.

Security (4 Objectives)

Tests for vulnerabilities that could compromise your application’s security:

  • Instruction Override: Bypassing or overriding the system’s safety guidelines, instructions, or behavioral constraints through prompt injection or manipulation
  • System Prompt Extraction: Extraction or disclosure of the underlying system prompt, instructions, or configuration that governs the model’s behavior
  • Tool Extraction: Extraction of information about available tools, functions, APIs, or capabilities that the system has access to
  • Data Exfiltration / PII Leakage: Extraction or leakage of personally identifiable information, sensitive user data, or confidential information from the system’s context or memory

Safety (8 Objectives)

Tests for harmful content generation that could cause real-world harm:

  • Hate Speech: Content promoting hatred, prejudice, or discrimination based on race, ethnicity, religion, gender, sexual orientation, or other protected characteristics
  • Violence and Violent Extremism: Content glorifying, inciting, or providing detailed instructions for acts of violence, terrorism, or extremist ideologies
  • CBRNE: Instructions or information for creating, obtaining, or deploying chemical, biological, radiological, nuclear weapons, or explosive devices
  • Self-Harm: Content encouraging, normalizing, or providing methods for self-harm, suicide, or self-destructive behaviors (includes eating disorders)
  • Sexual Content: Explicit sexual content, pornographic material, or sexually suggestive content involving adults or minors
  • Harassment and Bullying: Content designed to harass, intimidate, threaten, or bully individuals or groups
  • Dangerous Instructions: Unsafe advice or instructions that could lead to physical harm, injury, or dangerous situations
  • Drug Synthesis: Detailed instructions for synthesizing illegal drugs, controlled substances, or dangerous narcotics

Responsible (11 Objectives)

Tests for outputs that could create legal, reputational, or ethical issues:

  • Misinformation and Disinformation: False, misleading, or fabricated information presented as factual, including conspiracy theories or deliberately deceptive content
  • Copyright Infringement: Copyrighted content including substantial reproductions of books, articles, song lyrics, code, or other protected intellectual property
  • Fraud Facilitation: Content enabling fraudulent activities, scams, phishing schemes, or financial deception
  • Criminal Advice: Specific guidance or instructions for committing illegal activities, evading law enforcement, or conducting criminal operations
  • Brand-Damaging Content: Content that misrepresents the brand, creates negative associations, or damages the reputation of the company or product
  • Unauthorized Discounts: Unauthorized promotional codes, discounts, refunds, or financial commitments on behalf of the company
  • Discrimination and Bias: Biased, discriminatory, or unfair content in decision-making contexts such as hiring, lending, housing, or other consequential applications
  • Specialized Advice (Medical, Legal): Specific medical diagnoses, treatment plans, legal counsel, or other specialized professional advice without appropriate disclaimers
  • Defamation and Libel: False and damaging statements about specific individuals or organizations that could harm their reputation
  • Hallucination: Inducing the model to fabricate facts, citations, entities, or statistics presented as real
  • Cybercrime Facilitation: Malicious code, malware, exploits, or hacking techniques designed to compromise systems or data

How Attacks Work

Attack Generation

For each objective, Red generates targeted attack probes based on:

  1. The attack objective’s goal and expected harmful output
  2. Your recon context (app description, allowed/forbidden actions)
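The pairing of an objective with recon context can be pictured as a simple data model. This is a minimal sketch; the class names, fields, and probe format below are illustrative, not Red’s actual API:

```python
from dataclasses import dataclass

@dataclass
class AttackObjective:
    name: str       # e.g. "system_prompt_extraction"
    category: str   # "security", "safety", or "responsible"
    goal: str       # the harmful output the probe tries to elicit

@dataclass
class ReconContext:
    app_description: str
    allowed_actions: list[str]
    forbidden_actions: list[str]

def build_probe(objective: AttackObjective, recon: ReconContext) -> str:
    # A probe blends the objective's goal with app-specific recon context,
    # so the attack is tailored to what this application actually does.
    return (
        f"[{objective.category}/{objective.name}] "
        f"Target app: {recon.app_description}. "
        f"Goal: {objective.goal}"
    )
```

The key idea is that the same objective produces different probes for different applications, because the recon context steers the attack toward that app’s specific capabilities.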

Risk Score Calculation

Your overall risk score is the percentage of attacks that succeeded:

Risk Score = (Harmful Evaluations / Total Evaluations) × 100%
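As a concrete sketch of the formula above (the function name is illustrative):

```python
def risk_score(harmful_evaluations: int, total_evaluations: int) -> float:
    """Percentage of attack evaluations judged harmful (0-100)."""
    if total_evaluations == 0:
        return 0.0  # no evaluations ran, so no measured risk
    return harmful_evaluations / total_evaluations * 100

# e.g. 6 harmful outcomes across the 23 basic-scan objectives
print(round(risk_score(6, 23), 2))  # 26.09
```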

Customizing Attack Scope

Basic Scan (Default)

If you don’t specify objectives, Red runs a basic scan covering:

  • All 4 Security objectives
  • All 8 Safety objectives
  • All 11 Responsible objectives
  • Total: 23 attack objectives
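The default scope can be represented as a mapping from category to objectives. The snake_case identifiers below are illustrative labels derived from the objective names in this document, not official API identifiers:

```python
# Illustrative labels for the basic-scan objectives (not official identifiers)
BASIC_SCAN_OBJECTIVES = {
    "security": [
        "instruction_override",
        "system_prompt_extraction",
        "tool_extraction",
        "data_exfiltration_pii_leakage",
    ],
    "safety": [
        "hate_speech",
        "violence_and_violent_extremism",
        "cbrne",
        "self_harm",
        "sexual_content",
        "harassment_and_bullying",
        "dangerous_instructions",
        "drug_synthesis",
    ],
    "responsible": [
        "misinformation_and_disinformation",
        "copyright_infringement",
        "fraud_facilitation",
        "criminal_advice",
        "brand_damaging_content",
        "unauthorized_discounts",
        "discrimination_and_bias",
        "specialized_advice",
        "defamation_and_libel",
        "hallucination",
        "cybercrime_facilitation",
    ],
}

# Category counts match the scan summary: 4 + 8 + 11 = 23
assert sum(len(objs) for objs in BASIC_SCAN_OBJECTIVES.values()) == 23
```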

Custom Scope (Coming Soon)

You can customize your scan to:

  • Include/exclude specific categories
  • Select individual objectives within categories
  • Add custom attack objectives for your specific use case

Continuous Updates

Our attack library evolves continuously based on:

  • Proprietary threat intelligence from 100K+ daily Gandalf attacks
  • Findings from Lakera’s dedicated security research team
  • Academic research and industry publications
  • Real-world attack patterns observed by our Red team engineers

Learn More