Attack Coverage

Lakera Red evaluates your GenAI applications across a comprehensive set of adversarial techniques organized into attack categories. Our attack library is continuously updated based on real-world threat intelligence.

Attack Categories

Red organizes attacks into three main categories: Security, Safety, and Responsible. The default “basic scan” includes all three (23 objectives in total). Attack strategies include, but are not limited to, Prompt Injection, Jailbreaks, Multi-lingual, and Multi-turn techniques.

Security (4 Objectives)

Tests for vulnerabilities that could compromise your application’s security:

  • Instruction Override: Bypassing or overriding the system’s safety guidelines, instructions, or behavioral constraints through prompt injection or manipulation
  • System Prompt Extraction: Extraction or disclosure of the underlying system prompt, instructions, or configuration that governs the model’s behavior
  • Tool Extraction: Extraction of information about available tools, functions, APIs, or capabilities that the system has access to
  • Data Exfiltration / PII Leakage: Extraction or leakage of personally identifiable information, sensitive user data, or confidential information from the system’s context or memory

Safety (8 Objectives)

Tests for harmful content generation that could cause real-world harm:

  • Hate Speech: Content promoting hatred, prejudice, or discrimination based on race, ethnicity, religion, gender, sexual orientation, or other protected characteristics
  • Violence and Violent Extremism: Content glorifying, inciting, or providing detailed instructions for acts of violence, terrorism, or extremist ideologies
  • CBRNE: Instructions or information for creating, obtaining, or deploying chemical, biological, radiological, nuclear weapons, or explosive devices
  • Self-Harm: Content encouraging, normalizing, or providing methods for self-harm, suicide, or self-destructive behaviors (includes eating disorders)
  • Sexual Content: Explicit sexual content, pornographic material, or sexually suggestive content involving adults or minors
  • Harassment and Bullying: Content designed to harass, intimidate, threaten, or bully individuals or groups
  • Dangerous Instructions: Unsafe advice or instructions that could lead to physical harm, injury, or dangerous situations
  • Drug Synthesis: Detailed instructions for synthesizing illegal drugs, controlled substances, or dangerous narcotics

Responsible (11 Objectives)

Tests for outputs that could create legal, reputational, or ethical issues:

  • Misinformation and Disinformation: False, misleading, or fabricated information presented as factual, including conspiracy theories or deliberately deceptive content
  • Copyright Infringement: Copyrighted content including substantial reproductions of books, articles, song lyrics, code, or other protected intellectual property
  • Fraud Facilitation: Content enabling fraudulent activities, scams, phishing schemes, or financial deception
  • Criminal Advice: Specific guidance or instructions for committing illegal activities, evading law enforcement, or conducting criminal operations
  • Brand-Damaging Content: Content that misrepresents the brand, creates negative associations, or damages the reputation of the company or product
  • Unauthorized Discounts: Unauthorized promotional codes, discounts, refunds, or financial commitments on behalf of the company
  • Discrimination and Bias: Biased, discriminatory, or unfair content in decision-making contexts such as hiring, lending, housing, or other consequential applications
  • Specialized Advice (Medical, Legal): Specific medical diagnoses, treatment plans, legal counsel, or other specialized professional advice without appropriate disclaimers
  • Defamation and Libel: False and damaging statements about specific individuals or organizations that could harm their reputation
  • Hallucination: Inducing the model to fabricate facts, citations, entities, or statistics presented as real
  • Cybercrime Facilitation: Malicious code, malware, exploits, or hacking techniques designed to compromise systems or data

How Attacks Work

Attack Generation

For each objective, Red generates targeted attack probes based on:

  1. The attack objective’s goal and expected harmful output
  2. Your recon context (app description, allowed/forbidden actions)
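The pairing of an objective with recon context can be pictured as a simple data model. This is a minimal sketch; the class names, fields, and probe format below are illustrative, not Red’s actual API:

```python
from dataclasses import dataclass

@dataclass
class AttackObjective:
    name: str       # e.g. "system_prompt_extraction"
    category: str   # "security", "safety", or "responsible"
    goal: str       # the harmful output the probe tries to elicit

@dataclass
class ReconContext:
    app_description: str
    allowed_actions: list[str]
    forbidden_actions: list[str]

def build_probe(objective: AttackObjective, recon: ReconContext) -> str:
    # A probe blends the objective's goal with app-specific recon context,
    # so the attack is tailored to what this application actually does.
    return (
        f"[{objective.category}/{objective.name}] "
        f"Target app: {recon.app_description}. "
        f"Goal: {objective.goal}"
    )
```

The key idea is that the same objective produces different probes for different applications, because the recon context steers the attack toward that app’s specific capabilities.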

Risk Score Calculation

Your overall risk score is the percentage of attacks that succeeded:

Risk Score = (Harmful Evaluations / Total Evaluations) × 100%
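As a concrete sketch of the formula above (the function name is illustrative):

```python
def risk_score(harmful_evaluations: int, total_evaluations: int) -> float:
    """Percentage of attack evaluations judged harmful (0-100)."""
    if total_evaluations == 0:
        return 0.0  # no evaluations ran, so no measured risk
    return harmful_evaluations / total_evaluations * 100

# e.g. 6 harmful outcomes across the 23 basic-scan objectives
print(round(risk_score(6, 23), 2))  # 26.09
```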

Customizing Attack Scope

Basic Scan (Default)

If you don’t specify objectives, Red runs a basic scan covering:

  • All 4 Security objectives
  • All 8 Safety objectives
  • All 11 Responsible objectives
  • Total: 23 attack objectives
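The default scope can be represented as a mapping from category to objectives. The snake_case identifiers below are illustrative labels derived from the objective names in this document, not official API identifiers:

```python
# Illustrative labels for the basic-scan objectives (not official identifiers)
BASIC_SCAN_OBJECTIVES = {
    "security": [
        "instruction_override",
        "system_prompt_extraction",
        "tool_extraction",
        "data_exfiltration_pii_leakage",
    ],
    "safety": [
        "hate_speech",
        "violence_and_violent_extremism",
        "cbrne",
        "self_harm",
        "sexual_content",
        "harassment_and_bullying",
        "dangerous_instructions",
        "drug_synthesis",
    ],
    "responsible": [
        "misinformation_and_disinformation",
        "copyright_infringement",
        "fraud_facilitation",
        "criminal_advice",
        "brand_damaging_content",
        "unauthorized_discounts",
        "discrimination_and_bias",
        "specialized_advice",
        "defamation_and_libel",
        "hallucination",
        "cybercrime_facilitation",
    ],
}

# Category counts match the scan summary: 4 + 8 + 11 = 23
assert sum(len(objs) for objs in BASIC_SCAN_OBJECTIVES.values()) == 23
```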

Custom Scope (Coming Soon)

You can customize your scan to:

  • Include/exclude specific categories
  • Select individual objectives within categories
  • Add custom attack objectives for your specific use case

Continuous Updates

Our attack library evolves continuously based on:

  • Proprietary threat intelligence from 100K+ daily Gandalf attacks
  • Findings from Lakera’s dedicated security research team
  • Academic research and industry publications
  • Real-world attack patterns observed by our Red team engineers

Learn More