OWASP Top 10 for LLM Applications: What Engineering Teams Must Fix in 2026
OWASP LLM Top 10 defines the critical vulnerabilities in LLM apps. Here's what prompt injection, excessive agency, and vector attacks look like in production.
Every team shipping an LLM-powered product in 2026 is facing a new attack surface — one that security frameworks designed for traditional applications were not built to map. The OWASP Top 10 for LLM Applications, updated in late 2024 and ratified as the de facto industry standard by early 2025, defines the 10 vulnerability classes that cause real incidents in production language model systems. Unlike OWASP's original web application list, these risks emerge not from code flaws but from how LLMs process language: the model cannot reliably distinguish instructions from data, authorized from adversarial prompting, or safe from sensitive output. The 2025 edition retired two previous categories and introduced three new ones — System Prompt Leakage, Vector and Embedding Weaknesses, and Unbounded Consumption — based on what was actually failing in production, not what was theoretically concerning.
This guide explains all 10 vulnerabilities with concrete attack examples and the controls that reduce real risk. It is written for engineering teams building chat applications, RAG pipelines, AI agents, and any system where an LLM makes decisions or calls external tools. For the infrastructure-level controls that sit underneath application security — agent identities, tool permissions, and access governance — see our guide to securing AI agents in the enterprise. For the operational picture of running LLMs in production — deployment, versioning, cost controls, and monitoring — our LLMOps production guide covers the layer above.
What Is the OWASP Top 10 for LLM Applications?
The OWASP Top 10 for LLM Applications is a community-driven framework that maps the most critical security vulnerabilities specific to systems built on large language models. First published in 2023 as the gap between AI adoption and AI security practice became undeniable, it was updated to the 2025 edition based on 18 months of real production incidents, attacker research, and enterprise deployment data. Three previous categories were retired — their concerns absorbed into Supply Chain and Excessive Agency — and three new ones were added to cover risks that emerged with agents, multi-model pipelines, and the near-universal adoption of RAG. The list now spans from inference infrastructure to output handling.
- →LLM01:2025 Prompt Injection — the model executes adversarial instructions embedded in user input or external data it processes as part of a legitimate task
- →LLM02:2025 Sensitive Information Disclosure — PII, confidential business context, or memorized training data exposed through model outputs
- →LLM03:2025 Supply Chain — third-party models, plugins, datasets, and infrastructure components that introduce vulnerabilities before your application code runs
- →LLM04:2025 Data and Model Poisoning — compromise of training or fine-tuning pipelines that embeds malicious behavior in the model weights themselves
- →LLM05:2025 Improper Output Handling — LLM outputs passed to downstream systems without validation, enabling SQL injection, XSS, or arbitrary code execution
- →LLM06:2025 Excessive Agency — AI agents with permissions or autonomy beyond what their tasks require, enabling high-impact actions under adversarial control
- →LLM07:2025 System Prompt Leakage — confidential system instructions exposed to adversaries through direct extraction or jailbreak prompting
- →LLM08:2025 Vector and Embedding Weaknesses — attacks against RAG retrieval pipelines, vector stores, and embedding models
- →LLM09:2025 Misinformation — confident, plausible, but factually incorrect outputs that cause harmful downstream decisions
- →LLM10:2025 Unbounded Consumption — resource exhaustion, denial of wallet, and cost attacks against inference infrastructure
LLM01 — Prompt Injection: Why It Stays at the Top in 2026
Prompt injection remains OWASP's number one LLM vulnerability because it is not a software bug with a patch — it is a consequence of how language models work. LLMs blend the system prompt, the user message, and any retrieved or tool-returned context into a unified input representation. The model processes that representation without a hard architectural separation between trusted instructions and untrusted data. Direct injection — a user explicitly attempting to override system behavior — is well-understood and partially mitigated by modern models through safety training. Indirect injection is the operationally dangerous form: an adversarial instruction is embedded in an external document, email, web page, or API response that the LLM reads as part of a legitimate task and then executes as if it were a trusted system instruction.
Anthropic's February 2026 model card dropped direct injection as a standalone benchmark category, noting that every high-impact production incident in the preceding year involved indirect injection — manipulation through external data the model processes, not instructions the user sends directly. For enterprise teams, this means the primary threat model for prompt injection is not your users. It is every external data source your LLM reads: retrieved documents in a RAG pipeline, customer emails processed by an agent, web pages summarized by a tool, and third-party API responses injected into context.
- →Privilege separation in context construction: clearly mark system instructions, user input, and external retrieved data as separate labeled sections. The LLM will not enforce this at the architecture level, but your output monitoring layer can detect when model behavior reflects instructions that should have been treated as untrusted data
- →Output monitoring before tool execution: in agentic systems, treat every proposed tool call as an untrusted output requiring validation before execution. Flag calls requesting elevated permissions, unexpected external destinations, or actions outside the agent's defined task scope
- →Input sanitization at the retrieval boundary: scan document chunks for adversarial instruction patterns before they enter the RAG pipeline. A document containing override instructions should be rejected at ingestion, not after the model has processed it
- →Minimal context exposure: include credentials, PII, and business-sensitive data in the LLM context only when the task strictly requires them — information in context can be exfiltrated through injection exactly as through application code bugs
- →Human confirmation for consequential tool calls: data writes, external API calls, email sends, and financial transactions should require human approval before execution, particularly in the first months of any agentic deployment
LLM02 and LLM03 — Information Disclosure and Supply Chain Risks
Sensitive information disclosure covers unauthorized exposure of data through the model: PII surfaced from training data through adversarial prompting, confidential business logic extracted from the system prompt, customer data from previous sessions inadvertently retained in context, or proprietary information retrieved from enterprise documents and repeated verbatim. The 2025 update expanded this category to explicitly include training data extraction attacks — techniques that coax a model into reproducing memorized training corpus content. For enterprise teams who fine-tune models on internal data, customer records, internal communications, and proprietary documentation used in fine-tuning can potentially be recovered by a sufficiently motivated adversary. Dataset governance is a security control, not just a data quality practice.
LLM supply chain risk is distinct from traditional software supply chain risk because the attack surface includes the model weights themselves. A third-party base model, fine-tuning dataset, vector database, or inference-time plugin is a trust boundary where compromise propagates into your application before your application code runs. A poisoned dataset used in pre-training can embed triggered behavior into millions of downstream applications simultaneously. The model integrity and provenance controls that address this — and how they map to SBOM and SLSA frameworks for AI — are covered in our software supply chain security guide.
- →Treat your system prompt as sensitive but not secret: design it assuming a motivated adversary will attempt extraction, and include nothing that causes material damage if disclosed — no API keys, internal service URLs, or regulatory classification labels
- →Implement PII filtering at the output layer: scan all LLM responses for personal data patterns before returning them to users or downstream systems, regardless of whether the model should have had access to that data
- →Maintain a model Bill of Materials: track which base model, fine-tuning dataset version, and checkpoint your application runs on, with cryptographic integrity verification of model weights where the provider supports it
- →Audit third-party plugins and MCP server connections before production: each plugin with access to your data store is a supply chain dependency with the same access that store grants to your own application code
LLM04 and LLM05 — Data Poisoning and Improper Output Handling
Data and model poisoning targets the training pipeline rather than the runtime application. An attacker who can influence a training or fine-tuning dataset can embed triggered behavior — a backdoor that causes the model to behave normally under typical conditions but produce adversary-controlled outputs when a specific trigger phrase appears in the input. In 2026, the most practically relevant poisoning vector for enterprise teams is the fine-tuning dataset: customer support conversations, labeled internal documentation, or annotated examples ingested without adversarial review. Several production fine-tuned models have exhibited unexpected behavioral patterns traceable to mislabeled or manipulated training examples. The mitigation starts before fine-tuning: if data should not be in the model, do not include it.
Improper output handling is a distinct problem with a straightforward fix that is consistently underimplemented. An LLM that generates SQL queries, shell commands, HTML, or code can trigger injection attacks, arbitrary code execution, or XSS in downstream systems that accept its output without validation — not because the model was compromised but because its output was treated as trusted input to a system that it is not. Treat LLM output exactly as you would user-supplied input to a traditional application: validate the structure, escape for the target execution context, and reject outputs that do not match the expected schema before passing them to any system that will act on them.
- →Adversarially review fine-tuning datasets before use: apply automated scanning with a separate LLM to flag anomalous examples and manually spot-check high-risk subsets. A poisoned 0.1% of a fine-tuning dataset is sufficient for reliable behavioral backdoor injection
- →Never pass LLM output directly to SQL executors, shell interpreters, or code runners: validate structure, escape for the target context, and reject outputs that do not match the expected schema
- →Use constrained decoding or post-generation schema validation for structured tasks: JSON generation, form filling, and code generation all benefit from output validation that catches both security risks and format failures before downstream consumption
LLM06 — Excessive Agency: The Highest-Stakes Risk for Agentic Systems
Excessive agency is the risk that defines the security posture of agentic AI deployments. It occurs when an LLM-based agent has been granted more permissions than its task requires, more tool access than its function needs, or more autonomy than is safe given the impact of actions it can take. The combination of prompt injection and excessive agency is the attack chain behind the most serious real-world agentic incidents: an injected instruction reaches an agent with write access to production systems, delete permissions on data stores, or the ability to send outbound communications on behalf of the organization — and the agent executes it with valid credentials. The attack chain is not theoretical; it has caused real data loss, unauthorized external communications, and compliance failures in 2025 and 2026.
- →Minimum viable permissions per task: scope each agent's credentials to the exact resources its current task requires — not everything it might ever need. An agent that reads documents for summarization needs read access to those documents, not write access and not access to adjacent data stores its credentials happen to reach
- →Confirmation gates for irreversible actions: data deletion, outbound emails, financial transactions, and public API calls require explicit human approval before execution — not just LLM confidence that the action is appropriate
- →Separate read and write identities: agents that switch between reading and writing should hold two distinct credentials, acquiring write credentials only for the duration of the specific write operation and releasing them immediately after
- →Maximum iteration limits on all agent loops: every agentic loop must have a hard ceiling on iterations and a total-tokens-per-task budget enforced in code, not in prompts. The most common excessive agency incident is not a security attack but a retry bug that runs without bound
- →Prefer reversible staging: design agentic workflows to stage outputs for review before committing — draft folders for generated emails, staging environments for infrastructure changes, preview steps for document modifications. Reversibility is the last line of defense when other controls are bypassed
LLM07 and LLM08 — System Prompt Leakage and Vector Embedding Weaknesses
System prompt leakage occurs when confidential instructions in the LLM system prompt are extracted by an adversary. The risk is significant because production system prompts routinely contain business logic and decision rules, internal service endpoint URLs, API key naming conventions, compliance classification labels, persona definitions with proprietary methodology, and explicit guidance that reveals which input patterns the system is designed to reject. Treating the system prompt as secure through obscurity is insufficient: motivated adversaries have demonstrated consistent extraction through jailbreak prompting, and newer extraction techniques operate without jailbreaking. The correct posture is to design system prompts assuming they will eventually be visible, and to include nothing that causes material damage if disclosed.
Vector and embedding weaknesses are new to the 2025 OWASP list and directly target the RAG architecture now underlying the majority of enterprise LLM deployments. The category covers three distinct attacks: injecting adversarial documents into the vector store that are retrieved for sensitive queries and manipulate model outputs; exploiting multi-tenant isolation gaps to cross tenant boundaries in shared RAG systems; and using the embedding API to reconstruct source documents from their vector representations. As RAG becomes the standard integration pattern for enterprise AI, this attack class is moving from theoretical to active. The retrieval architecture patterns that resist embedding attacks are covered in our enterprise RAG architecture guide.
- →Remove secrets from system prompts: API keys, internal URLs, and sensitive configuration belong in environment variables and secret managers, retrieved at application startup and injected at the infrastructure layer, never passed through the LLM context
- →Add extraction-resistance instructions but do not rely on them alone: instructions to never reveal the system prompt are partially effective and should be complemented by output monitoring that flags responses matching the structure or phrasing of your system prompt
- →Enforce namespace isolation in vector stores: each tenant's documents must live in isolated namespaces with access control enforced at the retrieval layer, not just the application layer. Verify this in your vector database configuration — several major providers default to no namespace isolation
- →Sanitize documents before embedding: documents entering the RAG vector store should be scanned for adversarial instruction patterns, injected instructions hidden in document metadata or footers, and low-contrast text targeting automated processing — all must be caught before the document reaches the index
- →Treat the embeddings API as privileged: access to the embedding endpoint allows source document reconstruction. Restrict it to authorized internal services with the same access controls applied to a database read API
LLM09 and LLM10 — Misinformation and Unbounded Consumption
Misinformation (LLM09) addresses the risk of an LLM producing confident, plausible, and factually incorrect outputs that drive harmful downstream decisions. This category was elevated in the 2025 list because enterprise LLM applications are increasingly used for consequential tasks — legal document review, financial analysis, medical triage, technical architecture decisions — where a single authoritative-sounding hallucination causes significant harm. The engineering response is not exclusively model quality improvement; it is system design. Applications that pass LLM output directly to users without factual grounding, citation requirements, or human review gates are architecturally designed to fail at this risk, regardless of which model runs inside them.
Unbounded consumption (LLM10) covers resource exhaustion and cost attacks against inference infrastructure. An adversary who triggers expensive inference — through long-context prompts engineered to force maximum generation length, or high-frequency bursts — can cause significant cost spikes or service unavailability without accessing any data. In practice, the most common production incident in this category is not an external attack but an internal bug: an agent retry loop that enters unbounded recursion when tool calls fail, generating thousands of inference requests and a significant cloud bill before anyone is alerted. Token budget enforcement in code — not in prompts — is the single highest-return control for this risk.
- →Ground factual outputs in retrieved sources with citation requirements: for any application where factual accuracy is consequential, require the model to cite the specific document passage supporting each claim. Uncited assertions should be flagged for human review before reaching users
- →Route low-confidence outputs to human review: sampling-based uncertainty estimation and provider confidence signals identify responses the model is uncertain about before they reach users who may not apply appropriate scrutiny
- →Enforce per-request and per-session token budgets in code, not prompts: hard limits on input and output tokens prevent cost attacks and catch runaway agent loops before they become operational incidents
- →Implement semantic caching: near-identical queries can serve cached responses rather than triggering fresh inference. Caching reduces cost, improves latency under load, and reduces the surface for denial-of-wallet attacks
How to Prioritize: A Risk-Based Framework by Application Type
The OWASP LLM Top 10 is a map of the vulnerability space, not a sequential remediation backlog. Prioritize by your architecture first — the risks most likely to cause real damage are determined by what your application does, not by which number is lowest on the list.
- →Chat-only applications with no external tool access: prioritize LLM01 (direct injection and jailbreaking), LLM02 (sensitive information disclosure through system prompt extraction or training data recall), LLM07 (system prompt leakage), and LLM09 (misinformation in high-stakes decision support). The primary threat is behavioral manipulation and information leakage
- →RAG applications over enterprise documents: add LLM08 (vector embedding weaknesses and multi-tenant isolation gaps) and LLM04 (document corpus poisoning through malicious content ingestion). In multi-tenant deployments, tenant isolation in the vector store is the single highest-priority control
- →Agentic applications with tool access: shift weight to LLM06 (excessive agency and over-privileged tool access), LLM01 indirect injection (adversarial documents triggering unintended tool calls), LLM03 (supply chain risk from connected tools and MCP servers), and LLM10 (unbounded consumption from unguarded agent loops). These four categories cover the attack chains causing the most severe agentic incidents in production
Frequently Asked Questions
What is prompt injection in LLM applications?
Prompt injection is a vulnerability where adversarial instructions in inputs the LLM processes cause it to deviate from intended behavior. Direct injection involves the user explicitly attempting to override system instructions in their own message. Indirect injection — the more dangerous production form — embeds adversarial instructions in external content the model reads during a legitimate task: retrieved documents, emails, web pages, or API responses. There is no complete technical fix; effective defense requires layered controls including output monitoring, minimal context exposure, privilege separation in context construction, and human approval gates for high-impact tool calls.
What is excessive agency in AI agents?
Excessive agency occurs when an AI agent has more permissions, tool access, or autonomy than its specific task requires. An agent granted write access to production systems for a read-only summarization task has excessive agency — if manipulated through prompt injection, it can take harmful actions across everything those credentials permit. Mitigations include least-privilege permissions scoped per task, confirmation gates for irreversible actions, separated read and write credentials, and maximum iteration limits on agent loops enforced in application code.
What are vector and embedding weaknesses in RAG systems?
Vector and embedding weaknesses are vulnerabilities in RAG retrieval pipelines where attackers compromise the vector store or embedding layer rather than the LLM itself. Attack vectors include injecting adversarial documents that are retrieved for sensitive queries, exploiting multi-tenant isolation gaps to cross tenant boundaries, and using the embeddings API to reconstruct source documents from their vector representations. Defenses include strict namespace isolation in the vector store, document sanitization before embedding, and access control enforced at the retrieval layer rather than only at the application layer.
How do you prevent sensitive information disclosure in LLM applications?
Prevention starts at data governance: do not include PII, customer records, or confidential data in training or fine-tuning datasets unless necessary and explicitly governed. At runtime, implement PII filtering on LLM outputs before returning them to users — scan responses for personal data patterns the model may surface from context or training data. Keep secrets out of system prompts: API keys and internal service configuration belong in environment variables and secret managers, not in text the model processes and can be induced to repeat.
What is the difference between direct and indirect prompt injection?
Direct prompt injection is an explicit attempt by the user to override system instructions through their own input — typically phrased to ignore or override previous instructions. Indirect prompt injection embeds adversarial instructions in external data the LLM reads during a legitimate task: a retrieved document, a customer email processed by an agent, or a web page summarized by a tool. Indirect injection is the operationally significant form: the attacker does not need to interact with the application directly, only to control data that eventually enters the LLM's context. Anthropic's 2026 model card identified indirect injection as the root cause of all high-impact production compromises from the preceding year.
How Belsoft Helps Secure Your LLM Applications
Building LLM applications securely requires controls at every layer: data governance before training, retrieval architecture hardening for RAG, agent permission scoping, output validation before downstream use, and operational monitoring that detects behavioral drift after deployment. Belsoft builds AI features with security as a design constraint from the first architecture review. Our AI & Automation service delivers production-ready LLM integrations with prompt guardrails, output validation, and agent permission scoping built into the implementation from day one. Our Security & Scalability service includes OWASP LLM Top 10 threat modeling applied to your specific architecture and deployment environment.
If your team is building an LLM-powered product and wants an independent assessment against the OWASP LLM Top 10, or is designing an agentic system that requires security architecture from the ground up, book a technical call with our engineering team. We work with CTOs and engineering leaders to identify which of the 10 risks are most critical for their specific application type and implement the controls that address them without slowing down delivery.
“The OWASP LLM Top 10 is not a checklist to complete at launch — it is a map of the threat surface your LLM application lives inside permanently. Security is the rate at which you reduce that surface while continuing to ship.”
Written by
Belsoft Team
More from the blog
Ready to build?
Let's talk about your project.
30 minutes. No pitch. We map your requirements and tell you honestly what it will take.
Book a Strategy Call