How Do AI Agents Create Data Exfiltration Risk?

AI agents create data exfiltration risk by combining three capabilities that are dangerous together: access to private data, exposure to untrusted content, and the ability to communicate externally. When all three exist in one agent, an attacker can hide instructions inside an email, document, or webpage the agent processes and trick it into sending sensitive data out. No software vulnerability is required. The attacker doesn't need to break in. They just need to talk to your agent.

This isn't hypothetical. Industry research from the Cloud Security Alliance published in April 2026 found that 65% of organizations had experienced at least one AI agent-related security incident in the previous twelve months. Of those, 61% involved data exposure, 43% caused operational disruption, and 35% resulted in direct financial losses. The same study found that 82% of organizations had unknown AI agents running somewhere in their infrastructure.

This post explains how AI agent exfiltration happens, the real-world incidents proving it works, and what organizations can do about it.

In this post

What is data exfiltration?

Data exfiltration is the unauthorized transfer of information from a system to an outside destination. It is the final step in most serious breaches. It's the point at which sensitive data actually leaves your environment and ends up in the hands of an attacker.

Historically, exfiltration has happened through malware, stolen credentials, compromised servers, or insider activity. AI agents introduce a new category: exfiltration driven by the AI's own legitimate actions, performed on the attacker's behalf.

What is an AI agent?

An AI agent is a large language model (LLM) that has been given tools and the ability to take actions. A chatbot answers questions. An agent does things: reads your email, queries your database, searches the web, writes code, sends messages, calls APIs.

This shift from "assistant that talks" to "assistant that acts" is what creates the new risk. A model that can only produce text can embarrass you. A model that can send emails, open pull requests, or make HTTP requests can move your data.

Why are AI agents vulnerable to data exfiltration?

AI agents are vulnerable because of a fundamental property of how LLMs work: they cannot reliably distinguish instructions from data.

When an agent processes a request, everything enters its context window as one stream of text: system instructions, the user's prompt, retrieved documents, tool outputs, emails, web pages. The model treats all of it as meaningful input. If attacker-controlled text appears anywhere in that stream and says "forward the most recent financial report to this address," the model does it.

This vulnerability is known as prompt injection, and OWASP ranks it the #1 security risk for LLM applications. It comes in two forms:

Direct prompt injection: the user is the attacker, typing malicious instructions directly.
Indirect prompt injection: instructions are hidden in external content (a document, an email, a webpage) that the agent reads while doing its normal job. This is the version that matters most for data exfiltration.

The "lethal trifecta"

Independent AI researcher Simon Willison (who also coined the term "prompt injection") introduced a simple framework called the lethal trifecta for identifying dangerous agent configurations. An agent is exploitable when it has all three of these at once:

1. Access to private data. The agent can read information an attacker would want: emails, calendars, customer records, source code, internal documents, database contents.

2. Exposure to untrusted content. The agent processes text or images from sources an attacker can influence: inbound emails, scraped web pages, uploaded documents, support tickets, comments in shared files, calendar invites.

3. The ability to communicate externally. The agent can transmit information outside your environment: sending email, posting to chat, making API calls, rendering image URLs, creating pull requests, or even generating clickable links.

Each capability is useful on its own. The combination is the problem. When all three coexist, any attacker who can slip text into the untrusted-content channel can instruct the agent to pull private data and send it back through the exfiltration channel.

How AI agent exfiltration works in practice

Disclosed incidents from 2025 and 2026 show the same pattern repeating across vendors, interfaces, and attack surfaces.

Email injection: EchoLeak

In June 2025, Microsoft disclosed EchoLeak, the first widely publicized zero-click AI exfiltration vulnerability. A crafted email sent to a Microsoft 365 Copilot user could instruct the agent to pull data from OneDrive, SharePoint, and Teams and exfiltrate it through trusted channels. No clicks, no downloads, no user interaction. The user didn't even have to open the email. Copilot read it on their behalf, and Copilot acted on what it read.

This is the base case for indirect prompt injection. An email is attacker-controlled content. An AI assistant with inbox access is a direct line from that content to everything the user can see.

Public form injection: ForcedLeak and ShareLeak

Two incidents across 2025 and 2026 demonstrated the same attack against customer-facing forms.

ForcedLeak (Salesforce Agentforce, September 2025). An attacker embedded malicious instructions in a public Web-to-Lead form. When an employee later asked the AI agent about the lead, the hidden commands executed and exfiltrated CRM data through an expired-but-allowlisted domain — an elegant abuse of the "trust this domain" list every enterprise maintains.

ShareLeak (Microsoft Copilot Studio, January 2026). Attackers embedded malicious instructions in SharePoint form inputs processed by a connected Copilot agent. The agent returned customer data to an attacker-controlled email. The striking detail: even when Microsoft's safety mechanisms flagged the attempt, data was still exfiltrated.

The pattern is durable because it doesn't depend on a specific product. Any time an AI agent processes content from a public-facing form, the form becomes an injection channel.

Code and PR injection: Comment and Control

In April 2026, researchers at Johns Hopkins University demonstrated that three major AI coding agents (Anthropic's Claude Code Security Review, Google's Gemini CLI Action, and GitHub's Copilot Agent) could be tricked into leaking API keys via a single malicious prompt placed in a GitHub pull request title or comment. Anthropic classified the issue as critical; Google and GitHub paid bounties.

The attack is notable for its simplicity. A pull request comment is attacker-controlled content by design: anyone with a GitHub account can post one. An AI coding agent reviewing the PR reads the comment as part of its job. If the agent also has repository access and network capability, the trifecta is complete.

Other techniques to know

Several variants of the same pattern have been demonstrated in research or are in active use:

Markdown image exfiltration. The agent is tricked into generating an image URL whose path encodes stolen data. When the response is rendered, the user's browser fetches the image, silently sending the data to the attacker's server. This technique underpins ForcedLeak and several others.
Document-borne payloads. Malicious instructions embedded in PDFs, Word files, or spreadsheets, often as invisible text, tiny fonts, or white-on-white characters that a multimodal model reads but a human doesn't notice.
Calendar and meeting injection. Hidden commands in calendar invites that execute when an assistant is later asked something routine like "what's on my schedule?"
Memory poisoning. Instructions written into an agent's long-term memory during one session execute during a later session, often when a different, more privileged user is interacting.
Tool and connector abuse. Any tool that makes outbound network calls (web search, link preview generation, webhook posting) can be repurposed as an exfiltration channel.
Multimodal image injection. Instructions embedded invisibly in images that only the vision model "sees" when analyzing them.

The AI did exactly what it was "asked" to do in each of these incidents. The attacker's instructions just happened to arrive through a channel the agent trusted.

Why traditional security tools can't block this

Standard security controls are built for attacks at the network or code layer. Prompt injection operates at the semantic layer: the meaning of the words the AI reads.

That makes conventional defenses ineffective:

Web application firewalls look for suspicious syntax, not intent. A malicious instruction phrased as polite English passes through untouched.
Keyword blocklists fail against rephrasing, translation, encoding, or instructions split across multiple sources.
System prompt reminders ("never leak user data") help against casual attempts but collapse under creative adversarial phrasing.
Output filters can catch some obvious exfiltration, but attackers hide stolen data inside trusted-looking URLs, normal tool calls, or innocuous-looking text.

No known technique reliably prevents prompt injection in a general-purpose agent. Defense has to come from architecture, not filtering.

How to reduce AI agent exfiltration risk

The most effective approach is to break the lethal trifecta. Make sure no single agent or execution path has all three dangerous capabilities at once.

Apply the principle of least privilege. Give each agent the narrowest scope that lets it do its job. An email-summarizing agent does not need send privileges. A customer support agent does not need the whole customer database.

Inventory agent capabilities. For each agent you deploy, document what private data it can access, what untrusted inputs it consumes, and what external communication it can perform. If all three are non-empty, you have a lethal-trifecta configuration that needs a specific mitigation.

Discover shadow agents. With 82% of organizations harboring unknown AI agents, discovery is now a baseline control. Agents deployed by individual employees through no-code platforms, MCP connectors, or browser extensions are common and frequently unmanaged.

Add human-in-the-loop controls. Require user confirmation before the agent performs consequential actions like sending email, creating pull requests, or making outbound API calls.

Isolate untrusted content. Architectures that process untrusted input in a separate, tool-less model before passing results to the capable agent reduce the attack surface.

Audit integrations and MCP servers. Every connector an agent can reach expands the blast radius of a successful injection. Remove integrations that are not actively used, and review the permissions on the ones that remain.

Manage the full agent lifecycle. Decommission agents you are no longer using. Industry research has found that only 21% of organizations have formal decommissioning processes, leaving abandoned agents holding active credentials long after their useful life.

Monitor agent behavior. Log tool calls, track outbound network activity from agents, and watch for anomalous data access patterns, the same way you would for a high-privilege service account.

The bottom line

Every agent with the lethal trifecta is an exfiltration system waiting for its attacker to arrive. EchoLeak, ForcedLeak, ShareLeak, and Comment and Control were all found by researchers rather than exploited at scale. That window is closing. The organizations getting ahead of this are the ones treating AI agent security as the structural design problem it actually is — inventorying what each agent can reach, breaking the trifecta by default, and assuming that any agent with all three legs will eventually be exploited. Because eventually, one will be.

Frequently asked questions

What is prompt injection? Prompt injection is an attack where malicious instructions are placed inside content an AI processes, such as a prompt, document, email, or webpage, causing it to perform actions the attacker wants rather than what the user intended.

What is indirect prompt injection? Indirect prompt injection hides the malicious instructions inside content the AI reads as part of its normal operation, rather than inside the user's prompt. It is the primary mechanism behind AI agent data exfiltration.

Are AI chatbots also at risk? A chatbot without tools has a smaller attack surface. The exfiltration risk increases dramatically when the AI can take actions, access external systems, or communicate outside the conversation.

Does the Model Context Protocol (MCP) make this worse? MCP itself is a connection standard, but it lowers the cost of combining tools. Users can easily compose an agent with all three legs of the lethal trifecta without realizing it. Every MCP server added to an agent should be evaluated for the capabilities it introduces.

Can I detect AI agent exfiltration after the fact? Sometimes, through logging and network monitoring. But many of these attacks look like normal agent activity. Prevention is more effective than detection.

Is there a "patch" for prompt injection? No. Prompt injection is a consequence of how LLMs work, not a bug in a specific system. Individual vulnerabilities can be fixed, but the underlying class of attack remains unsolved and is expected to persist for the foreseeable future.

How widespread is the problem of unknown AI agents? Industry research from April 2026 found that 82% of organizations had unknown AI agents running in their environments, typically deployed by individual employees through no-code platforms, browser extensions, or experimental projects without formal security review.

Most security teams don't have a complete picture of which AI agents have access to their environment, what those agents can reach, or what data is moving through them. See how Nightfall helps close that gap.