The Essential Guide to Data Loss Prevention (DLP) for LLMs
by Madeline Day, August 4, 2023

Why do SaaS apps use third-party LLMs?

SaaS companies can level up their features by integrating third-party sub-processors from model providers like OpenAI and Anthropic. These sub-processors might come into play when users summarize meetings, craft emails, generate code, and more, all without having to leave the SaaS app.

What SaaS apps have third-party sub-processors?

Following the launch of ChatGPT in November 2022, a growing number of popular SaaS apps have released their own AI features, including Asana, Notion, and Atlassian, just to name a few.

Source: AI Sub-Processors Tracker for SaaS/IaaS

What are the risks of using or building SaaS apps with third-party LLMs?

There are two overarching concerns associated with both direct and indirect LLM usage: 

  1. Sensitive data might be transmitted to third-party LLMs via SaaS apps’ AI sub-processors.
  2. Once stored on an LLM server, sensitive data might be used to train public models, thereby risking data security and developer compliance. 

Let’s dive into the details. 

Transmitting sensitive data to LLMs

SaaS apps vary in how they handle data privacy. In some cases, data will be protected by frameworks like Europe’s General Data Protection Regulation (GDPR), whereas in others, data might be collected and stored on third-party LLM servers. For instance, if a SaaS app transmits data to OpenAI, OpenAI can keep that data for up to 30 days. Until early 2023, OpenAI used data submitted through its API to train its models by default. Now, API users must opt in before OpenAI will use their data for training. With this in mind, it’s always a good idea to check the privacy policy and terms of use of any SaaS app before you engage with its GenAI capabilities.

Using sensitive data to train LLMs

Most LLM providers, like OpenAI, have some measures in place to prevent users’ sensitive data from making its way into public models. However, recent academic studies and newsworthy events have shown that it’s possible for threat actors to reconstruct information like PII and API keys that was part of the original LLM training data. This poses a threat both to the user, whose data has been compromised, and to the developer, who may risk noncompliance with security frameworks.

How can you stay secure while using and building SaaS apps with third-party LLMs?

What are the solutions to the two aforementioned risks? Let’s address them one at a time. 

Enhance visibility into the cloud

It’s difficult to protect your data if you can’t see it. More traditional DLP methods can only protect data from certain networks or endpoints, and have limited visibility into SaaS apps and other cloud-based environments. With this in mind, a critical first step to any security solution for SaaS apps and LLMs is to leverage a cloud DLP tool to uncover the sensitive data that’s sprawled across the cloud. Once you’ve identified your sensitive data, you’ll be able to remediate it before it can be leaked to other cloud apps, including LLMs. 
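The "identify, then remediate" idea can be illustrated on a single prompt before it is handed to a third-party LLM. The following is a minimal sketch only: the `PATTERNS` table and the `redact` helper are hypothetical names invented for this example, and two simple regular expressions are far narrower than what a cloud DLP tool scans for.

```python
import re

# Hypothetical patterns for illustration; a real DLP tool covers many more
# data types and does not rely on regular expressions alone.
PATTERNS = {
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "EMAIL": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b"),
}


def redact(text):
    """Replace matches with placeholders and report how many were found."""
    counts = {}
    for label, pattern in PATTERNS.items():
        # subn returns the rewritten text plus the number of substitutions.
        text, found = pattern.subn(f"[{label}_REDACTED]", text)
        if found:
            counts[label] = found
    return text, counts


prompt = "Summarize: jane@example.com shared SSN 123-45-6789 in the ticket."
safe_prompt, counts = redact(prompt)
print(safe_prompt)
print(counts)
```

Only `safe_prompt` would then be passed along to the model, while `counts` could feed an alert or audit log.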

Leverage AI to ensure continuous compliance

In security teams’ increasingly fast-paced workflows, AI-powered DLP tools are useful for automating the remediation of any threats to compliance. Automated remediation not only minimizes the chance of data leakage; it also has the added bonuses of curtailing security team workloads and cutting down on the cost of data breaches. According to IBM, AI-powered DLP tools help security teams to react to and recover from breaches over 100 days faster than teams without AI-powered DLP.

What is cloud DLP?

Cloud data loss prevention (cloud DLP) is a strategic approach to preventing sensitive data from being shared outside of authorized SaaS and cloud apps.

Cloud DLP has become increasingly relevant as more and more businesses switch to SaaS and cloud-based tools. This year alone, Google reports that over 30% of surveyed tech and business leaders are planning to forgo “legacy enterprise software” in favor of cloud-based services, with an even greater number of leaders (41.4%) looking to increase their use of cloud-based services overall.

Source: Google Cloud blog

This means that millions of employees are learning how to navigate their new cloud-based workspaces—and that many of them are also unknowingly putting their companies’ data at risk by sharing it all over the cloud.

How does cloud DLP protect data in SaaS and custom apps?

According to Verizon’s latest report, 74% of all data breaches involve the human element, whether through error, privilege misuse, stolen credentials, or social engineering.

Source: Verizon 2023 Data Breach Investigations Report

Seeing as it’s virtually impossible to redact or delete sensitive data once it’s submitted to an LLM, it’s imperative that security teams find a tool that can grant them visibility into SaaS apps with third-party AI sub-processors. In line with this, it's also vital for security teams to use tools that can help them to ensure continuous compliance.

Cloud DLP technologies like Nightfall come in handy in moments like this by offering seamless cloud-native protection for SaaS apps as well as customizable compliance solutions with APIs. 

What is Nightfall AI? 

As organizations expand, their data sprawl expands with them. This proliferation of data across the cloud increases the risk of leaks, exposure, and breaches, which can cause financial and reputational harm. Enter: Nightfall AI. As the first cloud-native DLP platform, Nightfall deploys machine learning-trained detectors to protect sensitive data and ensure continuous compliance with leading industry frameworks. 

What are the key benefits of Nightfall for SaaS?

Nightfall automatically detects over 100 sensitive data types like PII, PCI, PHI, secrets, and credentials across eleven native integrations. When Nightfall detects sensitive data, it sends an alert to Nightfall’s intuitive user console. From there, security teams can see context-rich insights about violations, deploy remediation actions, and send notifications to employees without having to go to another app. 

What are the key benefits of Nightfall for LLMs and Custom Apps?

Nightfall for LLMs and Custom Apps helps security teams to ensure security and compliance with leading frameworks like HIPAA, PCI-DSS, ISO 27001, and more by integrating seamlessly with any cloud app. Security teams also have the option to create their own custom detection rules, as well as to leverage SDKs for Java, Python, Go, and Node.js, in order to cut down on alerts and streamline their workflows.

How does Nightfall leverage AI to protect against AI?

Nightfall’s AI-powered detectors use neural network embeddings to identify PII, PCI, and PHI as well as secrets and credentials, all with “Possible,” “Likely,” or “Very Likely” confidence levels.

For example, say a patient submits their social security number (SSN) on a Zendesk ticket at a health care company help desk. Nightfall's specialized SSN detector will scan that ticket to determine if any of the content matches the precise format of an SSN. At the bare minimum, any number that matches the SSN format will be classified as having a "Possible" confidence, even without any additional context. However, if the format matches and the patient includes context around that SSN (such as phrases like "My social is" or "I applied for an insurance policy with my SSN"), then Nightfall will classify the SSN violation as "Likely" or "Very Likely." All in all, these context clues sharpen the accuracy of Nightfall's detectors—and help security teams cut down on false positive alerts in the process.
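The tiering described above can be sketched in a few lines of Python. This is an illustration of the idea only, not Nightfall's detector: the regular expression, the `CONTEXT_HINTS` list, the 40-character window, and the rule that one hint means "Likely" and two or more mean "Very Likely" are all assumptions made for this example.

```python
import re

# Hypothetical format check: three, two, and four digits separated by hyphens.
SSN_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")

# Hypothetical context phrases chosen for this example.
CONTEXT_HINTS = ("ssn", "social security", "my social")


def classify_ssn_findings(text, window=40):
    """Return (matched_text, confidence) pairs for SSN-shaped strings."""
    findings = []
    for match in SSN_PATTERN.finditer(text):
        # Inspect the characters just before the match for context clues.
        start = max(0, match.start() - window)
        context = text[start:match.start()].lower()
        hits = sum(1 for hint in CONTEXT_HINTS if hint in context)
        if hits >= 2:
            confidence = "Very Likely"
        elif hits == 1:
            confidence = "Likely"
        else:
            # The format alone matches, with no supporting context.
            confidence = "Possible"
        findings.append((match.group(), confidence))
    return findings


print(classify_ssn_findings("Reference 123-45-6789"))
print(classify_ssn_findings("My social is 123-45-6789"))
```

A bare number that fits the format is reported as "Possible", while the same number preceded by "My social is" is raised to "Likely".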

Nightfall also has two features in place to hone our detection engine: An option for users to provide feedback about alerts, and an opportunity for users to extend existing Nightfall detectors with tenant-specific rules.

To illustrate the feedback option, let’s go back to our SSN example. If a healthcare company’s security team receives an alert for an SSN-related violation in their Nightfall console, they’d be able to mark the alert as either a “True Positive” or a false positive (whether it’s “Not an SSN” or “Not a violation”). From there, the voluntary feedback is fed into Nightfall's machine learning models. The more feedback that a team submits, the more accurate their detectors become over time.

But what if that same security team wants a quicker solution? Nightfall will guide them through the process of extending an existing detector by creating their own detection rules. The team might choose to raise their detector’s “Minimum Confidence” level to “Very Likely,” and their “Minimum Number of Findings” to five. In that case, the team would only receive an alert if a message or file contains five or more "Very Likely" SSNs. Detection rules can be adjusted to fit any security team's unique goals and risk tolerance. They're also an effective way to streamline workflows and combat "alert fatigue."
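The effect of those two thresholds can be sketched as a simple filter. The `DetectionRule` class and `should_alert` function below are hypothetical names for illustration and do not reflect Nightfall's API or console; they only show how a minimum confidence and a minimum number of findings combine.

```python
from dataclasses import dataclass

# Confidence levels from the text, ordered from weakest to strongest.
CONFIDENCE_ORDER = ["Possible", "Likely", "Very Likely"]


@dataclass
class DetectionRule:
    """Hypothetical rule holding the two thresholds described above."""
    min_confidence: str = "Possible"
    min_findings: int = 1


def should_alert(confidences, rule):
    """Alert only if enough findings meet the minimum confidence."""
    threshold = CONFIDENCE_ORDER.index(rule.min_confidence)
    qualifying = [
        level for level in confidences
        if CONFIDENCE_ORDER.index(level) >= threshold
    ]
    return len(qualifying) >= rule.min_findings


rule = DetectionRule(min_confidence="Very Likely", min_findings=5)
print(should_alert(["Very Likely"] * 5, rule))
print(should_alert(["Very Likely"] * 4 + ["Likely"], rule))
```

With this rule, five "Very Likely" findings trigger an alert, while four "Very Likely" findings plus one "Likely" finding do not.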

 

How do I implement Nightfall for SaaS? 

Install Nightfall for SaaS in minutes via APIs that connect seamlessly to popular SaaS apps like Slack, GitHub, Confluence, Google Drive, and more. 

How do I implement Nightfall for LLMs and Custom Apps?

After logging in to your Nightfall console and creating an API key, you’ll be able to access over 60 of Nightfall’s pre-built detectors. From there, you can also create your own detection rules and policies before starting to make scan requests.

What companies use Nightfall? 

Nightfall is trusted by leading organizations across fields including software development, FinTech, and healthcare. 

How do I get started?

Schedule time with one of our product specialists, or email us at sales@nightfall.ai with any questions.
