ChatGPT DLP (Data Loss Prevention): The Essential Guide

by Madeline Day
August 4, 2023

Why ChatGPT?

Whether you’re in the business of generating code, composing marketing emails, or providing customer support, ChatGPT offers a multitude of time-saving, creativity-boosting benefits—while also reducing overall costs. 

As a result of these cutting-edge offerings, ChatGPT garnered over 100 million users in just the first two months after its release—and it’s not letting up any time soon. According to a recent survey, 49% of business leaders are already using ChatGPT, and an additional 30% plan to start within the coming months.

Source: Resume Builder

These numbers point to an inevitable trend: ChatGPT is changing work as we know it.


Why not ChatGPT?

Before long, companies will need to implement GenAI tools like ChatGPT in order to enhance their productivity, lower their overhead costs, and maintain their competitive edge. 

Some companies won’t be able to adopt GenAI tools as quickly as others, whether because they’re using legacy technology, facing strict governance requirements, or lacking a plan to mitigate the risks of GenAI.

Several companies have already experienced security leaks or breaches due to GenAI tools like ChatGPT. Samsung is just one of many major companies that have restricted or outright banned the use of ChatGPT over these underlying security concerns.

Source: Forbes, The Washington Post

What are the risks of using ChatGPT?

There are three overarching risks associated with using ChatGPT.

  1. Employees might sprawl sensitive data into prompts. 
  2. Sensitive data might be used to train the large language models (LLMs) that power ChatGPT.
  3. Businesses might compromise their compliance with leading industry frameworks like ISO 27001, HIPAA, or PCI-DSS.

Let’s take a closer look at how each of these risks plays out.


Sprawling sensitive data

Think of data sprawl as the commonplace spread of sensitive data into places it shouldn’t be (including ChatGPT). While data sprawl is the most common security risk that any team will face, it’s also the most easily overlooked due to its perceived lack of severity. That perception couldn’t be further from the truth: data sprawl opens the door to much more detrimental threats like data leakage and data loss. To put this into perspective, Nightfall detects over 15,000 instances of sensitive data sprawl per day.

Let’s get a little more specific. As soon as sensitive data is submitted in a ChatGPT prompt, it’s leaked to OpenAI’s servers. This turns ChatGPT and OpenAI into possible attack vectors that security teams need to monitor. However, there’s just one problem: Security teams that rely on traditional DLP strategies often don’t have visibility into cloud-based apps like ChatGPT.


Leaking data in LLMs

Say that an employee uses ChatGPT to help them generate code. In the process of writing their prompt, they copy and paste sample code that includes an active API key. The moment that API key is submitted, it’s impossible to delete or redact it from OpenAI’s servers. Furthermore, unless customers have explicitly requested to opt out of data collection, that leaked API key might also be used to further train OpenAI’s models.

But it doesn’t stop there. Recent events have shown that it’s possible to trick ChatGPT into generating active API keys. Though OpenAI has some measures in place against this, it’s theoretically possible for a leaked API key to find its way to a threat actor who’s prompting ChatGPT for credentials. From there, that threat actor could use the API key to access company systems, escalate their privileges, and steal company data. 

The bottom line? It’s vital for security teams to block company data from making its way into OpenAI servers. 
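
To make this concrete, here’s a minimal sketch of the kind of pre-submission check that could sit in front of ChatGPT. It’s illustrative only: the key formats are assumptions, and a real DLP engine would pair far more detectors with contextual validation.

```python
import re

# Illustrative patterns for two well-known credential formats (assumed
# formats, not an exhaustive list).
CREDENTIAL_PATTERNS = {
    "AWS access key ID": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "OpenAI API key": re.compile(r"\bsk-[A-Za-z0-9]{20,}\b"),
}

def scan_prompt(prompt: str) -> list[str]:
    """Return the names of any credential patterns found in the prompt."""
    return [name for name, pattern in CREDENTIAL_PATTERNS.items()
            if pattern.search(prompt)]

prompt = 'Fix this: client = connect(api_key="AKIAIOSFODNN7EXAMPLE")'
findings = scan_prompt(prompt)
if findings:
    # Block the submission before the key can reach OpenAI's servers.
    raise SystemExit(f"Prompt blocked; possible credentials found: {findings}")
```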


Compromising compliance

If security teams use traditional DLP methods, they may not have complete visibility into cloud-based apps like ChatGPT. And without this visibility, it becomes difficult to ensure compliance with industry frameworks like ISO 27001, HIPAA, PCI-DSS, and SOC 2, among others.


There are three possible threats to compliance where ChatGPT is concerned.

  1. Employees may include sensitive customer or vendor data in ChatGPT prompts, thereby leaking that data to OpenAI. 
  2. ChatGPT may include sensitive data in responses, and engineers may unknowingly incorporate those responses into their work.
  3. ChatGPT may generate code that’s riddled with hidden vulnerabilities, which may later result in data leakage or data loss.


How can you stay secure while using ChatGPT?

What are the solutions to these three risks? Let’s address them one by one. 


Stop data sprawl

Employee education is a crucial component of any security strategy. It’s vital to communicate the best practices for using ChatGPT, including the following:

  1. Scan for any sensitive data (such as PII, PHI, PCI, credentials, or proprietary code) before submitting your prompt, and use synthetic data if necessary (see the sketch after this list).
  2. Review AI-generated responses for any sensitive data that may come from third parties.
  3. Evaluate AI-generated responses for accuracy, vulnerabilities, or malicious code before implementing them into your workflow. 
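
As a minimal illustration of the first practice, here’s a sketch that swaps detected sensitive values for synthetic stand-ins before a prompt is submitted. The patterns and replacement values are illustrative assumptions:

```python
import re

# Illustrative substitutions only; a production workflow would cover many
# more data types (names, addresses, account numbers, proprietary code).
REDACTIONS = [
    (re.compile(r"\b\d{3}-\d{2}-\d{4}\b"), "000-00-0000"),              # SSN
    (re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b"), "user@example.com"),  # email
]

def sanitize(prompt: str) -> str:
    """Swap sensitive values for synthetic stand-ins before submission."""
    for pattern, replacement in REDACTIONS:
        prompt = pattern.sub(replacement, prompt)
    return prompt

print(sanitize("Draft a letter to jane.doe@acme.com regarding SSN 123-45-6789."))
# -> Draft a letter to user@example.com regarding SSN 000-00-0000.
```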


Plug data leaks

OpenAI provides an opt-out form that lets you decline to have your business’ data collected and used for training. Note that opting out does not apply retroactively. In other words, once data has been submitted to ChatGPT, there’s no way to retrieve or redact it from OpenAI.

However, let’s also take a step back to look at the bigger picture. Seeing as a number of SaaS apps now offer GenAI capabilities through third-party LLM providers—including OpenAI—it’s also important to sanitize sensitive data in SaaS apps. This solution kills two birds with one stone: it prevents data leaks to third-party LLMs and minimizes the opportunity for employees to accidentally copy and paste sensitive data from SaaS apps into ChatGPT.


Ensure continuous compliance

Deploy cloud-based DLP technologies to redact sensitive data from GenAI tools, scrub data from SaaS apps, and continuously monitor for threats to compliance. According to IBM, AI-powered DLP tools enable quicker responses and, therefore, lower costs in the event of a data breach. According to their most recent Cost of a Data Breach Report, “Organizations with extensive use of both AI and automation experienced a data breach lifecycle that was 108 days shorter compared to studied organizations that have not deployed these technologies… Studied organizations that deployed security AI and automation extensively saw, on average, nearly $1.8 million lower data breach costs than organizations that didn't deploy these technologies.”


What is cloud DLP?

Cloud data loss prevention (cloud DLP) is a strategic approach to preventing sensitive data from being shared outside of authorized SaaS and cloud apps.

Cloud DLP has become increasingly relevant as more and more businesses switch to SaaS and cloud-based tools. This year alone, Google reports that over 30% of surveyed tech and business leaders are planning to forego “legacy enterprise software” in favor of cloud-based services, with an even greater number of leaders (41.4%) looking to increase their use of cloud-based services overall.

Source: Google Cloud blog

This means that millions of employees are learning how to navigate their new cloud-based workspaces—and that many of them are also unknowingly putting their companies’ data at risk by sharing it all over the cloud.


How does cloud DLP protect data in ChatGPT?

According to Verizon’s latest report, 74% of all data breaches involve the human element. In line with this, ChatGPT’s user-friendly design and conversational tone make it that much easier for employees to lower their guard and submit sensitive data without thinking. So what’s a security team to do?

Source: Verizon 2023 Data Breach Investigations Report

Seeing as it’s impossible to redact or delete sensitive data once it’s submitted to ChatGPT, it’s imperative that security teams find a cloud DLP tool that lets them monitor and remediate ChatGPT prompts in real time. It’s also vital for security teams to strengthen their first line of defense—their employees—by building an enduring culture of cyber hygiene. 

Cloud DLP technologies like Nightfall come in handy in moments like this by offering seamless in-browser protection for ChatGPT, as well as by sending instant end-user notifications to educate employees about security best practices. 


Does ChatGPT have DLP functionality built in?

Nope! But Nightfall offers the first holistic cloud DLP program for GenAI, including in-browser protection for ChatGPT, cloud-native integrations for SaaS, and a vast library of developer APIs. 
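
For example, a team could use Nightfall’s text scanning API to check outbound content before it ever reaches an LLM. The sketch below follows the request shape in Nightfall’s public v3 docs at the time of writing, but treat the endpoint, field names, and detector identifiers as assumptions to verify against the current API reference:

```python
import os
import requests

# Scan a prompt for SSNs with Nightfall's v3 plaintext scanning API.
# Endpoint, payload shape, and detector name follow Nightfall's public
# docs at the time of writing; verify against the current reference.
response = requests.post(
    "https://api.nightfall.ai/v3/scan/plaintext",
    headers={"Authorization": f"Bearer {os.environ['NIGHTFALL_API_KEY']}"},
    json={
        "policy": {
            "detectionRules": [{
                "name": "SSNs in outbound prompts",
                "logicalOp": "ANY",
                "detectors": [{
                    "detectorType": "NIGHTFALL_DETECTOR",
                    "nightfallDetector": "US_SOCIAL_SECURITY_NUMBER",
                    "displayName": "SSN",
                    "minConfidence": "POSSIBLE",
                    "minNumFindings": 1,
                }],
            }],
        },
        "payload": ["Please verify this SSN: 123-45-6789"],
    },
    timeout=10,
)
response.raise_for_status()
# One findings list per payload string; empty lists mean the text is clean.
print(response.json()["findings"])
```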


What is Nightfall AI? 

As organizations expand, their data sprawl expands with them. This proliferation of data across the cloud increases the risk of leaks, exposure, and breaches, which can cause financial and reputational harm. Enter: Nightfall AI. As the first cloud-native DLP platform, Nightfall deploys machine learning-trained detectors to protect sensitive data and ensure continuous compliance with leading industry frameworks. 


What are the key benefits of Nightfall for ChatGPT?

Nightfall automatically detects over 100 sensitive data types like PII, PCI, PHI, secrets, and credentials in order to achieve and maintain compliance with leading industry frameworks like ISO 27001, SOC 2, HIPAA, and more. It also enables instant remediation and end-user notifications to stop sensitive data from making its way from ChatGPT to OpenAI servers. When Nightfall intercepts sensitive data, it automatically dispatches a context-rich alert to educate employees and notify their security team. Security teams have the option to view alerts in Nightfall’s intuitive user console as well as in Slack, email, or their SIEM of choice.


How does Nightfall leverage AI to protect against AI?

Nightfall’s AI-powered detectors use neural network embeddings to identify PII, PCI, and PHI as well as secrets and credentials, all with “Possible,” “Likely,” or “Very Likely” confidence levels. For example, say an employee accidentally submits a social security number (SSN) in a prompt to ChatGPT. Nightfall's specialized SSN detector will scan that prompt to determine if any of the content matches the precise format of an SSN. At the bare minimum, any number that matches the SSN format will be classified as having a "Possible" confidence, even without any additional context. However, if the format matches and the prompt includes context around that SSN (such as phrases like "Please verify this SSN”), then Nightfall will classify the SSN violation as "Likely" or "Very Likely." All in all, these context clues sharpen the accuracy of Nightfall's detectors—and help security teams cut down on false positive alerts in the process.
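
To see the format-plus-context idea in miniature, consider the toy sketch below. It’s a deliberately simplified stand-in for illustration; Nightfall’s actual detectors rely on neural network embeddings, not keyword lists:

```python
import re

SSN_FORMAT = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")
CONTEXT_HINTS = ("ssn", "social security")  # assumed context keywords

def classify(prompt: str):
    """A format match alone earns 'Possible'; nearby context escalates it."""
    if not SSN_FORMAT.search(prompt):
        return None
    if any(hint in prompt.lower() for hint in CONTEXT_HINTS):
        return "Likely"    # format match plus supporting context
    return "Possible"      # format match alone

print(classify("Please verify this SSN: 123-45-6789"))  # -> Likely
print(classify("Tracking number 123-45-6789"))          # -> Possible
```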

Nightfall also has two features in place to hone our detection engine: An option for users to provide feedback about alerts, and an opportunity for users to extend existing Nightfall detectors with customer-specific rules.

To illustrate the feedback option, let’s go back to our SSN example. If a security team receives an alert for an SSN-related violation in their Nightfall console, they’d be able to mark the alert as either a “True Positive” or a false positive (whether it’s “Not an SSN” or “Not a violation”). From there, the voluntary feedback is fed into Nightfall’s machine learning models. The more feedback that a team submits, the more accurate their detectors become over time.

But what if that same security team wants a quicker solution? Nightfall will guide them through the process of extending an existing detector by creating their own detection rules. Detection rules can be adjusted to fit any security team's unique goals and risk tolerance. They’re also an effective way to streamline workflows and combat “alert fatigue.”
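
Here’s what such a custom rule might look like. This is an illustrative sketch loosely modeled on Nightfall’s detection rule schema; the field names and values are assumptions to check against the current docs:

```python
# An illustrative detection rule that raises the confidence floor and
# excludes a known placeholder value to cut down on alert fatigue.
# Field names are assumptions loosely modeled on Nightfall's schema.
detection_rule = {
    "name": "SSNs, high confidence only",
    "logicalOp": "ANY",
    "detectors": [{
        "detectorType": "NIGHTFALL_DETECTOR",
        "nightfallDetector": "US_SOCIAL_SECURITY_NUMBER",
        "displayName": "SSN (tuned)",
        "minConfidence": "VERY_LIKELY",  # ignore "Possible" matches
        "minNumFindings": 1,
        "exclusionRules": [{
            "matchType": "FULL",
            "exclusionType": "REGEX",
            # Skip the well-known placeholder SSN used in test fixtures.
            "regex": {"pattern": r"123-45-6789", "isCaseSensitive": False},
        }],
    }],
}
```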


How do I implement Nightfall for ChatGPT? 

Install Nightfall DLP for ChatGPT from the Google Chrome Web Store. Once the browser extension is installed, you can jump right into a free 14-day trial, where you can customize unique detection rules with minimal additional setup or tuning required.

To learn more, schedule a demo with our team or contact us directly at sales@nightfall.ai with any questions.

