From ChatGPT to DALL-E to Grammarly, there are countless ways to leverage generative AI (GenAI) to simplify everyday life. Whether you’re looking to cut down on busywork, create stunning visual content, or compose impeccable emails, GenAI’s got you covered—however, it’s vital to keep a close eye on your sensitive data at all times.
How can threat actors access your data?
The moment you submit your data to a SaaS app or GenAI tool, it’s considered to be either “sprawled” or leaked. In short? Your data has been shared somewhere it shouldn’t be. And once your data is shared somewhere it shouldn’t be, it becomes easier for threat actors to uncover it and use it to their advantage.
There are four possible routes that a threat actor might take when it comes to accessing data via SaaS apps and GenAI tools. Let’s take a closer look at each of them.
SaaS app data
According to G2, 85% of businesses will be “cloud first” by 2025. That means that each year, there are millions more employees sending messages and storing data on SaaS apps like Slack, Teams, GitHub, and Google Drive. Together, these apps provide a frictionless flow of information within organizations. However, Nightfall’s detection team discovered that as many as one in five employees expose sensitive data in the span of just one month—and most of those exposures occur via cloud-based apps.
Imagine this: A software development team is working on a project in GitHub and includes an active API key in their most recent commit. Not only is that API key sprawled across GitHub via future commits; it’s also proliferated even further via the cloud. Should a threat actor ever gain access to one of the developers’ GitHub accounts, they could easily retrieve the API key and use it to launch a ransomware attack.
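Leaks like this are often catchable before the commit ever lands, by scanning diff text for secret-shaped strings. The sketch below is a simplified illustration, not any particular product’s detection logic; the patterns are hypothetical examples, and real secret scanners use far more extensive rule sets:

```python
import re

# Hypothetical patterns for a few common API key formats. Real secret
# scanners maintain hundreds of rules and entropy checks on top of these.
SECRET_PATTERNS = {
    "github_token": re.compile(r"\bghp_[A-Za-z0-9]{36}\b"),
    "aws_access_key": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "generic_api_key": re.compile(
        r"(?i)api[_-]?key\s*[:=]\s*['\"][A-Za-z0-9_\-]{16,}['\"]"
    ),
}

def find_secrets(text: str) -> list[tuple[str, str]]:
    """Return (rule_name, matched_text) pairs for suspected secrets."""
    hits = []
    for name, pattern in SECRET_PATTERNS.items():
        for match in pattern.findall(text):
            hits.append((name, match))
    return hits
```

A check like this could run as a pre-commit hook, blocking the commit whenever `find_secrets` returns anything, so the key never reaches GitHub in the first place.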
But the risk doesn’t stop there. Many SaaS apps, like Jira, Zendesk, and Asana, use third-party sub-processors to offer in-app GenAI features. In our previous example, say that a member of the software development team wants to pair program with GitHub Copilot. It would be all too easy for the developer to paste code containing the active API key into their Copilot prompt. Once they submit their prompt, that API key goes straight to OpenAI’s servers. What exactly can OpenAI do with the data from prompts? Read on to find out.
GenAI training data
Following the launch of ChatGPT, GenAI tools have surged in popularity. In a recent poll, Reuters reported that nearly 30% of employees use ChatGPT on a regular basis to handle repetitive or administrative tasks. That same poll also found that less than a quarter of businesses “explicitly allow” the use of ChatGPT; in other words, over 75% do not.
Most businesses that restrict or ban employees from using ChatGPT are concerned about three things:
- Sensitive data can be sprawled from ChatGPT to OpenAI servers.
- OpenAI servers can use that sensitive data to train the next GPT model.
- Threat actors can reconstruct sensitive training data by merely prompting ChatGPT.
How might this play out in a real-life scenario? Say that a customer writes to an e-commerce help desk to request a refund. Hoping for a faster resolution, the customer hastily includes their credit card number as part of their ticket. Later on, the business’ service rep might use the content of the customer’s ticket (including their credit card number) to prompt ChatGPT to write a response. Within milliseconds, that credit card number is irreversibly transmitted to OpenAI’s servers, where it can be stored indefinitely and used to train future iterations of ChatGPT. Further down the line, a threat actor could prompt ChatGPT for active credit card numbers. If that threat actor is persistent, and provides ChatGPT with enough context, it’s entirely possible that ChatGPT will accurately reproduce the customer’s credit card number from its original training data.
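One way to blunt this risk is to scrub prompts before they ever leave the browser. The following is a minimal sketch of that idea, not how any specific DLP product works: it pairs a loose digit-run regex with a Luhn checksum so that likely card numbers are redacted while ordinary numbers pass through.

```python
import re

# Loose match for 13-16 digit runs, optionally separated by spaces/dashes.
CARD_RE = re.compile(r"\b(?:\d[ -]?){13,16}\b")

def luhn_valid(number: str) -> bool:
    """Luhn checksum: weeds out digit runs that can't be card numbers."""
    digits = [int(d) for d in number if d.isdigit()]
    checksum = 0
    for i, d in enumerate(reversed(digits)):
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9
        checksum += d
    return checksum % 10 == 0

def redact_cards(prompt: str) -> str:
    """Replace Luhn-valid card-like numbers before the prompt is submitted."""
    def repl(match):
        candidate = match.group(0)
        return "[REDACTED CARD]" if luhn_valid(candidate) else candidate
    return CARD_RE.sub(repl, prompt)
```

With a filter like this in front of the prompt box, the service rep’s draft would reach the model as “…card number [REDACTED CARD]…” instead of the real digits.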
Annotation platform data
Naturally, large language models (LLMs) like OpenAI’s GPT series need fine-tuning over time. For this reason, GenAI companies might partner with annotation platforms to label their training data and evaluate their models’ accuracy.
You guessed it—it’s time for another example. Now we’ll follow along with a developer who’s working on a GenAI chatbot that offers health care customer service. The developer taps an annotation platform to rate the quality of the chatbot’s responses, but the developer doesn’t realize that patient PII and PHI have made their way into the chatbot’s training data. Now, not only is that sensitive data being used to train the chatbot model, but it’s also been exposed to the annotation platform. This data leak presents yet another entry point for threat actors. In this instance, if a threat actor breached the annotation platform’s database, they could gain access to the patient PII and PHI that made its way into the chatbot’s training data.
Third-party contractor data
To extend our above example, say the annotation platform brought on outside contractors to label the healthcare chatbot’s training data. All a threat actor would have to do is gain access to a contractor’s computer in order to steal patient PII and PHI.
How can you protect your sensitive data?
While controlling your data may seem like an insurmountable task at times, Nightfall proposes two simple solutions:
- Total cloud visibility: Our new cloud-based era requires a cloud-based solution. Enter: Cloud DLP. Unlike network DLP and endpoint DLP, cloud DLP solutions like Nightfall can monitor data that’s sent via APIs—meaning that security teams have an unobstructed view of sensitive data that’s transmitted and stored in SaaS apps and GenAI tools.
- Real-time and historical remediation: Both network DLP and endpoint DLP have significant limitations regarding data remediation; for instance, they can’t remediate data in SaaS apps without the use of blunt tactics, like proxies, which can impact employee productivity. On the other hand, cloud DLP empowers security teams to remediate specific instances of sensitive data sprawl in real time, without delaying employee workflows.
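As a rough mental model of what API-level monitoring plus real-time remediation looks like, consider this toy scanner. The detectors here are illustrative regexes only (a production cloud DLP platform relies on far more sophisticated, ML-based detection), and the function names are hypothetical:

```python
import re

# Illustrative detectors only; real cloud DLP uses ML-based detection.
DETECTORS = {
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "email": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b"),
}

def scan_and_redact(messages):
    """Scan messages (as fetched via a SaaS app's API) and redact findings.

    Returns (findings, cleaned): a log of detections for the security team,
    plus redacted copies that could be written back as a remediation step.
    """
    findings, cleaned = [], []
    for i, msg in enumerate(messages):
        for name, rx in DETECTORS.items():
            for match in rx.findall(msg):
                findings.append({"message": i, "type": name, "match": match})
            msg = rx.sub(f"[{name.upper()} REDACTED]", msg)
        cleaned.append(msg)
    return findings, cleaned
```

The key point the sketch captures is architectural: because the scan operates on content delivered over the app’s API, it needs no proxy or endpoint agent, and redaction can happen without interrupting the employee’s workflow.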
Nightfall’s latest product offering, Nightfall for ChatGPT, incorporates both of these solutions into a single elegant browser plugin. Want to get a closer look at how Nightfall can proactively protect your data from public AI models, third parties, and threat actors? Try Nightfall for ChatGPT for free by downloading it in the Chrome Web Store.