Nightfall’s recent “State of Secrets” report uncovered that collaboration, communication, and IT service tools have the highest risk of data exposure, particularly in industry-leading SaaS apps like Slack and GitHub. This trend highlights an incredibly pervasive (yet often overlooked) risk in cloud cybersecurity: Data sprawl.
What’s Data Sprawl?
You’ve heard of data loss and data leakage. While these terms tend to be used interchangeably in reference to “DLP,” they each highlight a different dimension of risk. So how do we differentiate the two? And where does data sprawl fit in?
Think of data sprawl as the commonplace sharing of sensitive data in places it shouldn’t be shared. As more and more organizations move to cloud-based tools and SaaS apps, data sprawl continues to expand rapidly across the board. This year alone, Google reports that over 30% of surveyed tech and business leaders are planning to forgo “legacy enterprise software” in favor of cloud-based services, with an even greater share of leaders (41.4%) looking to increase their use of cloud-based tools overall. This means that millions of employees are learning how to navigate their new cloud-based workspaces, and that many of them are also unknowingly putting their companies’ data at risk by sharing it all over the cloud. To put that into perspective, Nightfall detects over 15,000 instances of sensitive data sprawl per day.
Let’s dive into an example. Say an employee sends their credentials to a colleague via a popular communication app. While those credentials haven’t technically been leaked to threat actors (yet), they still present a risk because they have been copied and shared. Down the line, a threat actor might manage to access another employee’s account for that app, find those shared credentials, and use them to escalate their own privileges. That sort of access leaves the company vulnerable to a worst-case scenario like a ransomware attack.
In this story we see distinct elements of data sprawl, data leakage, and data loss. Data sprawl appears at the start of the story when the employee shares their credentials somewhere that they shouldn’t be shared. Data leakage takes over when that data falls into the hands of the threat actor. And, last but not least, data loss happens as a result of the ransomware attack, when the company’s data can no longer be accessed.
Now we can see how data sprawl is not only the most prevalent risk any security team will face—it’s also a “gateway” to much more detrimental threats.
How do you keep sensitive data out of SaaS apps?
There are three main risks when using any SaaS app, especially if that SaaS app has generative AI (GenAI) capabilities. The first risk is that an employee or customer might sprawl sensitive data by sharing it in a message, ticket, or workspace. The second risk is that the sprawled data might be leaked to unsanctioned SaaS apps or to third-party large language model (LLM) providers like OpenAI or Anthropic. Last, but not least, the third risk is that leaked data might compromise compliance with frameworks like PCI-DSS, HIPAA, SOC 2, or ISO 27001.
So what kind of protection do you need to combat these three risks? Read on to discover how you and your team can create a data sprawl prevention strategy for five of the most popular SaaS apps.
Stop the sprawl in Slack and Teams
Expected to reach 79 million users by 2025, Slack is one of the world’s best-known collaboration platforms. Teams is also used across millions of workplaces, and has nearly doubled its user base in the past several years. These apps are popular because they offer a seamless communication experience. However, that experience is often so seamless that employees don’t think twice about sharing their sensitive data. So what’s a security team to do? At Nightfall, we believe that employee education is an essential component of any data sprawl prevention strategy.
Say a digital health employee shares a file containing patient diagnoses over Slack. Using Nightfall, that company’s security team would have the option to notify the employee about precisely when, where, and how they violated PHI policy. That security team would also have the option to delete or quarantine the PHI after the file is sent. While this might seem counterintuitive at first, it’s rooted in one of Nightfall’s core philosophies: To educate employees without impeding their workflows. In fast-paced work environments like Slack and Teams, the latency caused by intercepting content would slow employee productivity to a crawl. For this reason, Nightfall’s near real-time notification and remediation features can help security teams to encourage an enduring culture of cyber hygiene while also safely containing data sprawl.
Monitor for leaked credentials in GitHub and Jira
As the world’s leading code-hosting platform, GitHub needs no introduction. GitHub smooths the developer experience with Copilot, an AI “pair programmer” that uses OpenAI models to generate and suggest code. Developers also rely on apps like Jira to manage projects and boost productivity. Following the release of Atlassian Intelligence, developers can leverage OpenAI models in Jira to automatically generate action items, outline documents, and more. Though both GitHub’s and Jira’s AI features have data protection standards in place, it’s still a good idea to prevent sensitive data from being exposed to third parties. This is where the next dimension of data sprawl prevention comes into play.
Let’s imagine that a software development team is tracking a GitHub project through Jira, and that a few active API keys are pasted from GitHub repos into Jira projects. In this case, the company’s security team could check for leaked credentials in both GitHub and Jira by deploying Nightfall’s out-of-the-box detectors. Each of these ML-trained detectors can be fine-tuned to individual teams’ needs via customer-specific detection rules. In this example, the security team might be staunchly risk averse, so they might set their “Minimum Confidence” level to “Likely” and their “Minimum Number of Findings” to one. In other words, their detectors would only need to find one “Likely” API key in order to send an alert. Though these settings might result in more alerts at first, Nightfall’s detectors will learn and adapt over time to cut down on false positives.
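To make the threshold settings above concrete, here is a minimal Python sketch of how a “Minimum Confidence” and “Minimum Number of Findings” rule might be evaluated. It is not Nightfall’s implementation or API: the Confidence levels, the Finding class, the scan and should_alert functions, and the three regular expressions are all hypothetical stand-ins for illustration, and real detectors rely on far more context than pattern matching.

```python
import re
from dataclasses import dataclass
from enum import IntEnum


class Confidence(IntEnum):
    # Hypothetical confidence scale, ordered from lowest to highest.
    POSSIBLE = 1
    LIKELY = 2
    VERY_LIKELY = 3


@dataclass(frozen=True)
class Finding:
    detector: str
    text: str
    confidence: Confidence


# Illustrative patterns only (assumptions, not production detectors).
PATTERNS = [
    # AWS access key IDs start with "AKIA" followed by 16 uppercase letters/digits.
    ("AWS access key ID", re.compile(r"\bAKIA[0-9A-Z]{16}\b"), Confidence.VERY_LIKELY),
    # A value assigned to something named like "api_key".
    ("Generic API key assignment",
     re.compile(r"\bapi[_-]?key\s*[:=]\s*[A-Za-z0-9_-]{20,}", re.IGNORECASE),
     Confidence.LIKELY),
    # A bare 32-character hex string could be a key, but also a checksum.
    ("32-character hex string", re.compile(r"\b[0-9a-f]{32}\b"), Confidence.POSSIBLE),
]


def scan(text: str) -> list[Finding]:
    """Return one Finding per pattern match in the text."""
    findings = []
    for name, pattern, confidence in PATTERNS:
        for match in pattern.finditer(text):
            findings.append(Finding(name, match.group(0), confidence))
    return findings


def should_alert(
    findings: list[Finding],
    min_confidence: Confidence = Confidence.LIKELY,
    min_findings: int = 1,
) -> bool:
    """Alert when enough findings meet or exceed the minimum confidence."""
    qualifying = [f for f in findings if f.confidence >= min_confidence]
    return len(qualifying) >= min_findings
```

With the risk-averse settings from the example (a minimum confidence of LIKELY and a minimum of one finding), a single AWS-style key ID pasted into a ticket would trigger an alert, while a lone 32-character hex string, which could just as easily be a checksum, would not.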
Once the team’s detectors are tuned and ready to go, they can proactively monitor all GitHub repos and Jira projects to make sure no API keys are present, especially if that content could later be fed to third-party LLMs.
Automate compliance in Zendesk
Zendesk is the go-to app for companies looking to provide top-notch customer service. However, customers who submit tickets to Zendesk can often be a little too forthcoming with sensitive data like PCI and PHI in hopes that it might help agents answer their questions more quickly. When data sprawl is coming from customers, it presents a different sort of threat: Falling short of leading standards like PCI-DSS or HIPAA.
Time for our third and final example. Say a customer opens a ticket with an e-commerce company’s help desk to check on the status of their refund. They might “over-share” by including their credit card number or other personal data. In order to remain in compliance with PCI-DSS, that company’s security team would need to contain and remediate the customer’s data as soon as possible. With Nightfall, the team could create their own custom compliance solution by integrating with Zendesk on the API level. More specifically, they could configure a workflow to automatically redact credit card information from incoming customer tickets so that it isn’t exposed to customer service agents or proliferated further in Zendesk. This elegant solution ultimately protects the customer’s PCI while also ensuring that the company meets necessary compliance standards.
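As a rough illustration of the redaction step, the Python sketch below masks card-like numbers in ticket text before it reaches an agent. It rests on several assumptions and is neither Nightfall’s detector nor the Zendesk API: redact_cards, luhn_valid, and the mask text are hypothetical names, and the approach (a digit-run pattern combined with the Luhn checksum) is just one common way to cut down on false positives such as order numbers.

```python
import re

# Candidate card numbers: 13 to 19 digits, optionally separated by
# single spaces or hyphens.
CARD_CANDIDATE = re.compile(r"\b\d(?:[ -]?\d){12,18}\b")


def luhn_valid(digits: str) -> bool:
    """Return True if the digit string passes the Luhn checksum."""
    total = 0
    for index, char in enumerate(reversed(digits)):
        value = int(char)
        # Double every second digit, counting from the right.
        if index % 2 == 1:
            value *= 2
            if value > 9:
                value -= 9
        total += value
    return total % 10 == 0


def redact_cards(text: str, mask: str = "[REDACTED CARD]") -> str:
    """Replace Luhn-valid card-like numbers in the text with a mask."""

    def replace(match: re.Match) -> str:
        digits = re.sub(r"\D", "", match.group(0))
        # Leave digit runs that fail the checksum untouched.
        return mask if luhn_valid(digits) else match.group(0)

    return CARD_CANDIDATE.sub(replace, text)


if __name__ == "__main__":
    ticket = "Refund for order 98765? My card is 4111 1111 1111 1111."
    print(redact_cards(ticket))
```

Here the well-known test number 4111 1111 1111 1111 is masked, while the short order number is left alone because it is too short to be a card candidate. In a real workflow, the redacted text would then be written back to the ticket through whatever integration the team has configured.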
Throughout this blog post, we’ve illustrated how Nightfall offers a proactive solution to data sprawl in popular SaaS apps. Because data sprawl is one of the most widespread and egregious risks to SaaS apps, preventing it is a critical ingredient of holistic data protection, along with more traditional security strategies like Identity and Access Management (IAM), SaaS Security Posture Management (SSPM), and User and Entity Behavior Analytics (UEBA).