
Is Slack using your data to train their AI models? Here’s what you need to know.

by Rohan Sathe, May 21, 2024

AI is everywhere—but how can you be sure that your data isn’t being used to train the AI models that power your favorite SaaS apps like Slack?

This topic reached a fever pitch on Hacker News last week, when a flurry of Slack users vented their frustrations about the messaging app’s opaque privacy policy. The main issue? Slack customers are opted in by default, and must manually opt out of letting Slack use their data to train its global AI models as well as its new generative AI “add-on,” Slack AI.

What data does Slack collect, and how does Slack use it? 

Slack jumped quickly into the fray with a blog post and an updated privacy policy. They disclosed that Slack’s global machine learning (ML) algorithms are used for things like auto-filling search results and suggesting emojis. They also claim that they “do not build or train these models in such a way that they could learn, memorize, or be able to reproduce any customer data of any kind,” and that these models “use de-identified, aggregate data and do not access message content in DMs, private channels, or public channels.”

However, Slack takes a different approach when it comes to their GenAI tool, Slack AI. A Slack spokesperson stated that Slack AI “uses large language models (LLMs) but does not train those LLMs on customer data,” and that those “LLMs [are] hosted directly within Slack’s AWS infrastructure, so that customer data remains in-house and is not shared with any LLM provider.”

But even with these practices in place, why doesn’t Slack simply ask customers to opt in, rather than enrolling them automatically? “Workspaces are not opted out by default… so that we can provide the best possible product experience,” they explain. While it does take a large corpus of data to train and fine-tune ML models and LLMs, customers argue that it shouldn’t come at the expense of data privacy.

What does this mean for Slack customers? 

For many, Slack’s lack of transparency is a blow to customer trust. Some argue that Slack’s policy sets a dangerous precedent for future SaaS apps, which may follow suit and automatically opt customers into sharing their data with AI models and LLMs without prior consent. Critics also contend that the practice violates the General Data Protection Regulation (GDPR), which has left some users and companies scrambling to determine what this means for their compliance.

Furthermore, concerned users have raised a number of questions that remain unanswered, such as: even if a customer manually opts out, does Slack retroactively delete their data from its models? Questions like these have a lasting impact not only on Slack customers’ data, but also on their compliance with regulations and standards like the aforementioned GDPR, as well as HIPAA, PCI DSS, ISO 27001, and more.

What safeguards can you put in place to protect your sensitive data?

Slack’s story is undoubtedly the first of many to come, especially as more and more companies race to build new AI apps and adopt LLMs into their workflows. 

According to the shared responsibility model of AI, LLM providers are responsible for securing their platforms and services, but it’s up to customers to secure their own sensitive data. With this in mind, here are a few rapid-fire best practices for companies looking to proactively protect their interactions with AI apps.

  • Stop data sprawl: Data sprawl, or the uncontrolled spread of sensitive data across cloud apps and services, is one of the most pervasive threats facing cloud-based enterprises. More specifically, sprawled API keys, secrets, and credentials can lead to privilege escalation attacks, data exposure, and noncompliance. To illustrate the magnitude of the problem, Nightfall detects an average of 15,000 instances of sensitive data sprawl per day. Furthermore, according to Nightfall’s recent “State of Secrets” report, over 17% of detected API keys were found in Slack files. With this risk in mind, it’s crucial for companies to scan for secrets and other sensitive data in real time, to prevent them from leaking to third parties like Slack, OpenAI, Anthropic, and more (see the first sketch after this list).
  • Filter prompt inputs: Despite any precautions that SaaS apps like Slack might have in place, it’s still important to ensure that no sensitive data makes its way past an enterprise’s trust boundary. Part of that means intercepting outgoing AI prompts and scrubbing them of sensitive data before they’re sent to third parties (see the second sketch after this list). This is precisely why we built Nightfall for ChatGPT: to scan and detect sensitive PII, PCI, PHI, secrets, and credentials before they're stored and used to train third-party LLMs.
  • Educate employees about data sharing best practices: While data scanning solutions can be incredibly useful for cleaning up cloud environments, employee education is a more proactive way to limit data sprawl. Annual trainings can impart overarching best practices, but they can only do so much. Instead, holistic Data Leak Prevention (DLP) platforms like Nightfall can send real-time Slack and email notifications to coach employees on custom policies and best practices, and even encourage employees to remediate their own policy violations.
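
To make the first point concrete, here’s a minimal sketch in Python of what real-time secret scanning over outbound text (a Slack message, for example) might look like. The regex patterns are illustrative only and far narrower than what a production detection engine like Nightfall’s uses, which combines much broader rule sets with ML-based validation to cut down on false positives.

```python
import re

# Illustrative patterns only; a production detector covers many more secret
# formats and validates candidates before alerting.
SECRET_PATTERNS = {
    "aws_access_key_id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "generic_api_key": re.compile(
        r"\b(?:api|secret)[_-]?key\s*[:=]\s*\S{16,}\b", re.IGNORECASE
    ),
    "private_key_block": re.compile(r"-----BEGIN (?:RSA |EC )?PRIVATE KEY-----"),
}

def find_secrets(text: str) -> list[dict]:
    """Scan a chunk of text (e.g. a Slack message) and report likely secrets."""
    findings = []
    for name, pattern in SECRET_PATTERNS.items():
        for match in pattern.finditer(text):
            findings.append({"type": name, "match": match.group(0)})
    return findings

if __name__ == "__main__":
    message = "deploy with api_key=abcd1234efgh5678ijkl and AKIAABCDEFGHIJKLMNOP"
    for finding in find_secrets(message):
        print(f"Possible {finding['type']} detected: {finding['match']}")
```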
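
And for the second point, here’s a minimal sketch of a prompt-filtering wrapper. The patterns and the send_to_llm placeholder are hypothetical; in practice, the detection and redaction step would be handled by a DLP service or AI firewall rather than hand-rolled regexes.

```python
import re

# Hypothetical redaction wrapper: scrub likely sensitive values from a prompt
# before it crosses the trust boundary to a third-party LLM provider.
REDACTION_PATTERNS = [
    (re.compile(r"\bAKIA[0-9A-Z]{16}\b"), "[REDACTED_AWS_KEY]"),
    (re.compile(r"\b\d{3}-\d{2}-\d{4}\b"), "[REDACTED_SSN]"),
    (re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b"), "[REDACTED_EMAIL]"),
]

def scrub_prompt(prompt: str) -> str:
    """Return the prompt with likely secrets and PII replaced by placeholders."""
    for pattern, placeholder in REDACTION_PATTERNS:
        prompt = pattern.sub(placeholder, prompt)
    return prompt

def send_to_llm(prompt: str) -> str:
    """Placeholder for a call to any third-party LLM API (hypothetical)."""
    clean_prompt = scrub_prompt(prompt)
    # ...call the provider's SDK with clean_prompt here...
    return clean_prompt

if __name__ == "__main__":
    raw = "Summarize the ticket from jane.doe@example.com, SSN 123-45-6789"
    print(send_to_llm(raw))
    # -> "Summarize the ticket from [REDACTED_EMAIL], SSN [REDACTED_SSN]"
```

The key design choice here is that redaction happens on the caller’s side, before the prompt ever leaves the trust boundary, so protection doesn’t depend on the third-party provider’s own retention or training policies.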

Platforms like Nightfall integrate seamlessly with Slack to conduct both real-time and historical scans for PII, PCI, PHI, secrets, and credentials, as well as to ensure continuous compliance with standards like GDPR, HIPAA, and more. Nightfall also offers a “Firewall for AI” that can instantly scrub sensitive data from any AI inputs or data pipelines, ensuring continuous security for any company that consumes or builds AI.

Curious to learn more about how Nightfall can protect your company and customer data from AI apps? Sign up for your custom demo today. 

