Unstructured data is data that cannot be processed and analyzed using conventional data tools and methods. Qualitative data, such as customer feedback or social media posts, is considered unstructured data.
Unstructured data is particularly prevalent in the healthcare industry, where patient records, doctors' notes, and other unstructured data can make up upward of 80% of the data within a healthcare organization. Because of its nature, it can be difficult to track and secure PHI and PII in unstructured data. However, HIPAA regulations mandate that healthcare organizations protect unstructured data with the same rigorous privacy and security protocols as structured data.
Structured vs unstructured data
Perhaps the best way to answer the question, “what is unstructured data?” is to compare it with structured data.
Structured data is quantitative data that is highly organized and easily decipherable by conventional methods like lookups, queries, or regular expressions (regex). It is especially relevant in the context of relational databases, like those built on SQL. Structured data can include information such as dates, names, addresses, and credit card numbers.
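As a rough illustration of why structured data is easy to work with, the Python sketch below runs one query and one regex check against a small in-memory table. The table name, column names, and sample rows are hypothetical and exist only for this example.

```python
import re
import sqlite3

# Hypothetical table, columns, and rows, used only for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE patients (name TEXT, visit_date TEXT, card_number TEXT)")
conn.executemany(
    "INSERT INTO patients VALUES (?, ?, ?)",
    [
        ("Ada Example", "2021-03-14", "4111111111111111"),
        ("Bo Sample", "2021-04-02", "5500000000000004"),
    ],
)

# Query: every row shares the same columns, so one SQL statement
# finds all visits that fall in March 2021.
march_visits = conn.execute(
    "SELECT name FROM patients WHERE visit_date BETWEEN ? AND ?",
    ("2021-03-01", "2021-03-31"),
).fetchall()

# Regex: a fixed field format means a single pattern can check every value.
ISO_DATE = re.compile(r"\d{4}-\d{2}-\d{2}")
all_dates_valid = all(
    ISO_DATE.fullmatch(row[0])
    for row in conn.execute("SELECT visit_date FROM patients")
)

print(march_visits, all_dates_valid)
```

The same two steps would not work on a free-text doctor's note, because there is no fixed column or format to query or match against.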
Structured data has obvious advantages. It can be easily used to extract customer and patient insights. It does not require an in-depth understanding for users to access and interpret the data. There is a range of tools on the market that can help users analyze structured data.
Unstructured data, however, is a little more difficult to manipulate and understand. Its non-formatted, non-standardized structure means that IT teams must do some work to prepare the information to be analyzed. Such data is typically stored in non-relational or NoSQL databases such as MongoDB, which can be more difficult to query.
Nevertheless, unstructured data can be a goldmine of information, especially in the healthcare industry, where structured data is a small percentage of the data collected. Most notes, transcripts, and documents that are generated at hospitals, clinics, and doctors’ offices aren’t machine-readable, but they are crucial to ensuring optimal patient health outcomes.
[Read more: Risk and Opportunities of Unstructured Data for Businesses]
Managing unstructured data
Unstructured data offers the healthcare industry interesting opportunities as well as logistical challenges. Until recently, doctors and nurses were manually converting unstructured data into structured data, typing notes during patient interactions and using digital forms to try to capture information in a standardized way. However, breakthroughs in AI and NLP are changing this dynamic.
“[AI] and natural language processing make it possible to machine-read an unstructured medical record and identify concepts that providers and payers can employ in a variety of healthcare use cases—all without the need for a human being to read and edit the document first,” reported Accenture. “The use cases can range from insurance eligibility and payment reviews to situations focused on clinical trials or clinical decision support.”
These systems save healthcare providers from having to manually convert their unstructured data; instead, AI is used to “train” the program to recognize a use case or decipher a document. Analyzing unstructured data can not only lead to deeper patient insights but also help reduce burnout among care providers and save on costs.
Yet, unstructured data poses a security challenge, especially in the healthcare industry.
The importance of security for unstructured data
Unstructured data proliferates very quickly in cloud systems, particularly SaaS collaboration tools such as GitHub and Slack, as the entire purpose of such tools is to allow for the sharing of critical information, regardless of its format. This is helpful for collaboration, but it increases compliance and security risks as the volume of data stored in these systems increases. This can quickly become a problem, especially for organizations in regulated industries like healthcare.
Protecting unstructured data
HIPAA requires that the healthcare industry protect patient data in all forms, including unstructured data. HIPAA establishes national standards to protect sensitive patient information from being disclosed without consent.
[Read more: PHI Compliance: What It Is and How To Achieve It]
Organizations that are deemed “covered entities” — e.g., healthcare providers, health plans, and third-party partners who use patient data for marketing, research, or fundraising purposes — must implement safeguards to prevent the unauthorized access or use of PHI. Penalties for failing to do so can be extremely high: up to $50,000 per offense per day.
The increase of ransomware attacks on healthcare organizations, coupled with the difficulties inherent in identifying and standardizing unstructured data, presents an enormous challenge for healthcare IT teams. Not only are ransomware attacks on the rise, but they’re also specifically targeting the healthcare industry.
Again, AI and machine learning offer a solution. Nightfall’s cloud DLP platform uses machine learning detectors specifically tuned for PHI, scanning both structured and unstructured data in cloud platforms like Slack, Google Drive, and Jira. Nightfall is able to detect patient names, addresses, medical record numbers, and Social Security numbers, as well as a number of industry codes and identifiers like ICD, FDA, DEA, NPI, DOB, and more.
Crucially, Nightfall can scan unstructured data and parse text from 100+ file types. This could include data from customer chat logs, JSON objects, application logs, spreadsheets, PDFs, images, screenshots, and more. For IT teams, Nightfall is a key tool for monitoring, alerting on, and resolving incidents where patient information may be compromised.
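To make the idea of scanning parsed text concrete, here is a minimal Python sketch that looks for Social Security number-like strings in a chat log line and in a nested JSON object. It is not Nightfall's API or detection method: the article describes machine learning detectors, whereas this sketch swaps in a plain regular expression, and the function names and sample data are hypothetical.

```python
import json
import re

# Simplified pattern for US Social Security numbers written as NNN-NN-NNNN.
# Assumption for illustration only: a real detector needs validation and
# surrounding context to limit false positives.
SSN_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")


def iter_strings(value):
    """Yield every string value found anywhere inside a parsed JSON value."""
    if isinstance(value, str):
        yield value
    elif isinstance(value, dict):
        for item in value.values():
            yield from iter_strings(item)
    elif isinstance(value, list):
        for item in value:
            yield from iter_strings(item)


def find_ssn_like(text):
    """Return all SSN-like substrings in a block of free text."""
    return SSN_PATTERN.findall(text)


# Made-up sample inputs standing in for a chat log line and a JSON object.
chat_log = "Patient called to update records. SSN is 123-45-6789, callback at 3pm."
json_blob = '{"ticket": 42, "notes": ["intake done", {"comment": "ssn 987-65-4321 on file"}]}'

findings = find_ssn_like(chat_log)
for text in iter_strings(json.loads(json_blob)):
    findings.extend(find_ssn_like(text))

print(findings)
```

A pattern like this only catches values written in one exact format, which is part of why free text is harder to secure than a database column with a known type.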
Read more about how regulated organizations like Capital Rx, Flatfile, and UserTesting have leveraged Nightfall to ensure data privacy and security compliance within the technologies they use. To learn more about Nightfall, set up a demo.