When looking for programmatic secrets, it’s not easy to figure out what is truly sensitive and how high-risk it is. There are many different types of secrets and credentials, and the context makes a difference. For example, there could be public URLs with tokens in them, public UUIDs, or credentials used in frontend code — these could all be considered API keys or secrets, but not necessarily at the same degree of sensitivity/severity as something like AWS credentials. This guide attempts to help distill different sub-categories and types of programmatic secrets and credentials, hopefully making it a bit easier to detect, pinpoint, and triage the highest risk in the environment.
AWS credentials are sensitive pieces of information used to authenticate and authorize access to AWS services and resources. These credentials include access keys and secret keys, which are used to grant permissions to access and manage various AWS resources, such as EC2 instances, S3 buckets, and RDS databases.
Attackers who obtain access to AWS credentials can potentially gain unauthorized access to the AWS environment and its resources, leading to data breaches or other security incidents. Examples of AWS credentials include IAM user access keys, AWS access keys, and session tokens.
The AWS access key ID is a 20-character uppercase alphanumeric string. Long-term keys for IAM users typically start with "AKIA", while temporary keys issued by AWS STS start with "ASIA". The secret access key is a 40-character string that is used to sign requests to AWS services and should be kept secret. Temporary credentials also include a session token.
Here is an example of what AWS access key and secret access key might look like:
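As a rough sketch of how this format can be turned into a detection rule, the snippet below searches text for access key IDs and for secret access keys that follow a name containing "secret". The sample pair is the placeholder pair used in AWS's public documentation, not a live credential. The function name and the "secret" context rule are assumptions made for illustration.

```python
import re

# Access key IDs: 20 uppercase alphanumeric characters, starting with AKIA
# (long-term) or ASIA (temporary).
ACCESS_KEY_ID = re.compile(r"\b(?:AKIA|ASIA)[A-Z0-9]{16}\b")
# Secret access keys: 40 base64-style characters. That shape is common, so this
# sketch only reports one when it follows a name containing "secret".
SECRET_KEY = re.compile(
    r'secret[^\n=:]*[=:]\s*["\']?([A-Za-z0-9/+]{40})(?![A-Za-z0-9/+])',
    re.IGNORECASE,
)


def find_aws_credentials(text: str) -> dict[str, list[str]]:
    """Return candidate AWS access key IDs and secret access keys found in text."""
    return {
        "access_key_ids": ACCESS_KEY_ID.findall(text),
        "secret_access_keys": SECRET_KEY.findall(text),
    }


# Placeholder values from AWS's public documentation, not live credentials.
sample = """
aws_access_key_id = AKIAIOSFODNN7EXAMPLE
aws_secret_access_key = wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY
"""
print(find_aws_credentials(sample))
```

A match on the key ID alone is a lead, not proof of exposure. The pair together is what grants access.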
GCP credentials are sensitive pieces of information used to authenticate and authorize access to Google Cloud Platform services and resources. These credentials include service account keys, OAuth client IDs and secrets, and API keys. They are used to grant permissions to access and manage various GCP resources, such as Google Compute Engine instances, Google Cloud Storage buckets, and Google Cloud SQL databases.
Attackers who obtain access to GCP credentials can potentially gain unauthorized access to the GCP environment and its resources, leading to data breaches or other security incidents.
For service accounts, Google Cloud Platform (GCP) credentials consist of a service account email and a private key, which are used to authenticate and authorize access to GCP services and resources.
The service account email is a unique identifier for the service account, which is a special type of Google account that belongs to your application or a virtual machine (VM) instance, rather than a person. The private key is usually downloaded as part of a JSON key file and is used to sign authentication requests to GCP services. The key file is not itself encrypted, so anyone who obtains it can act as the service account.
Here is an example of what GCP service account email and private key might look like:
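A minimal sketch of how a service account key file might be recognized, assuming the JSON key format with its `type`, `client_email`, and `private_key` fields. The project name, account name, and key body in the sample are made up, and the function name is hypothetical.

```python
import json


def looks_like_gcp_service_account_key(text: str) -> bool:
    """Heuristic check for a Google service account JSON key file."""
    try:
        data = json.loads(text)
    except json.JSONDecodeError:
        return False
    if not isinstance(data, dict):
        return False
    return (
        data.get("type") == "service_account"
        and str(data.get("client_email", "")).endswith(".gserviceaccount.com")
        and str(data.get("private_key", "")).startswith("-----BEGIN PRIVATE KEY-----")
    )


# Placeholder document: project, account name, and key body are invented.
sample = json.dumps({
    "type": "service_account",
    "project_id": "example-project",
    "client_email": "example-sa@example-project.iam.gserviceaccount.com",
    "private_key": "-----BEGIN PRIVATE KEY-----\n<redacted>\n-----END PRIVATE KEY-----\n",
})
print(looks_like_gcp_service_account_key(sample))
```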
Azure credentials are sensitive pieces of information used to authenticate and authorize access to Microsoft Azure services and resources. These credentials include access keys, service principal credentials, OAuth client IDs and secrets, and certificates. They are used to grant permissions to access and manage a variety of Azure resources, such as Azure Virtual Machines, Azure Storage accounts, and Azure SQL databases. Attackers who obtain access to Azure credentials can potentially gain unauthorized access to the Azure environment and its resources, resulting in data breaches or other security incidents.
Azure service principal credentials typically consist of a client ID, client secret, and tenant ID, which are used to authenticate and authorize access to various Azure services and resources.
The client ID is a unique identifier (a GUID) for your application, which is registered with Azure Active Directory (Azure AD, now named Microsoft Entra ID). The client secret is a password-like string or a cryptographic key that is used to authenticate requests to Azure AD. The tenant ID is a unique identifier (also a GUID) for the Azure AD tenant in which your application is registered. Of the three, only the client secret is sensitive on its own, but all three together allow a caller to authenticate as the application.
Here is an example of what Azure credentials might look like:
Azure provides several authentication and authorization options, and the specific credential format may vary depending on the chosen method.
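For the service principal case, a hedged sketch of a check that flags a complete client ID, tenant ID, and client secret triple. It assumes the values appear as `AZURE_CLIENT_ID`, `AZURE_TENANT_ID`, and `AZURE_CLIENT_SECRET` assignments, the environment variable names used by the Azure Identity libraries. Other naming schemes would need their own patterns, and all sample values are placeholders.

```python
import re

GUID = re.compile(
    r"[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}"
)
ASSIGNMENT = re.compile(
    r"^[ \t]*(AZURE_(?:CLIENT_ID|TENANT_ID|CLIENT_SECRET))[ \t]*=[ \t]*(\S+)[ \t]*$",
    re.MULTILINE,
)


def find_azure_service_principal(text: str) -> dict[str, bool]:
    """Report which parts of a service principal credential appear in text."""
    values = dict(ASSIGNMENT.findall(text))
    has_client_id = bool(GUID.fullmatch(values.get("AZURE_CLIENT_ID", "")))
    has_tenant_id = bool(GUID.fullmatch(values.get("AZURE_TENANT_ID", "")))
    has_secret = bool(values.get("AZURE_CLIENT_SECRET"))
    return {
        "client_id": has_client_id,
        "tenant_id": has_tenant_id,
        "client_secret": has_secret,
        # All three together are enough to authenticate as the application.
        "complete": has_client_id and has_tenant_id and has_secret,
    }


# Placeholder values only.
sample = """
AZURE_TENANT_ID=00000000-0000-0000-0000-000000000000
AZURE_CLIENT_ID=11111111-1111-1111-1111-111111111111
AZURE_CLIENT_SECRET=placeholder-secret-value
"""
print(find_azure_service_principal(sample))
```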
Authorization Headers and Tokens
Authorization headers and tokens are sensitive pieces of information used to authenticate and authorize access to protected resources and services. They are typically included in HTTP requests to validate a user's identity and permissions. If these tokens or headers are compromised, attackers may gain unauthorized access to protected resources or services, leading to potential data breaches or malicious activities. Examples of authorization headers and tokens include JSON Web Tokens (JWTs), OAuth access tokens, and bearer tokens. These tokens are commonly used to authorize access to APIs, web applications, and cloud services.
JSON Web Token (JWT)
A JSON Web Token (JWT) is a standard format for transmitting claims between parties as a JSON object. It consists of a header, a payload, and a signature, each base64url-encoded and separated by dots. The header contains information about the type of token and the signing algorithm used to secure it, the payload contains the claims or data being transmitted, and the signature is a calculated value that protects the integrity of the token. A signed JWT is encoded, not encrypted: anyone who holds the token can decode and read the payload. JWTs can contain sensitive information, such as user identifiers or access rights, and a valid token can often be replayed until it expires, so they should be treated as sensitive data. Proper security measures must be implemented to protect the token and ensure that it is not compromised or tampered with during transmission or storage.
Here is an example JWT:
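The sketch below builds a demo token from placeholder claims and then decodes it without the signing key, to illustrate that the header and payload are readable by anyone who holds the token. The claims, the signing key, and the function names are invented for illustration. Real systems would use a maintained JWT library and always verify the signature.

```python
import base64
import hashlib
import hmac
import json


def b64url_encode(raw: bytes) -> str:
    return base64.urlsafe_b64encode(raw).rstrip(b"=").decode("ascii")


def b64url_decode(segment: str) -> bytes:
    # JWT segments drop the "=" padding, so restore it before decoding.
    return base64.urlsafe_b64decode(segment + "=" * (-len(segment) % 4))


def make_demo_jwt(claims: dict, key: bytes) -> str:
    """Build an HS256-signed token from placeholder claims (demo only)."""
    header = {"alg": "HS256", "typ": "JWT"}
    signing_input = ".".join(
        b64url_encode(json.dumps(part, separators=(",", ":")).encode("utf-8"))
        for part in (header, claims)
    )
    signature = hmac.new(key, signing_input.encode("ascii"), hashlib.sha256).digest()
    return f"{signing_input}.{b64url_encode(signature)}"


def decode_jwt_unverified(token: str) -> tuple[dict, dict]:
    """Return (header, payload) WITHOUT checking the signature."""
    header_b64, payload_b64, _signature = token.split(".")
    return json.loads(b64url_decode(header_b64)), json.loads(b64url_decode(payload_b64))


token = make_demo_jwt({"sub": "user-123", "role": "admin"}, key=b"demo-signing-key")
header, payload = decode_jwt_unverified(token)
print(token)
print(header, payload)
```

Because every JSON header starts with `{"`, JWTs begin with the characters `eyJ`, which is a handy visual cue when scanning logs or source code.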
OAuth access tokens are sensitive pieces of information used to authorize access to protected resources and services. These tokens are commonly used in OAuth 2.0 authentication flows to provide third-party applications with limited access to user data or services without exposing the user's credentials. OAuth access tokens can grant access to a wide range of user data, including personal information, social media posts, and email messages. If these tokens are compromised, attackers can gain unauthorized access to the user's data and potentially perform malicious activities. Examples of OAuth access tokens include Facebook access tokens, Google access tokens, and Microsoft access tokens. It is important to properly secure and protect OAuth access tokens by regularly expiring or revoking them, using secure authentication mechanisms, and implementing token validation and encryption to prevent unauthorized access or misuse.
OAuth tokens can take different forms depending on the specific implementation and the type of token being used. However, the most commonly used OAuth token type is the access token, which is used to grant access to protected resources.
An OAuth access token is typically either a long, random (opaque) string or a structured token such as a JWT. It is generated by the authorization server and is unique to each authorization grant. The token is usually sent in the Authorization header of HTTP requests as a Bearer token.
Here is an example of what an OAuth access token might look like:
In this example, the token is a JWT (JSON Web Token), which consists of three parts separated by dots: the header, the payload, and the signature. The header contains information about the token type and the algorithm used to sign it, the payload contains the claims or attributes associated with the token, and the signature is used to verify the integrity of the token.
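A small sketch of how a scanner might pull a bearer token out of an Authorization header and tell a JWT-shaped token from an opaque one. The function name and the two labels are assumptions. The opaque sample value is the illustrative token from the OAuth 2.0 specification.

```python
import re
from typing import Optional

# Bearer token characters as allowed by the bearer token usage specification.
BEARER = re.compile(r"^Bearer\s+([A-Za-z0-9\-._~+/]+=*)$", re.IGNORECASE)
# Three base64url segments separated by dots (the signature may be empty).
JWT_SHAPE = re.compile(r"^[A-Za-z0-9_-]+\.[A-Za-z0-9_-]+\.[A-Za-z0-9_-]*$")


def classify_bearer(authorization_header: str) -> Optional[str]:
    """Return 'jwt-shaped', 'opaque', or None if this is not a Bearer header."""
    match = BEARER.match(authorization_header.strip())
    if match is None:
        return None
    token = match.group(1)
    return "jwt-shaped" if JWT_SHAPE.match(token) else "opaque"


print(classify_bearer("Bearer aaa.bbb.ccc"))
print(classify_bearer("Bearer 2YotnFZFEjr1zCsicMWpAA"))
print(classify_bearer("Basic dXNlcjpwYXNz"))
```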
Cryptographic keys are a fundamental aspect of modern cryptography and are used to protect sensitive data from unauthorized access or tampering. There are two primary types of cryptographic keys: symmetric and asymmetric.
Symmetric keys are used in symmetric encryption algorithms, where the same key is used for both encryption and decryption. Symmetric keys are typically shorter than asymmetric keys, and symmetric algorithms are faster, making them useful for encrypting large amounts of data. However, the use of the same key for encryption and decryption means that the key must be kept secret by every party that holds it. Examples of symmetric keys include AES keys and keys for the older, now obsolete DES algorithm, used to secure data in transit and at rest.
Asymmetric keys, used in public-key cryptography, come as a pair: a public key and a private key. The public key is shared with others to encrypt data or verify signatures, while the private key is kept secret to decrypt data or create signatures. Asymmetric keys are longer and asymmetric algorithms are slower than their symmetric counterparts, making them more suitable for small amounts of data, such as exchanging a symmetric key. Examples of asymmetric keys include RSA and ECDSA keys used to secure web traffic, digital signatures, and secure email.
Because cryptographic keys are critical to ensuring the security and privacy of sensitive data, they are highly sensitive and must be protected from unauthorized access. If a key is compromised, it can be used to decrypt or tamper with encrypted data, leading to a security breach or data loss. Therefore, it is essential to follow best practices for key management, such as using strong and unique keys, regularly rotating them, storing them securely, and limiting access to authorized personnel.
RSA (Rivest–Shamir–Adleman) keys are a type of asymmetric cryptographic key used for encryption and digital signatures. RSA keys are named after their inventors, Ron Rivest, Adi Shamir, and Leonard Adleman.
RSA keys come in a public-private key pair, with the public key being widely distributed and the private key being kept secret. For encryption, the public key is used to encrypt data or messages, while the private key is used to decrypt them. For digital signatures the roles are reversed: the private key signs and the public key verifies.
RSA keys are widely used in secure communication protocols such as SSL/TLS, S/MIME, and SSH. They are also used for secure data storage, digital signatures, and access control.
RSA keys are sensitive information and should be carefully managed to prevent unauthorized access. The private key in particular should be protected with a strong passphrase and access to it should be restricted to authorized users only. RSA keys should also be rotated periodically to ensure continued security.
RSA keys are typically represented in various formats, including PEM, DER, and PKCS#12. Here is an example of what an RSA private key in PEM format might look like:
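PEM-encoded private keys are wrapped in BEGIN and END marker lines, which makes them one of the easier secret types to detect. The sketch below looks for those markers. The list of key labels is an illustrative subset, the function name is hypothetical, and the sample body is a placeholder rather than real key material.

```python
import re

PRIVATE_KEY_BLOCK = re.compile(
    r"-----BEGIN (?P<kind>(?:RSA |EC |DSA |OPENSSH |ENCRYPTED )?PRIVATE KEY)-----"
    r".*?"
    r"-----END (?P=kind)-----",
    re.DOTALL,
)


def find_private_key_blocks(text: str) -> list[str]:
    """Return the label of every PEM private key block found in text."""
    return [match.group("kind") for match in PRIVATE_KEY_BLOCK.finditer(text)]


# Placeholder body: real key material is many lines of base64 text.
sample = """
-----BEGIN RSA PRIVATE KEY-----
<key material removed>
-----END RSA PRIVATE KEY-----
"""
print(find_private_key_blocks(sample))
```

Public key blocks are deliberately not matched, since they are meant to be shared.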
SSH (Secure Shell) keys are a type of asymmetric cryptographic key used to authenticate a user to a remote server or network device. SSH keys come in pairs - a public key and a private key. The public key is placed on the remote server, while the private key is kept on the user's local machine.
When the user attempts to log in to the remote server, the SSH client proves that it holds the private key. In the current SSH-2 protocol, the client signs data that includes the session identifier with the private key and sends the signature to the server, which verifies it using the stored public key. If the signature is valid and the public key is authorized for that account, the user is authenticated and granted access to the remote server. The private key itself never leaves the user's machine.
SSH keys are commonly used in place of passwords for secure, automated access to servers and other network devices. They offer several advantages over passwords, including increased security, ease of use, and the ability to automate tasks without the need for human intervention.
SSH private keys are sensitive and should be carefully managed to prevent unauthorized access. It is important to protect the private key with a strong passphrase and to restrict access to authorized users only. Additionally, SSH keys should be rotated periodically to ensure continued security.
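An SSH private key can look the same as, or very similar to, the RSA key above. Keys generated by recent versions of OpenSSH use a PEM-like block that starts with -----BEGIN OPENSSH PRIVATE KEY-----.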
Sensitive server and port connection information could include credentials, IP addresses, and ports used to access private systems and data. Here are a few examples:
- A database server that stores sensitive customer information may require a username and password to access, along with the IP address and port number for the database connection.
- A file server that contains confidential company documents may require a user login and password, along with the IP address and port number for the file transfer protocol (FTP) server.
- A server used for hosting web applications or services may require credentials for server access, along with the IP address and port number for accessing the application or service.
Server/port connection information typically includes the server hostname or IP address, the port number to connect to, and sometimes the protocol to use (e.g. TCP, UDP).
Here is an example of what server/port connection information might look like:
In this example, the connection information specifies that the client should connect to the server with hostname example.com using port number 22 and the TCP protocol. This is a typical example of SSH connection information.
Severity needs to be assessed case by case: on its own, server/port information may not be sensitive unless it includes credentials or reveals private or internal systems that could then be targeted in further attacks.
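One input to that assessment is whether the address points at an internal system. The sketch below, which handles only hostnames and IPv4 `host:port` pairs, uses Python's `ipaddress` module to flag private address ranges. The function names and the returned dictionary layout are assumptions.

```python
import ipaddress
from typing import Optional


def is_internal_address(host: str) -> Optional[bool]:
    """True/False for IP literals; None for hostnames, which need a DNS lookup."""
    try:
        return ipaddress.ip_address(host).is_private
    except ValueError:
        return None


def assess_endpoint(endpoint: str) -> dict:
    """Split 'host:port' and note whether the host is a private address."""
    host, _, port = endpoint.partition(":")
    return {
        "host": host,
        "port": int(port) if port.isdigit() else None,
        "internal": is_internal_address(host),
    }


print(assess_endpoint("10.0.0.5:5432"))
print(assess_endpoint("example.com:22"))
```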
Database Connection Strings
A database connection string is a string of characters that specifies how to connect to a data source, such as a database or a file. It includes information such as the location of the data source, the type of data source, the authentication method, and any additional parameters. Connection strings are used by applications and services to establish a connection to a data source and retrieve or modify data. Connection strings can contain sensitive information, such as login credentials or connection keys, and therefore should be treated as sensitive data.
Here are a few examples of database connection strings:
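For URL-style connection strings, a minimal sketch of checking for an embedded password and redacting it before the string is logged or shared. The host, user, and password below are placeholders and the function names are hypothetical. Key-value formats such as `Server=...;Password=...` would need different parsing.

```python
from urllib.parse import urlsplit


def inspect_connection_url(url: str) -> dict:
    """Pull out the parts of a URL-style connection string that matter for triage."""
    parts = urlsplit(url)
    return {
        "scheme": parts.scheme,
        "host": parts.hostname,
        "port": parts.port,
        "username": parts.username,
        "has_password": parts.password is not None,
    }


def redact_password(url: str) -> str:
    """Replace an embedded password with *** and leave everything else intact."""
    parts = urlsplit(url)
    if parts.password is None:
        return url
    netloc = parts.netloc.replace(f":{parts.password}@", ":***@", 1)
    return parts._replace(netloc=netloc).geturl()


# Placeholder connection string, not a real system.
sample = "postgresql://app_user:example-password@db.internal.example.com:5432/customers"
print(inspect_connection_url(sample))
print(redact_password(sample))
```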
Webhook URLs are endpoints provided by web applications that allow external systems to subscribe to notifications and events. These URLs are sensitive because they can grant access to internal systems and potentially allow attackers to perform malicious activities or extract sensitive information. Attackers who obtain access to webhook URLs can potentially intercept or manipulate the data transmitted between systems, leading to data breaches or other security incidents. Examples of webhook URLs include GitHub webhooks, Slack webhooks, and Stripe webhooks. These endpoints are typically used to trigger automated workflows, such as triggering a build pipeline, sending notifications, or processing payment transactions. It is important to properly secure and protect webhook URLs by using secure authentication mechanisms, limiting access to the endpoints, and validating the data transmitted between systems to prevent unauthorized access or misuse. Here are some examples:
- A webhook URL used by a payment processing service to notify a website of successful or failed transactions. Payloads sent to this URL could include sensitive financial data such as credit card numbers, billing addresses, and payment amounts.
- A webhook URL used by a healthcare provider to send patient health information to a third-party service. Payloads sent to this URL could contain sensitive medical data such as patient names, diagnoses, and treatment information.
- A webhook URL used by a legal or financial service to receive sensitive documents or information from clients. Payloads sent to this URL could include confidential legal or financial information that should not be accessed by unauthorized individuals.
Here is an example of what a Slack webhook URL might look like:
In this example, johnsmith is the username, and abc123def456ghi789jkl0mno123pqr456stu7v is a personal access token used to authenticate API requests to the GitHub API.
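For Slack incoming webhooks specifically, the URL itself acts as the credential: anyone who has it can post to the channel. A hedged detection sketch follows, assuming the `hooks.slack.com/services/T.../B.../...` layout that Slack's documentation uses for its placeholder URL. The exact segment lengths are not assumed, and the function name is hypothetical.

```python
import re

SLACK_WEBHOOK = re.compile(
    r"https://hooks\.slack\.com/services/T[A-Za-z0-9]+/B[A-Za-z0-9]+/[A-Za-z0-9]+"
)


def find_slack_webhooks(text: str) -> list[str]:
    """Return every Slack incoming-webhook URL found in text."""
    return SLACK_WEBHOOK.findall(text)


# Placeholder URL in the style of Slack's documentation, not a live webhook.
sample = 'url = "https://hooks.slack.com/services/T00000000/B00000000/XXXXXXXXXXXXXXXXXXXXXXXX"'
print(find_slack_webhooks(sample))
```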
Tokens for CI/CD (Continuous Integration/Continuous Delivery) are used to authenticate and authorize automated pipelines that deploy code changes to production systems. These tokens are sensitive information because they grant access to the CI/CD infrastructure and can be used to modify or deploy code to production systems. If these tokens are compromised or exposed, it can lead to unauthorized access, data breaches, or malicious code execution. Examples of CI/CD tokens include personal access tokens (PATs) in GitHub, access tokens in GitLab, and service principals in Azure DevOps. These tokens are typically generated and managed by the CI/CD system or the underlying cloud provider, and it is important to properly secure and protect them, such as by storing them in a secure key vault or using encryption to prevent unauthorized access or misuse.
The format of these tokens can vary depending on the CI/CD platform and the specific use case.
For example, if you are using GitLab CI/CD, a CI/CD token might look like this:
This is a JSON Web Token (JWT) that includes a base64-encoded header, payload, and signature. The payload contains information about the CI/CD job, such as the user who triggered it, the repository, and the commit SHA.
Alternatively, if you are using a cloud-based CI/CD service like CircleCI, a CI/CD token might be a simple string of letters and numbers, such as:
This token is used to authenticate API requests to the CircleCI API, allowing the CI/CD process to interact with the CircleCI platform.
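Some platforms give their tokens a recognizable prefix, which makes them easier to find than plain random strings. The sketch below checks for a few such prefixes (`ghp_` and `github_pat_` for GitHub personal access tokens, `glpat-` for GitLab). The length bounds are approximations, the pattern table is illustrative rather than complete, and the sample tokens are generated filler.

```python
import re

TOKEN_PATTERNS = {
    "github_classic_pat": re.compile(r"\bghp_[A-Za-z0-9]{36}\b"),
    "github_fine_grained_pat": re.compile(r"\bgithub_pat_[A-Za-z0-9_]{20,}"),
    "gitlab_pat": re.compile(r"\bglpat-[A-Za-z0-9_-]{20,}"),
}


def find_ci_tokens(text: str) -> dict[str, list[str]]:
    """Return matches per token type, omitting types with no matches."""
    results = {}
    for name, pattern in TOKEN_PATTERNS.items():
        matches = pattern.findall(text)
        if matches:
            results[name] = matches
    return results


# Filler values with the right shape; they are not real tokens.
fake_github = "ghp_" + "A" * 36
fake_gitlab = "glpat-" + "b" * 20
sample = f"GITHUB_TOKEN={fake_github}\nGITLAB_TOKEN={fake_gitlab}\n"
print(find_ci_tokens(sample))
```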
Perhaps the most obvious form of credential is a simple username and password. However, what’s tricky here is that a password can take so many different forms. A password could be as simple as password, or something so complex it doesn’t even resemble what one would consider a password. Hence, context clues, such as the name of the variable or field the value is assigned to, are going to be most impactful in this case.
The format and complexity of passwords can vary depending on the specific requirements of the system or application. In general, strong passwords are typically at least 8-12 characters long, and include a combination of uppercase and lowercase letters, numbers, and symbols.
Here is an example of what a password might look like:
In this example, the password is a randomly generated string of 14 characters that includes uppercase and lowercase letters, numbers, and symbols. This password would be considered strong and would provide a high level of security if used properly.
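Since the value itself has no fixed shape, a scanner usually has to rely on the surrounding context. The sketch below flags assignments whose name contains "password", "passwd", or "pwd". The name list is an illustrative assumption, the function name is hypothetical, and the sample value is a placeholder.

```python
import re

PASSWORD_ASSIGNMENT = re.compile(
    r"\b(?P<name>[A-Za-z0-9_.-]*(?:password|passwd|pwd)[A-Za-z0-9_.-]*)"
    r"\s*[:=]\s*"
    r"(?P<quote>[\"']?)(?P<value>[^\s\"']+)(?P=quote)",
    re.IGNORECASE,
)


def find_password_assignments(text: str) -> list[tuple[str, str]]:
    """Return (name, value) pairs where the name suggests a password."""
    return [
        (match.group("name"), match.group("value"))
        for match in PASSWORD_ASSIGNMENT.finditer(text)
    ]


# Placeholder configuration, not real credentials.
sample = """
DB_PASSWORD=example-Passw0rd!
timeout = 30
admin_pwd: "another-placeholder"
"""
print(find_password_assignments(sample))
```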
An idempotent key (more commonly called an idempotency key) is a unique identifier used in API requests to ensure that the same request is not processed multiple times. In other words, if the same API request is made with the same idempotent key, the request should be processed only once.
This is useful in situations where API requests may be retried due to connectivity issues or other errors. Without an idempotent key, duplicate requests may result in unintended consequences or errors, such as duplicate charges or orders being created.
By including an idempotent key in the request, the API server can identify duplicate requests and avoid processing them again. The idempotent key should be unique for each request, typically generated by the client application, and included as a parameter in the API request.
Idempotent keys are not API keys and are generally not sensitive: they only identify a particular request (for example, a payment) so that a retry does not create duplicates or errors, and they do not grant access on their own.
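The deduplication behavior described above can be sketched with an in-memory store on the server side. The `PaymentAPI` class, its method, and the response fields are hypothetical. A real service would persist the keys and expire them after some time.

```python
import uuid


class PaymentAPI:
    """Toy server that processes each idempotent (idempotency) key at most once."""

    def __init__(self) -> None:
        self._responses: dict[str, dict] = {}  # key -> stored response
        self.charges_created = 0

    def create_charge(self, idempotency_key: str, amount_cents: int) -> dict:
        # A repeated key returns the stored response instead of charging again.
        if idempotency_key in self._responses:
            return self._responses[idempotency_key]
        self.charges_created += 1
        response = {"charge_id": f"ch_{self.charges_created}", "amount_cents": amount_cents}
        self._responses[idempotency_key] = response
        return response


api = PaymentAPI()
key = str(uuid.uuid4())  # generated by the client, one per logical request
first = api.create_charge(key, 1999)
retry = api.create_charge(key, 1999)  # e.g. resent after a network timeout
print(first == retry, api.charges_created)
```

The key only identifies the request. It carries no authority, which is why it does not belong in the same risk category as an API key.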
Publishable API keys are authentication tokens used in public-facing applications to authorize and authenticate access to an API. They are intended to be used in client-side code and are used to identify the client and the user who is making a request. Publishable API keys grant limited access to specific resources or actions, such as retrieving public data or making non-critical modifications to a user's account.
Examples of services that use publishable API keys include Stripe, a payment processing service that provides a publishable API key to allow clients to make payments; Twilio, a cloud communications platform that provides a publishable API key to send SMS or voice messages; and Google Maps, which provides a publishable API key to access location-based services. In all of these cases, the publishable API key is used to authenticate requests from the client to the server, enabling the client to access specific resources or perform certain actions. Because these keys are designed to be embedded in client-side code, they are generally not considered sensitive information. They should still be restricted where the service allows it (for example, by allowed domain or API scope) to limit abuse, and they must not be confused with the corresponding secret keys.
Here is an example of what a publishable key might look like:
In this example, pk_test_ indicates that this is a publishable key for testing purposes, followed by a unique identifier of letters and numbers. The specific format of a publishable key may vary depending on the service being used, but it typically includes some indication of the type of key and a unique identifier.
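A sketch of how a prefix like this can drive triage, using the Stripe-style `pk_`/`sk_` and `test`/`live` prefixes. The classification table and the `sensitive` flag are illustrative assumptions and the function name is hypothetical. Other services use different conventions, and the sample keys are filler.

```python
from typing import Optional

KEY_PREFIXES = {
    "pk_test_": ("publishable", "test"),
    "pk_live_": ("publishable", "live"),
    "sk_test_": ("secret", "test"),
    "sk_live_": ("secret", "live"),
}


def classify_key(key: str) -> Optional[dict]:
    """Classify a key by its prefix, or return None if the prefix is unknown."""
    for prefix, (kind, environment) in KEY_PREFIXES.items():
        if key.startswith(prefix):
            return {
                "kind": kind,
                "environment": environment,
                "sensitive": kind == "secret",
            }
    return None


# Filler keys with the right prefix; they are not real keys.
print(classify_key("pk_test_" + "x" * 24))
print(classify_key("sk_live_" + "x" * 24))
```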
UUIDs, or Universally Unique Identifiers, are used in URLs to provide a unique identifier for a resource that can be easily referenced and shared. In many cases, UUIDs are not sensitive and can be safely used in URLs without revealing any confidential information about the resource. However, there are situations where UUIDs can be sensitive and reveal confidential information. For example, if a UUID is used to identify a user account or a confidential document, it could be considered sensitive and should not be shared publicly. On the other hand, if a UUID is used to identify a public resource, such as a product or a blog post, it may not be considered sensitive and can be safely used in a URL.
Here are some examples of when UUIDs may or may not be sensitive in URLs:
- Non-sensitive: A UUID used to identify a public blog post can be safely used in a URL without revealing any confidential information. Example: **https://example.com/posts/6ba7b810-9dad-11d1-80b4-00c04fd430c8**
- Sensitive: A UUID used to identify a user's account can be considered sensitive and should not be shared publicly in a URL. Example: **https://example.com/users/550e8400-e29b-41d4-a716-446655440000**
- Non-sensitive: A UUID used to identify a public product in an e-commerce website can be safely used in a URL. Example: **https://example.com/products/3860c04b-8a14-47fd-a0d9-9cd49d9818e2**
- Sensitive: A UUID used to identify a confidential document in a company's internal document management system can be considered sensitive and should not be shared publicly in a URL. Example: **https://example.com/documents/550e8400-e29b-41d4-a716-446655440000**
Sometimes the UUID may be in a query parameter. A query parameter is a variable encoded in a URL, for example the URL [www.google.com/?key=value](<http://www.google.com/?key=value>) has a query parameter called key with a value of value. A URL can have multiple query parameters embedded within it. Sometimes, a UUID can take the form of a query parameter such as key=abcd or token=abcd. These can be equally valid ways to have non-sensitive UUIDs. Consider the following to assess sensitivity:
- Does the URL provide access to a private resource without the token? Does the presence of the token provide private access?
- Can the token be used to access other resources in the environment beyond the one in question?
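To support that assessment, a scanner first has to locate the UUIDs and record where they appear. A minimal sketch follows; the function name and the `path` and `query:<name>` labels are assumptions. Deciding whether a finding is sensitive still requires answering the questions above.

```python
import re
from urllib.parse import parse_qsl, urlsplit

UUID_RE = re.compile(
    r"[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}",
    re.IGNORECASE,
)


def find_uuids_in_url(url: str) -> list[tuple[str, str]]:
    """Return (location, uuid) pairs for UUIDs in the path or query string."""
    parts = urlsplit(url)
    findings = [("path", found) for found in UUID_RE.findall(parts.path)]
    for name, value in parse_qsl(parts.query):
        if UUID_RE.fullmatch(value):
            findings.append((f"query:{name}", value))
    return findings


print(find_uuids_in_url("https://example.com/posts/6ba7b810-9dad-11d1-80b4-00c04fd430c8"))
print(find_uuids_in_url("https://example.com/download?token=550e8400-e29b-41d4-a716-446655440000"))
```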
Test, staging, and production are all different environments used in software development and deployment.
Test environments are used for testing new features or changes to an application before they are released to the production environment. They may be set up to mimic the production environment as closely as possible, but with dummy data or limited access to live data.
Staging environments are used to test the release candidate of an application before it is pushed to the production environment. Staging environments typically have more realistic data and usage patterns than test environments and may be used to test performance and scalability.
Production environments are the live, publicly-accessible versions of the application that end-users interact with. They contain the actual data and resources used by the application and require strict security and availability measures to ensure high uptime and reliability.
There may be other environments used in software development as well, such as development or QA environments, which are used for internal testing and debugging.
The differences between these environments are mainly in their purpose, data and resource availability, and level of access or restrictions. In general, access to production environments is more restricted than access to test or staging environments to prevent accidental or malicious changes to critical resources. Additionally, different teams or individuals may have different levels of access to each environment, depending on their role in the development or deployment process.
This matters because different secrets and credentials may be used in different environments. For example, a test API key used in a test environment is less sensitive than a production API key. However, there is still risk involved: test and production keys can be confused with one another, and even a test key reveals which services are in use, which is useful information for a malicious actor.
OSINT stands for Open Source Intelligence. It refers to information that is collected from publicly available sources, such as social media, news articles, and online forums. OSINT can be used for a variety of purposes, such as threat intelligence, investigations, and competitive analysis. It is an important tool for researchers, analysts, and investigators who need to gather information quickly and efficiently.
An example of OSINT for information security is using publicly available information to identify potential security risks or vulnerabilities in an organization's network or infrastructure. This can include gathering information about the organization's systems, software, and personnel.
For example, an attacker might use OSINT techniques to identify the specific software and hardware used by a target organization, as well as any known vulnerabilities or exploits associated with those systems. They might search for job postings or LinkedIn profiles to identify key personnel within the organization, or use social media and online forums to gather information about the organization's security policies and procedures.
Alternatively, a security researcher might use OSINT techniques to identify potential security risks or vulnerabilities in their own organization's network or infrastructure. This might involve searching for information about the organization's systems and software, reviewing publicly available vulnerability databases, and monitoring social media and other online sources for information about new security threats or exploits.
So what does that have to do with secrets detection? When it comes to sensitivity of data, it’s not necessarily just whether the data itself is considered high-risk or not. Data can also be sensitive in nature if it reveals other context clues about the environment. For example, let’s say you see a line of code that says twilio_api_key=abcd. Now, the key itself may not be valid and therefore does not create risk on its own, but the name of the variable indicates that the company is using Twilio. This provides OSINT information to a bad actor, who can then use that information to try new tactics. For example, they may send an engineer a phishing email saying they need to reset their Twilio password, and this may seem believable because Twilio is in fact used in the environment.
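A sketch of how this kind of leakage might be surfaced: scan assignment names for vendor names, regardless of whether the assigned value is a valid secret. The vendor list is a small illustrative assumption and the function name is hypothetical.

```python
import re

# Illustrative list; a real scanner would carry a much longer one.
VENDOR_HINTS = ("twilio", "stripe", "slack", "github")
ASSIGNMENT_NAME = re.compile(r"\b([A-Za-z_][A-Za-z0-9_]*)\s*=")


def vendors_revealed(source: str) -> set[str]:
    """Return vendor names that appear inside assigned variable names."""
    revealed = set()
    for name in ASSIGNMENT_NAME.findall(source):
        lowered = name.lower()
        revealed.update(vendor for vendor in VENDOR_HINTS if vendor in lowered)
    return revealed


print(vendors_revealed("twilio_api_key=abcd"))
print(vendors_revealed("timeout = 30"))
```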