NIST CSF and Kubernetes for Microservices Environments

Learn all about NIST CSF and Kubernetes for microservices environments

To help organizations better understand, manage, and reduce cybersecurity risks, the US National Institute of Standards and Technology (NIST) released the Cybersecurity Framework (CSF) in 2014 as voluntary guidance. NIST, a physical sciences laboratory under the US Department of Commerce, developed this set of guidelines and best practices in collaboration with the private sector.

CSF offers guidelines that are straightforward and readily understood, and, for many organizations, there is no obvious alternative. CSF is about more than securing systems, networks, and data against breaches caused by cyberattacks and human error. Its minimum security baseline aligns with the requirements of the Health Insurance Portability and Accountability Act (HIPAA). More broadly, NIST publications combine non-binding guidelines with standards that government agencies must follow to comply with the Federal Information Security Management Act (FISMA). In that sense, CSF is a stepping stone towards meeting the requirements of other key compliance regulations.

Public sector organizations outside the US have already seized on CSF to implement information security and cyber risk management; Japan and Israel, among others, are reportedly making good use of it. Since 2017, CSF has been mandatory for all federal agencies, as well as for contractors and subcontractors that work with them, and non-compliance can lead to disqualification from future government contracts. While CSF remains voluntary for private-sector businesses, many industry decision makers recognize that adopting at least some of its features is in their own best interest. Today, nearly 50% of all US organizations adhere to CSF, up from 30% in 2015.

Core Functions

The framework defines five core cybersecurity functions for organizations that want to mitigate risks to their hardware, software, and data.

Identify: In this function, organizations identify the systems and platforms ("crown jewels") that are critical to their day-to-day operations, along with the potential risks to these assets. Such core assets include repositories of structured and unstructured data, other databases, object storage (especially in the cloud), and closely guarded proprietary source code.

Protect: In this logical next step, organizations adopt proactive measures, such as allocating the resources required to secure the critical assets identified in the previous step. The key objective is to ensure delivery of critical services and limit the fallout from potential cybersecurity incidents. Deploying firewalls, as well as tools that secure endpoints and contain privilege-escalation attempts, forms part of this step.

Detect: This step is about putting tangible safeguards in place to spot activities that threaten the confidentiality, integrity, or availability of organizational information systems. That means deploying tools that detect threats based on malicious instruction sequences as well as suspicious domain names, IP addresses, and file hashes. Many cybersecurity solutions also help orchestrate preventive actions, in addition to logging, reporting, and analyzing cybersecurity events.

Respond: This step is concerned with building adequate processes to respond to cyberthreats in a timely manner, so as to eliminate or mitigate their impact and limit the spread of malware. Tools for dealing with malicious files, infected host machines and networks, lateral movement by attackers, and unauthorized access to user accounts come into play in this step.

Recover: This step involves putting plans and systems in place to recover from a cyberattack and keep critical business functions running during and after a cybersecurity event.
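
The Detect function mentions spotting threats by suspicious domain names, IP addresses, and file hashes. A minimal Python sketch of that kind of indicator matching is below. The indicator sets are hypothetical placeholders (using reserved documentation domains and addresses); a real deployment would pull them from a threat-intelligence feed.

```python
import hashlib
from typing import Iterable, List

# Hypothetical indicators of compromise; real ones come from a threat-intel feed.
BAD_SHA256 = {hashlib.sha256(b"malicious payload").hexdigest()}
BAD_DOMAINS = {"evil.example"}
BAD_IPS = {"203.0.113.7"}


def match_indicators(file_bytes: bytes, contacted: Iterable[str]) -> List[str]:
    """Return a finding for every file hash, IP, or domain that matches an indicator."""
    findings = []
    digest = hashlib.sha256(file_bytes).hexdigest()
    if digest in BAD_SHA256:
        findings.append(f"file hash {digest[:12]}... is a known-bad SHA-256")
    for host in contacted:
        if host in BAD_IPS:
            findings.append(f"connection to blocked IP {host}")
        # A subdomain of a blocked domain counts as a match as well.
        elif any(host == d or host.endswith("." + d) for d in BAD_DOMAINS):
            findings.append(f"connection to blocked domain {host}")
    return findings


print(match_indicators(b"malicious payload", ["cdn.evil.example", "10.0.0.5"]))
```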

Let's now consider these core functions specifically in the context of Kubernetes.

#1 Identify

NIST CSF guidelines recommend that organizations running Kubernetes maintain an inventory of their clusters, the node machines that make up each cluster, and the applications running on them. Kubernetes does not ship a configuration management database (CMDB) of its own, but it already holds much of the data one needs: the API server, backed by the etcd key-value store, exposes every node and workload along with the storage volumes accessible to containers in each pod, while kubeconfig files capture the configuration data that describe clusters, users, and contexts. Feeding this information into a CMDB makes data on Kubernetes platform components available in a single place and provides a centralized view of these assets.
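
As a sketch of how such an inventory might be assembled, the Python function below groups pods by namespace and records each pod's node, container images, and volumes. It assumes input shaped like the JSON that `kubectl get pods -A -o json` prints; the sample pod, registry, and names are hypothetical.

```python
import json
from collections import defaultdict
from typing import Dict, List


def build_inventory(pod_list: dict) -> Dict[str, List[dict]]:
    """Group pods by namespace, recording node, container images, and volumes.

    Assumes the shape of `kubectl get pods -A -o json` output:
    {"items": [{"metadata": {...}, "spec": {...}}, ...]}
    """
    inventory: Dict[str, List[dict]] = defaultdict(list)
    for pod in pod_list.get("items", []):
        meta, spec = pod.get("metadata", {}), pod.get("spec", {})
        inventory[meta.get("namespace", "default")].append({
            "pod": meta.get("name"),
            "node": spec.get("nodeName"),
            "images": [c.get("image") for c in spec.get("containers", [])],
            "volumes": [v.get("name") for v in spec.get("volumes", [])],
        })
    return dict(inventory)


# Hypothetical sample pod.
sample = {"items": [{
    "metadata": {"name": "api-7d9f", "namespace": "payments"},
    "spec": {"nodeName": "node-a",
             "containers": [{"name": "api", "image": "registry.example/api:1.4.2"}],
             "volumes": [{"name": "db-credentials"}]},
}]}
print(json.dumps(build_inventory(sample), indent=2))
```

Against a live cluster, the same function could be fed the saved output of that command, and the result loaded into whatever CMDB the organization uses.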

Kubernetes misconfigurations can open up the platform to bad actors. Using the default namespace in shared clusters, running containers in privileged mode, and allowing unencrypted traffic between the Kubernetes API server and kubelets are among the most common misconfigurations. Providing security training to users at all levels (e.g., managers, senior executives, contractors) helps reduce risks arising from misconfiguration and human error.
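
Two of the misconfigurations above can be checked directly in workload manifests. The Python sketch below flags pods that land in the `default` namespace and containers whose `securityContext` sets `privileged: true`. It only illustrates the idea: encryption between the API server and kubelets is a control-plane setting that a pod manifest cannot reveal, and the example pods are hypothetical.

```python
from typing import List


def find_misconfigurations(pod: dict) -> List[str]:
    """Flag default-namespace use and privileged containers in a Pod manifest (as a dict)."""
    issues = []
    meta, spec = pod.get("metadata", {}), pod.get("spec", {})
    # A Pod without an explicit namespace is created in "default".
    if meta.get("namespace", "default") == "default":
        issues.append("pod runs in the 'default' namespace")
    for c in spec.get("containers", []) + spec.get("initContainers", []):
        sc = c.get("securityContext") or {}
        if sc.get("privileged"):
            issues.append(f"container '{c.get('name')}' runs in privileged mode")
    return issues


# Hypothetical manifests.
risky = {"metadata": {"name": "x"},
         "spec": {"containers": [{"name": "app", "securityContext": {"privileged": True}}]}}
print(find_misconfigurations(risky))
```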

Running the latest supported release of Kubernetes, patched against newly disclosed vulnerabilities, together with up-to-date third-party scanning tools, is of paramount importance. Trivy, an open-source offering, is a vulnerability scanner and a misconfiguration scanner rolled into one. It can scan container images, file systems, and Git repositories for known vulnerabilities in OS packages and language-specific packages. Trivy also surfaces configuration issues that might put the Kubernetes infrastructure itself at risk of attack. Scanning repositories and Kubernetes artifacts for exposed secrets (e.g., passwords, API keys, tokens, and other sensitive information) is enabled by default in Trivy.
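
To illustrate what secret scanning looks for, here is a deliberately tiny, regex-based sketch in Python. It is not how Trivy works internally, and the three patterns are illustrative assumptions; in practice, running Trivy itself gives far broader coverage.

```python
import re
from typing import List, Tuple

# Illustrative patterns only; real scanners ship much larger, tuned rule sets.
SECRET_PATTERNS = {
    "AWS access key ID": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "private key block": re.compile(r"-----BEGIN [A-Z ]*PRIVATE KEY-----"),
    "hard-coded password": re.compile(r"(?i)\bpassword\s*[:=]\s*['\"]?[^\s'\"]{6,}"),
}


def scan_text(text: str) -> List[Tuple[int, str]]:
    """Return (line_number, rule_name) for every line matching a secret pattern."""
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for name, pattern in SECRET_PATTERNS.items():
            if pattern.search(line):
                hits.append((lineno, name))
    return hits


# Hypothetical configuration file contents.
config = 'db_host: db.internal\npassword: "s3cr3tValue"\naws_key: AKIAABCDEFGHIJKLMNOP\n'
print(scan_text(config))
```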

The open-source Falco threat detection engine is well suited to cluster workload monitoring because it continuously reports signs of abnormal behavior and harmful activity in running containers. Examples of abnormal and potentially malicious activity include unexpected outbound network connections from a cluster to a specific IP address or domain and unwanted attempts to write (change) data. Others include privilege escalation and changes to namespaces. At its core, Falco has an engine that monitors Linux kernel system calls, as well as Kubernetes audit logs, and evaluates the resulting events against Falco's security rules. Where a rule violation is detected, the engine generates a security alert.
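
The sketch below mimics, in plain Python, the idea of checking runtime events against security rules. It is not Falco's rule syntax (Falco rules are written in YAML with their own condition language), and the event fields, rule names, and egress allowlist are simplified assumptions.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Event:
    """Simplified runtime event; real Falco events carry many more fields."""
    container: str
    syscall: str
    path: str = ""
    dest_ip: str = ""
    write: bool = False


@dataclass
class Rule:
    name: str
    condition: Callable[[Event], bool]


ALLOWED_EGRESS = {"10.0.0.10"}  # hypothetical allowlist of permitted destinations

RULES = [
    Rule("Unexpected outbound connection",
         lambda e: e.syscall == "connect" and e.dest_ip not in ALLOWED_EGRESS),
    Rule("Write below /etc",
         lambda e: e.syscall in {"open", "openat"} and e.write and e.path.startswith("/etc/")),
    Rule("Namespace change", lambda e: e.syscall == "setns"),
]


def evaluate(events: List[Event]) -> List[str]:
    """Run every event past every rule and collect an alert for each violation."""
    alerts = []
    for e in events:
        for rule in RULES:
            if rule.condition(e):
                alerts.append(f"{rule.name} (container={e.container})")
    return alerts
```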

Kubernetes environments routinely rely on third-party applications and code libraries, and malicious actors could exploit backdoors in such programs to gain deeper access to the system. Third-party partners should therefore be assessed periodically to reduce software supply chain risks. It also pays to subscribe to specialized security bulletins that provide contextual information on vulnerabilities and the latest security patches. Above all, the organization's cybersecurity policy must be communicated unambiguously to all employees, so they know how to handle security events when they occur.
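
One concrete supply chain control is to accept only container images that come from trusted registries and are pinned to an immutable digest rather than a mutable tag. A minimal Python check along those lines is shown below; the trusted registry name is a hypothetical placeholder.

```python
import re
from typing import List

DIGEST_RE = re.compile(r"@sha256:[0-9a-f]{64}$")
TRUSTED_REGISTRIES = ("registry.example.com/",)  # hypothetical allowlist


def check_image(image: str) -> List[str]:
    """Return the supply chain problems found in a container image reference."""
    problems = []
    if not image.startswith(TRUSTED_REGISTRIES):
        problems.append("image is not from a trusted registry")
    # Tags such as :latest can be repointed; a sha256 digest cannot.
    if not DIGEST_RE.search(image):
        problems.append("image is not pinned to an immutable sha256 digest")
    return problems


print(check_image("nginx:latest"))
```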

#2 Protect

Effective access control policies for personally identifiable information (PII; e.g., name, address, Social Security number, telephone number, email address) and sensitive personal information (SPI; e.g., biometric data, gender, race, height, weight, sex, trade union membership, and sexual orientation) are a crucial part of this step. Through role-based access control (RBAC), the Kubernetes API server restricts which resources individual users can access and which actions they can perform, based on their roles within the organization.
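
Kubernetes RBAC is additive: a request is allowed only if some rule bound to the user permits that verb on that resource, and there are no deny rules. The simplified Python model below illustrates that evaluation. Real RBAC also supports ClusterRoles, API groups, wildcards, and group subjects, and the role and user names here are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class PolicyRule:
    resources: Set[str]
    verbs: Set[str]


@dataclass
class Role:
    namespace: str
    rules: List[PolicyRule] = field(default_factory=list)


# Hypothetical Role and RoleBinding data (user -> bound role names).
ROLES: Dict[str, Role] = {
    "pii-reader": Role("customers", [PolicyRule({"secrets", "configmaps"}, {"get", "list"})]),
}
BINDINGS: Dict[str, List[str]] = {"alice": ["pii-reader"]}


def is_allowed(user: str, verb: str, resource: str, namespace: str) -> bool:
    """Grant access only if some rule in a bound, same-namespace Role matches."""
    for role_name in BINDINGS.get(user, []):
        role = ROLES[role_name]
        if role.namespace != namespace:
            continue
        for rule in role.rules:
            if resource in rule.resources and verb in rule.verbs:
                return True
    return False  # no matching rule means deny; RBAC has no explicit deny rules
```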

In practice, it is important to verify that users who call APIs associated with sensitive data are who they say they are before granting them access. Authentication is carried out by an identity provider (IdP) on behalf of the service provider, and the IdP maintains an audit trail of every transaction it handles. Further, the Open Policy Agent (OPA) policy engine can enforce more granular policies on access to sensitive data, going beyond what RBAC alone can express. Together, RBAC, an IdP service, and OPA provide a reliable way to ensure that sensitive information is available to users strictly on a need-to-know basis.
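
OPA policies are written in Rego, but the layering described above can be sketched in Python: the RBAC decision comes first, and a finer attribute-based check can only narrow it. The label key, group name, and the assumption that the IdP reports group membership and multi-factor login are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, Set

SENSITIVE_LABEL = "data-classification"  # hypothetical label key on resources
SENSITIVE_CLASSES = {"pii", "spi"}
PRIVACY_GROUP = "privacy-officers"       # hypothetical IdP group


@dataclass
class Request:
    user: str
    groups: Set[str]  # group claims issued by the IdP (assumed)
    mfa: bool         # whether the IdP reports a multi-factor login (assumed)
    resource_labels: Dict[str, str] = field(default_factory=dict)


def allow(req: Request, rbac_allowed: bool) -> bool:
    """Finer-grained check layered on top of an RBAC decision."""
    if not rbac_allowed:
        return False  # the policy layer can only narrow RBAC, never widen it
    if req.resource_labels.get(SENSITIVE_LABEL) in SENSITIVE_CLASSES:
        return PRIVACY_GROUP in req.groups and req.mfa
    return True
```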

NIST CSF urges organizations to identify potential vulnerabilities in their assets and document how they plan to remedy them. For instance, Kubernetes namespaces provide a mechanism for isolating multiple users or applications that share a single cluster, which is particularly significant when many users, teams, and projects are involved. The shared cluster is partitioned into multiple virtual clusters, and strict rules are enforced around, say, who can or cannot create or modify pods. Each user or application then works within its own namespace, separated from the others, which helps prevent unintended interactions between them. Namespaces are a logical boundary rather than complete isolation, however: by default, pods in different namespaces can still reach each other over the network, so namespaces are usually combined with RBAC and network policies. When RBAC permissions are scoped to namespaces following the principle of least privilege, users and applications only have access to the resources they need to carry out their operations.
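
As an example of per-team isolation, the Python sketch below generates a Namespace and a RoleBinding that grants a team's group the built-in `edit` ClusterRole only inside that namespace. The team and group names are hypothetical, and in practice a NetworkPolicy and a ResourceQuota would usually be added alongside.

```python
import json
from typing import List


def team_manifests(team: str, group: str) -> List[dict]:
    """A namespace for one team plus a RoleBinding scoped to that namespace."""
    namespace = {
        "apiVersion": "v1",
        "kind": "Namespace",
        "metadata": {"name": team},
    }
    # Binding a ClusterRole through a RoleBinding limits its permissions to one namespace.
    binding = {
        "apiVersion": "rbac.authorization.k8s.io/v1",
        "kind": "RoleBinding",
        "metadata": {"name": f"{team}-editors", "namespace": team},
        "roleRef": {"apiGroup": "rbac.authorization.k8s.io", "kind": "ClusterRole", "name": "edit"},
        "subjects": [{"apiGroup": "rbac.authorization.k8s.io", "kind": "Group", "name": group}],
    }
    return [namespace, binding]


print(json.dumps({"apiVersion": "v1", "kind": "List",
                  "items": team_manifests("payments", "payments-devs")}, indent=2))
```

Because `kubectl` accepts JSON as well as YAML, the printed `List` could be piped to `kubectl apply -f -`.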

Mutual TLS (mTLS), in which the client's identity is authenticated in addition to the server's, is especially suited to securing communication between applications in Kubernetes. mTLS verifies that a call really comes from the client application it claims to come from, and authorization policies can then use that verified identity to decide whether the client is permitted to access the data in question. In addition, Kubernetes network policies make it possible to allow or block traffic between pods at the IP address and port level.
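
At the application level, the server side of mTLS amounts to refusing any TLS handshake in which the client does not present a certificate signed by a trusted CA. A minimal sketch with Python's standard `ssl` module follows. The certificate, key, and CA file paths are assumptions, and they are optional here only so the settings can be inspected without real certificates. In a service mesh, sidecar proxies typically handle this transparently.

```python
import ssl
from typing import Optional


def make_mtls_server_context(cert_file: Optional[str] = None,
                             key_file: Optional[str] = None,
                             client_ca_file: Optional[str] = None) -> ssl.SSLContext:
    """Server-side TLS context that requires and verifies client certificates."""
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2
    # CERT_REQUIRED makes the handshake fail unless the client presents a valid certificate.
    ctx.verify_mode = ssl.CERT_REQUIRED
    if cert_file and key_file:
        ctx.load_cert_chain(certfile=cert_file, keyfile=key_file)  # the server's own identity
    if client_ca_file:
        ctx.load_verify_locations(cafile=client_ca_file)  # only this CA may vouch for clients
    return ctx
```

A listening socket would then be wrapped with `ctx.wrap_socket(sock, server_side=True)`; clients that present no certificate, or one from an untrusted CA, fail the handshake.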

For added protection, cluster credentials should be rotated regularly. These include the private key of the cluster root certificate authority (CA), whose certificates authenticate the API server and which the API server uses to validate kubelet client certificates. On some managed platforms, such as Google Kubernetes Engine (GKE), credential rotation also changes the IP address of the control plane. Application credentials are best rotated daily, or at least monthly.
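
A rotation policy such as "at least monthly" can be enforced by comparing each credential's issue time (for a certificate, its notBefore field) against a maximum age. A minimal Python sketch, assuming timezone-aware timestamps and a 30-day limit:

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def rotation_due(issued_at: datetime,
                 max_age: timedelta = timedelta(days=30),
                 now: Optional[datetime] = None) -> bool:
    """True once a credential has reached max_age (30 days approximates 'monthly')."""
    now = now or datetime.now(timezone.utc)
    return now - issued_at >= max_age


issued = datetime(2024, 1, 1, tzinfo=timezone.utc)
print(rotation_due(issued, now=datetime(2024, 2, 1, tzinfo=timezone.utc)))
```

For a daily policy, the same check would be called with `max_age=timedelta(days=1)`.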

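Kubernetes can encrypt data at rest (stored data) in etcd, its key-value database, once encryption is configured for the API server. With the KMS provider, encryption is delegated to a key management service (KMS) through a KMS plugin, which is often run as a static pod on each control-plane (master) node. This uses envelope encryption: resources are encrypted with data encryption keys (DEKs), the DEKs are in turn encrypted with a key encryption key (KEK), and the KEK itself stays securely in the remote KMS.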
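
The Python sketch below builds an `EncryptionConfiguration` that routes Secrets through a KMS v2 plugin, keeping the `identity` provider as a fallback so that data written before encryption was enabled stays readable. The plugin name and socket path are hypothetical; the resulting file is passed to kube-apiserver via `--encryption-provider-config`.

```python
import json


def encryption_config(plugin_name: str, socket_path: str) -> dict:
    """EncryptionConfiguration that envelope-encrypts Secrets through a KMS v2 plugin."""
    return {
        "apiVersion": "apiserver.config.k8s.io/v1",
        "kind": "EncryptionConfiguration",
        "resources": [{
            "resources": ["secrets"],
            "providers": [
                # The first provider is used to encrypt new writes.
                {"kms": {"apiVersion": "v2", "name": plugin_name,
                         "endpoint": f"unix://{socket_path}", "timeout": "3s"}},
                # identity lets the API server still read data stored unencrypted.
                {"identity": {}},
            ],
        }],
    }


print(json.dumps(encryption_config("my-kms-plugin", "/var/run/kms/plugin.sock"), indent=2))
```

Existing Secrets are only encrypted when they are next written; the Kubernetes documentation suggests rewriting them with `kubectl get secrets --all-namespaces -o json | kubectl replace -f -`.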

#3 Detect

Kubernetes can capture logs from applications and from cluster components such as the API server, scheduler, etcd, kube-proxy, and kubelets. Audit logs are particularly handy because, once an audit policy is enabled, they record a chronological set of requests made to the Kubernetes API. They include pertinent details, such as who or what issued a request, what the request was for, and its result. Audit logs certainly help detect unusual activities, but on their own they provide neither alerting nor continuous monitoring of anomalous events or trends. Third-party tools come to the rescue here.
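
As a taste of what such tools automate, the Python sketch below scans audit events (in the JSON-lines format written by the audit log backend) and flags reads of Secrets by unexpected users as well as denied (HTTP 403) requests. The allowlisted service account is hypothetical, and a real pipeline would stream events continuously rather than process a list.

```python
import json
from typing import Iterable, List

# Hypothetical identity that is expected to read Secrets.
ALLOWED_SECRET_READERS = {"system:serviceaccount:kube-system:external-secrets"}


def suspicious_events(lines: Iterable[str]) -> List[str]:
    """Flag Secret reads by unexpected users and requests the API server denied."""
    findings = []
    for line in lines:
        event = json.loads(line)
        if event.get("stage") != "ResponseComplete":
            continue  # look at each request once, after the response was sent
        user = event.get("user", {}).get("username", "?")
        ref = event.get("objectRef", {})
        code = event.get("responseStatus", {}).get("code")
        if (ref.get("resource") == "secrets" and event.get("verb") in {"get", "list"}
                and user not in ALLOWED_SECRET_READERS):
            findings.append(f"{user} read secrets in {ref.get('namespace')}")
        elif code == 403:
            findings.append(f"{user} was denied {event.get('verb')} on {ref.get('resource')}")
    return findings
```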

#4 Respond

Being able to detect unusual activities and receive timely automatic alerts on potential security risks is only half the battle; detection must be followed up with adequate response strategies. To develop and deploy highly effective responses, organizations need a security operations center (SOC) capable of automatically filtering out false positives. The SOC should be able to investigate genuine alerts arriving from detection systems and remediate them using up-to-date cybersecurity tools.
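
Part of that triage can be automated. The Python sketch below drops alerts that match a suppression list of known false positives, removes duplicates, and orders the rest by severity; the alert fields and the 1 to 5 severity scale are assumptions.

```python
from dataclasses import dataclass
from typing import List, Set, Tuple


@dataclass(frozen=True)
class Alert:
    rule: str
    resource: str
    severity: int  # assumed scale: 1 (low) to 5 (critical)


def triage(alerts: List[Alert], suppressed: Set[Tuple[str, str]]) -> List[Alert]:
    """Drop known false positives and duplicates, then put the most severe first."""
    seen = set()
    queue = []
    for a in alerts:
        key = (a.rule, a.resource)
        if key in suppressed or key in seen:
            continue
        seen.add(key)
        queue.append(a)
    return sorted(queue, key=lambda a: a.severity, reverse=True)
```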

HashiCorp Vault, a secrets management tool, can ensure that secrets (e.g., usernames, passwords, database credentials, API tokens, TLS certificates) are encrypted both at rest inside Vault and in transit between Vault and its clients. Secrets are stored in a centralized location. This eliminates the risk of such sensitive information lying all over the place in plain text, visible to anyone in the organization with access to GitHub or GitLab. Vault also allows access control to be defined in a much more fine-grained manner, such as "the web server needs access to database authentication information" or "the API server requires API tokens." By integrating a Vault certificate authority (CA) with a service mesh, certificates can be issued for legitimate applications running in the mesh.
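
For illustration, reading a secret from Vault's KV version 2 engine over its HTTP API might look like the Python sketch below. It assumes the KV v2 engine is mounted at `secret/` (the Vault dev-server default) and that `VAULT_ADDR` and `VAULT_TOKEN` are set in the environment; the secret path is hypothetical.

```python
import json
import os
import urllib.request


def kv2_request(path: str, mount: str = "secret") -> urllib.request.Request:
    """Build a GET for a KV v2 secret; KV v2 inserts 'data/' after the mount path."""
    addr = os.environ.get("VAULT_ADDR", "http://127.0.0.1:8200").rstrip("/")
    req = urllib.request.Request(f"{addr}/v1/{mount}/data/{path}")
    req.add_header("X-Vault-Token", os.environ.get("VAULT_TOKEN", ""))
    return req


def parse_kv2(body: bytes) -> dict:
    """KV v2 nests the key/value pairs under data.data (data.metadata holds versions)."""
    return json.loads(body)["data"]["data"]


def read_secret(path: str, mount: str = "secret") -> dict:
    """Fetch and decode one secret, e.g. read_secret('web/db') (hypothetical path)."""
    with urllib.request.urlopen(kv2_request(path, mount), timeout=5) as resp:
        return parse_kv2(resp.read())
```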

#5 Recover

The incident recovery plan is executed during and after a data breach or cyberattack. Employees at all levels need to know what post-incident activity is expected of them in recovering data, where required, and restoring normal business operations. For non-critical apps, users might tolerate a higher recovery time objective (RTO, the acceptable amount of time a service or application can remain unavailable during a disaster, as set down in the service-level agreement). For mission-critical apps, such as a real-time healthcare system for physicians, customers might expect an RTO of less than 5 minutes! For the same app, the tolerable recovery point objective (RPO, the amount of data loss, usually expressed as a time window, that is acceptable to an organization following a disaster) could be 0.
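
The arithmetic behind these objectives is simple: the achieved RPO is the time between the last good backup and the incident, and the achieved RTO is the time between the incident and the restoration of service. A minimal Python sketch, with hypothetical timestamps:

```python
from datetime import datetime, timedelta


def recovery_objectives_met(last_backup: datetime, incident: datetime, restored: datetime,
                            rto: timedelta, rpo: timedelta) -> dict:
    """Compare the achieved RPO/RTO of one incident against the agreed targets."""
    achieved_rpo = incident - last_backup  # data written after the last backup is lost
    achieved_rto = restored - incident     # how long the service was unavailable
    return {
        "rpo": achieved_rpo, "rpo_met": achieved_rpo <= rpo,
        "rto": achieved_rto, "rto_met": achieved_rto <= rto,
    }


print(recovery_objectives_met(datetime(2024, 5, 1, 10, 0), datetime(2024, 5, 1, 10, 45),
                              datetime(2024, 5, 1, 10, 52),
                              rto=timedelta(minutes=5), rpo=timedelta(hours=1)))
```

An RPO of 0 cannot be met by periodic backups alone, since any write after the last backup would be lost; it generally calls for synchronous replication.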

Backing up data is essential to keeping RTO and RPO within acceptable limits, and there are handy open-source backup tools built specifically with the Kubernetes environment in mind. KubeDR, for instance, automatically backs up cluster configuration data, state, metadata, and certificates. Velero, a backup and disaster recovery tool, supports scheduled backups of an entire cluster or of resources selected by namespace or label. The Kubernetes objects that describe the state of a cluster are backed up to object storage.
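
As an example of Velero's scheduled backups, the Python sketch below builds a `Schedule` resource that backs up one namespace nightly and keeps each backup for 30 days (`720h0m0s`). The schedule name and namespace are hypothetical, and it assumes Velero is installed in the `velero` namespace.

```python
import json
from typing import List


def velero_schedule(name: str, cron: str, namespaces: List[str],
                    ttl: str = "720h0m0s") -> dict:
    """Velero Schedule that backs up the given namespaces on a cron schedule."""
    return {
        "apiVersion": "velero.io/v1",
        "kind": "Schedule",
        "metadata": {"name": name, "namespace": "velero"},
        "spec": {
            "schedule": cron,  # standard five-field cron expression
            "template": {      # the backup spec applied to every scheduled run
                "includedNamespaces": namespaces,
                "ttl": ttl,    # how long each backup is retained
            },
        },
    }


print(json.dumps(velero_schedule("payments-daily", "0 2 * * *", ["payments"]), indent=2))
```

A roughly equivalent CLI command would be `velero schedule create payments-daily --schedule="0 2 * * *" --include-namespaces payments`.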

Summing Up

Those are some cybersecurity tips to help you enforce stronger NIST CSF compliance in Kubernetes environments. See you soon. Thanks for your time.

Cyberlands.io Team