GDPR Compliance and Kubernetes Environments

Learn all about GDPR compliance and Kubernetes environments in 2022

The General Data Protection Regulation (GDPR) is a binding legislative act that aims to protect the personal data of individuals in the European Union (EU), referred to as "data subjects". Only personal data belonging to natural persons (living human beings) falls within the ambit of GDPR, which means legal entities, such as corporations, foundations, and institutions, are outside its purview. The provisions of this regulation, which came into effect on May 25, 2018, apply across the EU, the political and economic union of 27 countries in the European continent.

Essentially, GDPR regulates how an organization collects, stores, and uses personal data pertaining to data subjects. It applies to organizations established in the EU and, in many cases, to organizations established outside the EU, regardless of where the data is stored or processed. Any organization offering goods or services to data subjects in the EU, no matter if they are paid or free, is subject to GDPR, and so is any organization engaged in monitoring the behavior of data subjects within the EU. Most importantly, GDPR does not exempt organizations on the basis of size; however, it does relax some of the obligations for businesses with fewer than 250 employees and where processing of personal data is not a core activity.

Data Controller and Data Processor

A data controller is a natural person, company, or any other body responsible for determining the purposes for which personal data is collected, processed, and stored, as well as the means used to process it. A data processor, on the other hand, is a natural person, company, or any other body that processes data on behalf of the data controller and as per its instructions.

Key GDPR Principles

Article 5 of GDPR sets down the key principles that should guide organizations when storing or processing personal data:

Principle #1: Lawfulness, Fairness and Transparency

The data must be processed and stored only in accordance with the GDPR and other applicable laws. The principle of fairness implies that data must be processed or used only in ways the data subject would reasonably expect, for the purposes and during the time period indicated to her/him. Transparency is about keeping the data subject informed regarding what exactly a business plans to do with her/his data and who will have access to it.

Principle #2: Purpose Limitation

This principle limits storage, processing, and use of personal data to the original purposes for which it was obtained, or to a limited set of further purposes (e.g., archiving in the public interest, scientific or historical research, statistical purposes) that are compatible with the original purpose. The purpose of data collection must be spelled out to the data subject explicitly and, where processing relies on consent, her/his informed consent obtained. Needless to say, the purpose has to be a legitimate one.

Principle #3: Data Minimization

Organizations should collect, process, and retain the smallest possible amount of personal data necessary to achieve the business results they have in mind. The idea is to keep data collection to the bare minimum possible.

Principle #4: Accuracy

The accuracy principle states that organizations must make reasonable efforts to ensure the personal data they collect, store, and process is accurate, corrected or erased where inaccuracies come to light, and kept up to date where required. Individuals have the right to request that incorrect or incomplete personal data be corrected or erased, and organizations must generally respond to such requests within one month. In short, organizations must make sure not to hold on to obsolete personal data, such as out-of-date contact details.

Principle #5: Storage Limitation

GDPR does not prescribe how long different types of identifiable personal data can be stored, but the regulation does require data controllers and data processors to set such limits. As such, organizations must lay down appropriate timeframes for retaining personal data and be able to prove such timeframes to be reasonable.

Principle #6: Integrity and Confidentiality

Data controllers are responsible for ensuring the personal data they collect, store, and process is protected against unauthorized access, accidental loss, destruction, or damage. To this end, they must make use of appropriate technical and organizational measures. These include systems for rendering data anonymous or for pseudonymizing it, that is, reducing the linkability of a certain dataset with particular individuals. Businesses may also work toward ISO 27001 certification. GDPR does not mandate it, but it can serve as demonstrable evidence of their commitment to preventing cybersecurity incidents and improving information security management.
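The idea of reducing the linkability of a dataset with particular individuals can be sketched with keyed hashing. This is a minimal illustration only, using Python's standard library. The function name pseudonymize and the sample values are hypothetical, and a real deployment would also need key storage and rotation, which are not shown here.

```python
import hashlib
import hmac


def pseudonymize(identifier: str, secret_key: bytes) -> str:
    """Replace a direct identifier with a keyed hash (HMAC-SHA256).

    The same identifier and key always give the same token, so records can
    still be joined, but the token cannot be mapped back to the identifier
    without the key. The key must be stored separately from the dataset.
    """
    normalized = identifier.strip().lower().encode("utf-8")
    return hmac.new(secret_key, normalized, hashlib.sha256).hexdigest()


if __name__ == "__main__":
    # Hypothetical key and email address, for illustration only.
    key = b"keep-this-key-outside-the-dataset"
    print(pseudonymize("jane@example.com", key))
```

Note that pseudonymized data is still treated as personal data under GDPR, because whoever holds the key can re-identify the individuals.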

Principle #7: Accountability

This principle holds the controller responsible for compliance with GDPR provisions. Furthermore, it expects the controller to document policies that inform personal data processing and produce the same when requested by the authorities.

Personal Data Under GDPR

Personal data, as defined in Article 4 of GDPR, means any information relating to a living human being who can be identified, directly or indirectly, by reference to one or more of the following identifiers:

Name
Identification number
Location data
Telephone number
Address
Credit card details
Employee number
Customer number
Account data
Vehicle number plate
Physical, physiological, genetic, mental, economic, racial, ethnic, and social identities
Data about political and religious beliefs, trade union membership
Data about health, sex life, and sexual orientation
Any other online identifier or data assigned to a living person

Calculation of GDPR Fines

GDPR has an uncomplicated two-tier penalty structure. Less severe violations (including those relating to Articles 8, 11, 25-39, 41, 42, and 43 of the GDPR) could potentially carry an administrative fine of up to 10 million euros or 2% of the organization's annual worldwide revenue for the previous financial year, whichever is higher.

More severe violations of data subjects' privacy (Articles 5, 6, 7, 9, 12-22, 44-49 of the GDPR), including of their right to erasure ("right to be forgotten"), attract higher penalties. Administrative fines, in this regard, might go up to 20 million euros or 4% of the firm's worldwide yearly revenue from the preceding financial year, whichever is greater. We aren't done yet. Firms might have to cough up additional fines for violating data protection laws, if any, that individual EU states have adopted under GDPR Chapter IX. That's not all.

In each EU member state, one or more independent public authorities are mandated with monitoring GDPR implementation. Any failure on the part of organizations to act in accordance with the orders from such bodies is sufficient ground for attracting further huge fines. This, again, is regardless of the original violation. Punitive measures don't end here either. GDPR Article 82 confers on data subjects the right to claim compensation from organizations that cause them material or non-material damage arising out of any GDPR violation!
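The two-tier caps described in this section come down to simple arithmetic. The sketch below computes only the upper limit of an administrative fine, not the fine an authority would actually impose. The function name and the sample revenue figures are made up for illustration.

```python
def max_administrative_fine(annual_worldwide_revenue_eur: float, severe: bool) -> float:
    """Return the upper limit of the fine for the given tier.

    Lower tier: 10 million euros or 2% of revenue, whichever is higher.
    Upper tier: 20 million euros or 4% of revenue, whichever is higher.
    """
    fixed_cap_eur, percent = (20_000_000, 4) if severe else (10_000_000, 2)
    revenue_based_cap = annual_worldwide_revenue_eur * percent / 100
    return max(fixed_cap_eur, revenue_based_cap)


if __name__ == "__main__":
    # Hypothetical firm with 2 billion euros in revenue: 4% exceeds 20 million.
    print(max_administrative_fine(2_000_000_000, severe=True))
    # Hypothetical firm with 100 million euros in revenue: the fixed cap applies.
    print(max_administrative_fine(100_000_000, severe=False))
```

For a small firm the fixed amount is the binding limit, while for a large firm the percentage of revenue takes over.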

Here are some best practices to follow to mitigate the risk of GDPR infringements in Kubernetes environments.

Appropriate Access Policies

Kubernetes' role-based access control (RBAC) helps restrict access to Kubernetes API resources based on the roles of individual users within an organization, and Kubernetes network policies are capable of filtering traffic to and from pods. GDPR only requires controllers and processors of personal data to take appropriate measures to secure it. However, organizations will be better off adopting access-token-based authentication, a two-step approach that strengthens their user validation process. Here, users' login credentials are verified by a trusted third-party identity provider (IdP) on behalf of the service provider, which then accepts the access token issued by the IdP. The IdP's audit logging feature maintains a granular record of each transaction handled by it to enable user traceability and meet legal requirements. In addition, a service mesh implementation will help enforce access control in a more granular manner over the cluster. This ensures service-to-service authentication and provides detailed metrics of service behavior for all communications within the mesh.
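As a rough sketch of what restricting access by role can look like, the snippet below builds a namespaced RBAC Role and a RoleBinding as plain Python dictionaries and prints them as JSON, which kubectl apply accepts alongside YAML. The namespace "crm", the role name "pod-reader", and the user "jane" are hypothetical. The manifest fields follow the rbac.authorization.k8s.io/v1 API.

```python
import json


def read_only_role(namespace: str, name: str) -> dict:
    """Build a Role that only allows reading pods in one namespace."""
    return {
        "apiVersion": "rbac.authorization.k8s.io/v1",
        "kind": "Role",
        "metadata": {"namespace": namespace, "name": name},
        "rules": [
            {
                "apiGroups": [""],  # "" is the core API group
                "resources": ["pods"],
                "verbs": ["get", "list", "watch"],
            }
        ],
    }


def bind_role_to_user(role: dict, user: str, binding_name: str) -> dict:
    """Build a RoleBinding that grants the given Role to a single user."""
    return {
        "apiVersion": "rbac.authorization.k8s.io/v1",
        "kind": "RoleBinding",
        "metadata": {
            "namespace": role["metadata"]["namespace"],
            "name": binding_name,
        },
        "subjects": [
            {"kind": "User", "name": user, "apiGroup": "rbac.authorization.k8s.io"}
        ],
        "roleRef": {
            "kind": "Role",
            "name": role["metadata"]["name"],
            "apiGroup": "rbac.authorization.k8s.io",
        },
    }


if __name__ == "__main__":
    # Hypothetical names, for illustration only.
    role = read_only_role("crm", "pod-reader")
    binding = bind_role_to_user(role, "jane", "pod-reader-jane")
    print(json.dumps(role, indent=2))
    print(json.dumps(binding, indent=2))
```

Granting only read verbs on a single resource type in a single namespace keeps the role close to the data minimization idea: a user sees no more than the task requires.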

Improved Visibility of Services Behavior

A service mesh architecture is increasingly being leveraged to prevent unauthorized access to data shared between microservices or between microservices and the control plane (comprising the API server, scheduler, and controller manager). The mesh is capable of swiftly identifying traffic that might potentially compromise a cluster.

Service meshes apply the mutual TLS (mTLS) method for authentication between services, and in many meshes (Istio, for example) this feature is implemented via the Envoy high-performance proxy. This allows fine-grained access control over the service API. As a result, only such personal data is processed as is necessary for each specific purpose. Also, by default, personal data is made accessible only to a finite number of users with a legitimate interest in such data.

Service meshes operate at layer 7 of the OSI networking stack, the application layer that supports end-user applications. So, fine-grained policies targeting individual users can be automatically generated in this layer. This will allow organizations to keep data subjects informed about what kind of personal data is going to be processed (e.g., email, phone number), who will be able to collect, store, and process it, and how long it will be retained. This allows data subjects to exercise their right, where required, to object to the processing of their personal data. Where data subjects agree to the processing of their personal data, organizations can obtain their explicit consent via online forms, scanned statements, emails, or e-signatures.

Accompanying each service is a sidecar proxy ("layer 7 proxy") that also acts as a load balancer, and all client requests are routed by the proxy to available service instances. For this reason, it should be easy to generate metrics, logs, and traces for every service within the mesh. This will help data controllers identify and mitigate potential risks to any data subject arising from the processing of her/his personal data. Logging of all API requests and responses will help ascertain which user did what and when in case of any GDPR violation. Under GDPR, the data subject is entitled to receive her/his personal data in a machine-readable format. Since requests and responses pass through the service mesh, organizations can meet this requirement more easily. In short, a well-thought-out service mesh implementation can help meet the requirements under GDPR Articles 5, 13, 15, 17, 18, 20, 21, 25, 30, 32, 34, 35, and 46.

In Kubernetes ecosystems, personal data often moves from within containers to an external storage asset (e.g., shared cloud storage, network storage devices), and this ensures data persists even after a pod has gone away. It is critically important that personal data is encrypted during its journey through the container infrastructure and across the Kubernetes network. Security risks and attendant GDPR violations are not limited to such data in motion. Data at rest (static data), i.e., residing on hard drives, might also be subject to cyberattacks. Storing sensitive data on drives even though there is no longer any valid reason to do so is a direct violation of GDPR provisions, and employees might do this unintentionally, if not deliberately. Therefore, it is important for data processors to be doubly sure data at rest is encrypted in etcd, Kubernetes' backing store for all of its cluster data. For this, Kubernetes can make use of a key management service (KMS), which simplifies and centralizes the storage and management of encryption keys. Further, the data encryption keys are themselves encrypted using a key encryption key, which is kept securely in a remote key store.
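To make the KMS setup a little more concrete, the sketch below builds an API server encryption configuration as a Python dictionary and prints it as JSON. The plugin name and socket path are hypothetical, and the KMS plugin itself is assumed to be running already. The field names follow the upstream EncryptionConfiguration format (apiserver.config.k8s.io/v1) and should be verified against the documentation for the Kubernetes version in use.

```python
import json


def kms_encryption_config(plugin_name: str, socket_path: str) -> dict:
    """Build a configuration that encrypts Secrets in etcd through a KMS plugin.

    The first provider in the list is used for writing. The trailing
    "identity" provider lets the API server still read entries that were
    stored unencrypted before encryption was switched on.
    """
    return {
        "apiVersion": "apiserver.config.k8s.io/v1",
        "kind": "EncryptionConfiguration",
        "resources": [
            {
                "resources": ["secrets"],
                "providers": [
                    {
                        "kms": {
                            "apiVersion": "v2",
                            "name": plugin_name,
                            "endpoint": f"unix://{socket_path}",
                            "timeout": "3s",
                        }
                    },
                    {"identity": {}},
                ],
            }
        ],
    }


if __name__ == "__main__":
    # Hypothetical plugin name and socket path, for illustration only.
    config = kms_encryption_config("example-kms-plugin", "/var/run/kms/plugin.sock")
    print(json.dumps(config, indent=2))
```

The resulting file would be passed to the API server through its --encryption-provider-config flag. With this arrangement the key encryption key never leaves the external KMS, and only the wrapped data encryption keys are stored next to the data.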

Regular Security Status Assessment

Container images often have security risks (e.g., malware, vulnerable packages) embedded in them, and if such an image is used to create a container, the vulnerability might find its way into a live production environment. This could easily expose the project to security threats. So, it is important to review the security posture of container images on a regular basis and identify vulnerabilities using a security platform. This supports the requirement in GDPR Article 32 to maintain a level of security appropriate to the risk. Furthermore, regular security reviews demonstrate the organization's commitment to protecting the security and privacy of data subjects' personal information, as well as maintaining professional standards for its information systems.

Detailed Audit Trails

GDPR requires organizations to be accountable for the personal data committed to their care. This is where Kubernetes audit logs come in handy to justify the actions of data controllers with respect to data protection and privacy. Kubernetes audit logs are essentially a chronological listing of all requests made to the Kubernetes API by users and applications. Actions generated by users as well as by the API server itself are recorded, and this helps data processors answer audit queries such as "What happened and when?", "Who triggered it?", and "Where was it triggered from and what was its destination?" Suspicious activities, on the other hand, can be identified from DNS records and other logs. On top of this, gathering cluster-level and node-level security insights is particularly significant. Third-party observability programs can query cluster-level data from the API server (control plane) while node-based agents focus on gathering such data at the node level.
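The audit questions above can be answered by filtering the audit log, which the API server can write as one JSON event per line. The sketch below is a minimal example that assumes that log format. The field names (user, verb, objectRef, sourceIPs, requestReceivedTimestamp, stage) follow the audit.k8s.io/v1 Event type, while the function name and the sample events are made up for illustration.

```python
import json
from typing import Iterable, Iterator


def summarize_audit_events(lines: Iterable[str], resource: str) -> Iterator[dict]:
    """Yield who did what, when, and from where for one resource type."""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        event = json.loads(line)
        ref = event.get("objectRef") or {}
        # Keep only completed requests that touched the resource of interest.
        if ref.get("resource") != resource or event.get("stage") != "ResponseComplete":
            continue
        yield {
            "when": event.get("requestReceivedTimestamp"),
            "who": (event.get("user") or {}).get("username"),
            "from": event.get("sourceIPs", []),
            "what": f'{event.get("verb")} {ref.get("namespace", "")}/{ref.get("name", "")}',
        }


# Made-up sample events, standing in for lines read from an audit log file.
SAMPLE_LOG = [
    json.dumps({
        "kind": "Event", "apiVersion": "audit.k8s.io/v1", "stage": "ResponseComplete",
        "verb": "get", "user": {"username": "jane@example.com"},
        "sourceIPs": ["10.0.0.7"],
        "objectRef": {"resource": "secrets", "namespace": "crm", "name": "db-credentials"},
        "requestReceivedTimestamp": "2022-03-01T09:15:02.000000Z",
    }),
    json.dumps({
        "kind": "Event", "apiVersion": "audit.k8s.io/v1", "stage": "ResponseComplete",
        "verb": "list", "user": {"username": "system:serviceaccount:crm:web"},
        "sourceIPs": ["10.0.1.12"],
        "objectRef": {"resource": "pods", "namespace": "crm"},
        "requestReceivedTimestamp": "2022-03-01T09:15:05.000000Z",
    }),
]

if __name__ == "__main__":
    for row in summarize_audit_events(SAMPLE_LOG, "secrets"):
        print(row)
```

Which events end up in the log in the first place depends on the audit policy configured for the API server, so the policy should cover at least the resources that hold personal data.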

We have discussed some of the key considerations when planning GDPR compliance for Kubernetes clusters. Thanks for reading. Hope you enjoyed!

Cyberlands.io Team