Kubernetes authentication project wrestles with migration problems
GKE users like the Workload Identity feature for Kubernetes authentication with GCP services, but similar improvements to core cluster security present a potential market-wide migration headache.
Kubernetes has not only matured, it’s showing a wrinkle or two.
In case there was any doubt that Kubernetes is no longer a fledgling project, it has begun to show its age with the accumulation of technical debt, as upstream developers struggle to smooth the migration process for a core set of security improvements.
Kubernetes authentication and role-based access control (RBAC) have improved significantly since the project reached version 1.0 four years ago. But one aspect of Kubernetes authentication management remains stuck in the pre-1.0 era: the management of access tokens that secure connections between Kubernetes pods and the Kubernetes API server.
Workload Identity for Google Kubernetes Engine (GKE), released last month, illuminated this issue. Workload Identity is now Google’s recommended approach to Kubernetes authentication between containerized GKE application services and GCP utilities, because Workload Identity users no longer have to manage Kubernetes secrets or Google Cloud IAM service account keys themselves. Workload Identity also manages secrets rotation, and can set fine-grained secrets expiration and intended audience policies that ensure good Kubernetes security hygiene.
GKE users are chomping at the bit to use Workload Identity, because it could eliminate the need to manage this aspect of Kubernetes authentication with a third-party tool such as HashiCorp’s Vault, or worse, with error-prone manual techniques.
“Without this, you end up having to manage a bunch of secrets and access [controls] yourself, which creates lots of opportunities for human error, which has actually bitten us in the past,” said Josh Koenig, head of product at Pantheon Platform, a web operations platform in San Francisco that hosts more than 200,000 Drupal and WordPress sites.
Koenig recalled an incident where a misconfigured development cluster could have exposed an SSL certificate. Pantheon’s IT team caught the mistake, and tore down and re-provisioned the cluster. Pantheon also uses Vault in its production clusters to guard against such errors for more critical workloads, but didn’t want that tool’s management overhead to slow down provisioning in the development environment.
“It’s a really easy mistake to make, even with the best of intentions, and the best answer to that is just not having to manage security as part of your codebase,” Koenig said. “There are ways to do that that are really expensive and heavy to implement, and then there’s Google doing it for you [with Workload Identity].”
Kubernetes authentication within clusters lags Workload Identity
Workload Identity uses updated Kubernetes authentication tokens, available in beta since Kubernetes 1.12, to more effectively and efficiently authenticate Kubernetes-based services as they interact with other Google Cloud Platform (GCP) services. Such communications are much more likely than communications within Kubernetes clusters to be targeted by attackers, and there are other ways to shore up security for intra-cluster communications, including using OpenID Connect tokens based on OAuth 2.0.
However, the SIG-Auth group within the Kubernetes upstream community is working to bring Kubernetes authentication tokens that pass between Kubernetes Pods up to speed with the Workload Identity standard, with automatic secrets rotation, expiration and limited-audience policies. This update would affect all Kubernetes environments, including those that run in GKE, and could allow cloud service providers to assume even more of the Kubernetes authentication management burden on behalf of users.
“Our legacy tokens were only meant for use against the Kubernetes API Server, and using those tokens against something like Vault or Google poses some potential escalations or security risks,” said Mike Danese, a Google software engineer and the chair of Kubernetes SIG-Auth. “We’ll push on [API Server authentication tokens] a little bit because we think it improves the default security posture of Kubernetes open source.”
There’s a big problem, however. A utility to migrate older Kubernetes authentication tokens used with the Kubernetes API server to the new system, dubbed Bound Service Account Token Volumes and available in alpha since Kubernetes 1.13, has hit a show-stopping snag. Early users have encountered a compatibility issue when they try to migrate to the new tokens, which require a new kind of storage volume for authentication data that can be difficult to configure.
Without the new tokens, Kubernetes authentication between pods and the API Server could theoretically face the same risk of human error or mismanagement as GKE services before Workload Identity was introduced, IT experts said.
“The issues that are discussed as the abilities of Workload Identity, to reduce blast radius, require the refresh of tokens, enforce a short time to live and bind them to a single pod, would be the potential impacts for current Kubernetes users” if the tokens aren’t upgraded eventually, said Chris Riley, DevOps delivery director at CPrime Inc., an Agile software development consulting firm in Foster City, Calif.
Kubernetes authentication upgrade beta delayed
After Kubernetes 1.13 was rolled out in December 2018, reports indicated that a beta release for Bound Service Account Volumes would arrive by Kubernetes 1.15, which was released in June 2019, or Kubernetes 1.16, due in September 2019.
However, the feature did not reach beta in 1.15, and is not expected to reach beta in 1.16, Danese said. It’s also still unknown how the eventual migration to new Kubernetes authentication tokens for the API Server will affect GKE users.
“We could soften our stance on some of the improvements to break legacy clients in fewer ways — for example, we’ve taken a hard stance on file permissions, which disrupted some legacy clients, binding specifically to the Kubernetes API server,” Danese said. “We have some subtle interaction with other security features like Pod Security Policy that we could potentially pave over, by making changes or allowances in Pod Security Policies that would allow old Pod Security Policies to use the new tokens instead of the legacy tokens.”
Widespread community attention to this issue seems limited, Riley said, and it could be some time before it’s fully addressed.
However, he added, it’s an example of the reasons Kubernetes users among his consulting clients increasingly turn to cloud service providers to manage their complex container orchestration environment.
“The value of services like Workload Identity is the additional integration these providers are doing into other cloud services, simplifying the adoption and increasing overall reliability of Kubernetes,” Riley said. “I expect them to do the hard work and support migration or simple adoption of future technologies.”