I have been working in a team where we use kubernetes in production (not the nginx example- the real shit) for 2 years now. I have configured and used Kubernetes clusters from version 1.4.x with tools like kube-aws to 1.6-1.7 configured with kops. Amazon's EKS is the third breed of kubernetes provisioning solutions that I have the chance to try and this post is about my recent experiences for a week, trying to bring a production level EKS into life and check if it would cut it for our production needs.
This post would not have been possible without the contribution and hard work of my colleague JV - thanks!!!
For those who seek an executive summary of EKS. Its an AWS managed Service (like for example your Amazon Elastic Cache). Amazon provisions, updates and patches the brains of your cluster, aka the control planes + etcd. There is kind of a flat rate (price) for the masters + EC2 standard billing for your worker fleet. AWS also provides a custom networking layer, eliminating the need to use any additional overlay network solutions like you would do if you create the cluster on your own. You are responsible for provisioning and attaching - the worker nodes. AWS provides templates (Cloud-formation) with pre-configured workers. You are responsible for installing on top of the cluster all the other services or applications that are needed by your platform e.g how to collect logs, how to scrape metrics, other specific daemons etc. Also make note that once the cluster is up, there is nothing AWS specific, you get a vanilla experience (exception is the networking plugin).
How do I start?
There are a couple of options for spinning an EKS cluster
- The infamous click click on the dashboard (its good if you want to play but not production ready, meaning if you want to re-provision and test)
- Go through the official guide of EKS installation using command like tools like aws eks etc. Its a good option especially if you love the aws command line tooling.
- Use third party command line tools that offer behind the scenes extra functionality, namely things like eksctl . It's a very promising tool by the way.
- Terraform all the things!
We started the PoC with option 4.1 . So we used the official terraform guide (thank you Hashicorp) and then the worker provision was terraformed as well. So we did not keep the standard cloudformation extract from AWS. As you can understand, the tool of choice sometimes is dictated by the available levels of skills and experience within the team. In general we love terraform (especially us the developers) .
Other things to consider before I start?
So, as we discovered and of course it was very well documented, an EKS cluster due to the networking features that it brings (more on this later), really shines when it occupies its own VPC! Its not that you can not spin an EKS cluster on your existing VPCs but make sure you have enough free IPs and ranges available since by default the cluster - and specifically the workers, will start eating your IPs. No this is not a bug, its a feature and it actually makes really sense. It is one of the things that I really loved with EKS.
First milestone - spin the masters and attach workers
The first and most important step is to spin your masters and then provision your workers. Once the workers are being accepted and join the cluster you more or less have the core ready. Spinning just masters (like many articles out there feature is like 50% of the work). Once you can create an auto-scaling group where your workers will be created and then added to the cluster - this is like very close to the real thing.
Coming back to the Pod Networking feature
If you have ever provisioned a kubernetes clusters on AWS, using tools like kops or kube-aws, then you most probably have already installed or even configured the overlay network plugin that will provide pod networking in your clusters. As you know, pods have IPs, overlay networks on a kubenretes cluster, provide this abstraction see (calico, flannel etc). On an EKS cluster, by default you don't get this overlay layer. Amazon has actually managed to bridge the pod networking world (kubernetes networking) with its native AWS networking. In plain words, your pods (apps) within a cluster do get a real VPC IP. When I heard about this almost a year ago I have to admit I was not very sure at all, after some challenges and failures, I started to appreciate simplicity on the networking layer for any kubernetes cluster on top of AWS. In other words if you manage to remove one layer of abstraction, since your cloud can natively take over this, why keep having one extra layer of networking and hops where you can have the real thing?
But the workers pre-allocate so many IPs
In order EKS optimize Pod placement on the worker, uses the underlying EC2 worker capabilities to reserve IPs on its ENIs. So when you spin a worker even if you there are no pods or daemons allocated to them, you can see on the dashboard that they will have already pre-allocate a pool of 10 or depending on the class size, number of IPs. If you happen to operate your cluster on a VPC with other 'residents' your EKS cluster can be considered a threat! One way to keep the benefits of AWS CNI networking but make some room on VPCs that are running out of free IPs is to configure- after bringing up the masters - the 'aws-node' deamon set. This is an AWS specific deamon part of EKS magic that make all this happen. See here for a similar issue. So just
kubectl edit deamonset aws-node -n kube-system
and add the `WARM_IP_TARGET` to something smaller.
Make note as we discovered, setting the WARM IP TARGET to something smaller, does not limit the capacity of your worker to host more pods. If your worker does not have WARM IPs to offer to newly created and allocated pods will request a new one from the networking pool.
In case that that even this work around is not enough - then there is always the options to switch on calico on top of the cluster. See here. Personally after seeing CNI in action I would prefer to stick to this. After 2 years with cases of networking errors, I think I can trust better AWS networking. There is also the maintenance and trouble shooting side of things. Overlay networking is not rocket science but at the same time is not something that you want to be spending time and energy trouble shooting especially if you are full with people with these skills! Also the more complex your AWS networking setup is, the harder it becomes to find issues when packets jump from the kubernetes world to your AWS layer and vice versa. It is always up to the team and people making the decisions to choose the support model that they think fits to their team or assess the capacity of the team to provide real support on challenging occasions.
Apart from appreciating the simplicity of CNI I really found very straight forward the integration of EKS with the existing IAM infrastructure. You can use your corporate (even SAML) based roles / users of your AWS account to give or restrict access to your EKS cluster(s). This is a BIG pain point for many companies out there and especially if you are an AWS shop. EKS as just another AWS managed service, follows the same principles and provides a bridge between IAM and kubernetes RBAC!. For people doing kubernetes on AWS, already know that in the early days, access to the cluster and distribution of kube configs - was and still is a very manual and tricky job since the AWS users and roles mean nothing to the kubernetes master(s). Heptio has done a very good job with this.
What is actually happening is that you install the aws-iam-authenticator and attach it to your kubectl , through ./kube/config. Every time you issue a command on kubectl, it is being proxied by the aws-iam-authenticator which reads your AWS credentials (./aws/credentials) and maps them to kubernetes RBAC rules. So you can map AWS IAM roles or Users to Kubernetes RBAC roles or create your own RBAC rules and map them. It was the first time I used this tool and actually works extremely well! Of course if you run an old kubernetes cluster with no RBAC it wont be useful but in the EKS case, RBAC is by default enabled! In your ./kube/config the entry will look like this.
- name: arn:aws:eks:eu-west-1:
- name: AWS_PROFILE
Make note that, from the EKS admin side you will need to map the
kubectl edit configmap aws-auth -n kube-system
- system:masters #to SOME KUBERNETES RBAC ROLE
What about all the other things that you have to install?
Once the cluster is ready so you have masters and workers running then the next steps are the following, and can be done by any admin user with appropriate `kubectl` rights.
- Install and configure Helm.
- Install and configure the aws-cluster-autoscaler. Which is more or less straight forward, see here and here for references.
- Install and configure fluentD to push logs e.g to Elastic Search
- Install and configure Prometheus.
- And of course..all the things that you need or have as dependencies on your platform.
Should I use EKS?
- If you are an AWS user and you have no plans on moving away, I think is the way to go!
- If you are a company/ team that wants to focus on business delivery and not spend a lot of energy keeping different kubernetes clusters alive, then YES by all means. EKS reduces your maintenance nightmares and challenges 60-70% based on my experience.
- If you want to get patches and upgrades (on your masters) for free and transparently - see the latest kubernetes security exploit and ask your friends around, how many they were pushed to ditch old clusters and start over this week (it was fun in the early days but it is not fun any more). So I am dreaming of easily patched clusters and auto upgrades as a user and not cases like - lets evacuate the cluster we will build a new one!
- Is it locking you on a specific flavour? No the end result is a vanilla kubenetes, and even that you might be leveraging the custom networking, this is more less the case when you use a similar more advanced offering from Google (which is a more complete ready made offering).
- If you have second thoughts about region availability, then you should wait until Amazon offers EKS on a broad range of regions, I think this is the only limiting factor now for many potential users.
- If you already have a big organization tightly coupled with AWS and the IAM system - EKS is the a perfect fit in terms of securing and making your clusters available to the development teams!
Overall it was a very challenging and at the same time interesting week. Trying to bring up an EKS cluster kind of pushed me to read and investigate things on the AWS ecosystem that I was ignoring in the past.