Saturday, March 14, 2020

Ah these wallpapers...


I remember the old days when I used to collect HD wallpapers. Spending hours and hours on different sites (do you remember DeviantArt?), trying to find stuff that I liked and making sure this 'wallpapers' folder was always backed up. I had multiple copies of it on CDs and then DVDs and then somewhere online. It is kind of funny how our taste changes over the years, and that is reflected in our wallpaper collections as well.

Kind of a similar effect to the one we have with our taste in clothes as we grow older. Anyway, it's been years now since I abandoned that 'wallpapers' folder. I must have actually deleted it from the last resort backup - some Amazon S3 folder; all these pics were not me anymore. I also got tired, and I don't have that much free time either; I think we all have better things to do.

Here is the thing though: when I was young enough and had enough free time to spend on wallpaper collecting (like many of you did, I know that), I could not afford a good or big monitor. Now that I can afford a big curved HD Samsung monitor, and I really love it, I really need some nice backgrounds. Damn it!

One thing that has changed over the years is that I now kind of fancy change - like fck yeah, change this desktop wallpaper every few hours, why not, as long as it is a decent image of some sort. The old days when my wallpapers were only HD photos of fighter jets are long gone - but I do sometimes miss my Mirage 2000 over the Aegean Sea collection.


For a long time I used to rely on Kuvva Wallpapers. I loved it - actually, I was blown away; so many nice pics and the app was so nice. I think at some point the app stopped being free or something happened (I can not remember) and I had to pick something else.

In the past one or two years I have been using Unsplash Wallpapers on the Mac, and for my Windows 10 Gaming NUC - which I regularly find myself updating and stuff - I discovered Splashy for Windows. I think they both work very well and they deliver good wallpapers.

If you have any recommendations for similar wallpaper rotation apps - leave a comment here or on twitter @javapapo.



Thursday, January 30, 2020

tfenv - a must have terraform version manager!

As a Mac user and active developer I use a lot of package managers, update managers, etc.

Most people will be familiar with things like:
  1. homebrew
  2. sdkman (for JVM developers)

If you happen to be like me and use Terraform, then I have only nice things to say about tfenv. Not sure if you have ever struggled to jump around different terraform stacks that had different terraform version requirements. Hopefully this challenge will gradually fade out with the wide adoption of 0.12, but it is still the case. Maybe a stack is still using 0.11 and your new stuff uses 0.12. Maybe you found that there is a bug in a minor version of your aws provider and you want to re-run your stack with the previous minor version?

https://github.com/tfutils/tfenv

A true productivity gem :)  - deserves a star!
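A quick sketch of how it works in practice (assuming tfenv is already installed, e.g. via homebrew; the version numbers below are just examples): drop a `.terraform-version` file in a stack and tfenv picks the right binary for you.

```shell
# pin this stack to an older terraform (tfenv reads .terraform-version automatically)
echo "0.11.14" > .terraform-version
tfenv install        # fetches the pinned version if it is missing
terraform version    # now runs 0.11.14 through the tfenv shim

# or jump around explicitly
tfenv list           # show installed versions
tfenv use 0.12.20    # switch the active version
```

Commit the `.terraform-version` file alongside the stack and everyone on the team gets the same terraform version for free.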


Saturday, January 18, 2020

Does anyone still read blogs? Who knows.

I often wonder whether people still read blogs. Damn you Google, when you cut off Google Reader you ruined the whole thing. I feel guilty that I don't write that often. I have written this before: all these years, lots of posts have passed through my mind and I still don't write them down. I hope that in 2020 I will be more productive - I was saying the same thing last year too.

2019 was quite an adventurous year for me, especially professionally! At some point I want to write a few things about it, but because the world is small, even here in London, I will have to filter them somehow. 2020 looks like it is starting out in a similar way, we will see. I wonder if I am going through an identity crisis - a professional one.

For now I have decided to permanently close my Facebook account, and I am thinking - with the help and company of a friend - of trying to bring back papocast, or weekend geeks, or whatever else we end up calling it. I have no idea if people still listen to podcasts, let alone Greek podcasts. Vrypan, I remain melancholically stuck in the period when I was a listener of yours - probably the only Greek podcast I listened to.

Slowly, slowly I have started listening to some tech podcasts again.

I am finally giving audio books a chance, specifically with Audible. I have finally found good company for my slow and, for now, painful (damned diet) runs in the park. The first book it seems I will finish soon is none other than Permanent Record by E. Snowden. I am quite enthusiastic about the whole audio book experience, and about Snowden himself.

Finally, I accidentally discovered a band that reminds me a lot of my beloved Opeth - I bet those in the know already know them. Soen.

Greetings from chilly London.

Thursday, August 08, 2019

Back into gaming / flight sims and PCs - entering the NUC world

 

Historically 

I started as a PC user back in the day, a Windows user specifically. I can still remember myself assembling PCs, reading magazines, being an expert on motherboard trends, CPUs, heat sinks and all this stuff. I can also remember myself configuring all flavours of Windows... and almost 14 years ago I quit. I had enough. I switched to Apple's computers and never looked back. Well... sort of. As a professional, when I switched to Apple's MacOSX and Apple's hardware I felt bliss. Yes, there are some fck ups here and there, yes MacOSX is not really evolving or keeping up, but it is still a pretty solid OS, and along with some pretty solid hardware the combination is pretty good. Currently I can not see how I could switch to another platform, even though I have been getting closer and closer to Linux over time.

One thing that I missed when I moved away from PCs was gaming. I am getting old and I am the type of guy that believes all games should be played with a keyboard and mouse :P. Yes, I have a PS4, and I tried several times to convince myself that playing Call of Duty on the PS4 is nice, and while the graphics are not bad at all, I miss the comfort of my desk, my desktop screen and the W-A-S-D combo! So occasionally I would tap back into the PC world, either assembling or buying some half baked tower, so that I could enjoy my favourite games. When I was too busy I would kind of forget about this PC, and when I had some spare time I would try to re-install some flight simulator or a new version of CoD. The thing is that since my PC was not my main machine, it was always considered a side-kick, a nice to have. No upgrades, no love, nothing.

 

Lately

Definitely my free time is not improving, but I found myself spending a considerable amount of time on YouTube watching videos of this (hardcore) flight simulator called DCS. Specifically the videos here and here, where they explain what is going on + the graphics. Ooh my, I was hooked. Flight simulators were always my soft spot; I still remember trying to read the F-16 manual from Falcon 3.0. After some time of watching and watching, this little daemon whispered in my dreams - you know you should buy a PC again. But... I gave away the previous one, I replied, I was not using it, it is going to be a waste. 'No, you need it... you need to try flying a MiG-29 or your favourite Mirage 2000'. Oh well, yes I need it! I need all the things, so that I can play this flight simulator and start dogfighting (little did I know).


What kind of PC?

Obviously, I did not want to go down the road of building or configuring a proper tower. I had actually given away my beloved iMac to my father back home. So the idea of bringing a tower back to my desk was a no go; also, there is no space. Renting a flat in London does not allow you to expand in terms of things you own - you should always be prepared for your next move, until you manage to buy or secure a flat you can consider your own! Anyway: no towers, no prebuilt stuff from regular home PC vendors, so what do we do? This is where I started reading about NUCs - Intel's form factor bare-bone mini PCs. I was super excited. I remember their early days, when the whole scene was mostly about small but totally underpowered machines. After all, I wanted to play games mostly, and I already had a mini PC (some Chinese brand I use as a TV sidekick). It seems that there are lots of different brands and options if you want to enter this world of small but powerful machines. Actually some of them can be used for servers and business operations. Have a look here for example.

Still, I was in 'I want a small powerful machine but I don't want to spend a lot of time tuning it' mode. After all, I thought, ok, I don't have enough time, I am not going to spend all my free time configuring the machine; I want something where I just install the game and start playing (little did I know, again, lol).

Intel Hades Canyon!

I watched some YouTube reviews and was kind of impressed; I was like 'yes, this is it'. I did try to check other offerings - get a base NUC skeleton and then configure it - but mentally I was on this path of 'get me something that works out of the box'. So I opted for this - the Intel Hades Canyon.


It has a pretty interesting hybrid CPU/GPU combo, specifically this one, that would make hardcore PC users or gamers go 'No, don't do this, you can not upgrade', etc. Valid comments, but I felt it was good enough for my needs; after all, my flight sim of choice is not the most demanding game out there. This Core i7 + Radeon Vega combo performs very well on the majority of games that I have tried so far, and it is obviously doing very well with my flight sim. 32 GB of RAM gives plenty of space + a 500 GB SSD is also enough for me.

The very first day I switched it on, I was a bit shocked by some weird noise from an internal fan, but soon after I upgraded the motherboard BIOS and drivers it was gone. The machine is actually pretty silent, but obviously there will be times, if you push it, when you will hear it roar - which is totally acceptable. In terms of size it is actually about 1/3 of a PS4, so it fits under your screen on your desk. It looks like an old school set-top box for your TV from the 90s.


Ooh...Windows!

Overall I am super happy with my Intel NUC and its performance; the one thing I can not stand (apologies, I am a bit biased) is the operating system. Over the years I have totally disconnected from the MS Windows scene. I used to know all the menus and shortcuts and utilities; now I find myself googling 'How can I apply a Windows 2000 or Windows XP theme to Windows 10'. I don't want to be negative - anyway, I have not used Windows professionally for the past 15 years, so I really don't know if the OS got better (I guess it did? kind of?) - but at the same time I don't feel like spending time and energy exploring it further. The only thing I have in my bookmarks is this upcoming upgrade to Windows 10 with the new native Linux kernel support - WSL 2. Maybe Windows can be the next MacOSX, if they totally embrace a ****x kernel and a better UI, who knows.


DCS time... time to fly... well, more or less time to read

So the PC was ready, the game was installed, and then it was time to fly. Even though I had watched a significant number of videos and reviews, I had not totally grasped how complex it is to start the engines of a fighter plane, or fly it properly (without stalling and crashing within 5 minutes)... needless to say how difficult it is to dogfight. My first problem was that I thought I knew how to fly a plane (in a simulator). I was wrong; it is much more complex, and the amount of work that a fighter pilot has to put in, especially on 3rd generation fighters, is a lot. So instead of playing, I actually found myself reading about Basic Fighter Manoeuvres, GPS and ILS landing, missile capabilities and radar tracking modes!

DCS at its core is totally free and comes with 2 fighter models pre-bundled. But when you get to see the videos, you will realise that most of the cool fighter jets you grew up with are modules you need to purchase online. Some fighter modules are bundled into one offering (see the Flaming Cliffs bundle). Well, let me tell you, it is totally worth it; I still remember when I bought the Mirage 2000 module! I was so excited. Finally a very realistic model of my favourite fighting machine. The funny part is that I never managed to start the Mirage within the first week; I had to watch 3 different videos and read the manuals so that I could start the engines properly and initialize the INS! LOL.

After some days, since I opted to fly the MiG-29S and the Su-27/35, I managed to get better, but there is always a but. Once I started to understand how to navigate in the air, how to (kind of) land and how to (kind of) follow orders from the ATC, it was time to dogfight. I did not dare to go online; I tried it once and was shot down within 30 seconds of flying. Some AMRAAMs were fired by people on the server and I was just dead... my Radar Warning Receiver went crazy the moment I entered the server and then boom!

But this is to be expected. Thankfully the game has a PvE mode and several missions, so I could sit back and train with the computer as the enemy. After some more days, I realized that it was impossible to dogfight using only the keyboard. Some moves were almost impossible to make using the Up and Down arrows, so this led me to the next purchase.



Thrustmaster T.16000M - Joystick / HOTAS


I managed to resist the temptation to buy something more complex (I always do that and it is bad). Since the game was complex, and since I really needed a basic joystick and throttle to automate my workload, I read through the different forums. It seems that many people suggest starting with an OK and not over-complex combo like the Thrustmaster T.16000M, which is more than OK not only for noobs but for more complex scenarios as well. I did not buy the pedals, which is fine, but once I wired up the joystick and configured the throttle and the 16 different buttons, wow, it was like playing a different game!

For the first time I managed to escape a dogfight with a 6 o'clock lock from an F-14 by performing some basic overshoot moves, and shot it down with my MiG-29. I managed to get on the tail of an F-16 with my Mirage (fck yeah). So yes, if you want to go into fighting with DCS then you will end up buying a combo. The quality is more than fine and the number of switches is more than enough for you to configure. Currently I use more or less half of them.
 

Overall

It all started from watching YouTube videos, sparking my flight simulator love again. This led me to PCs once again, which was kind of fun. I think I spent more time reading about how to actually fly a plane than dealing with the PC or Windows or anything else, and that was a success. If you are into flying and you really want something that is more than a game, I definitely recommend DCS!

Fights on!

Sunday, June 23, 2019

Configuring and using AWS EKS in production - round 2 #kubernetes #aws



It's been some weeks now since our migration to Amazon EKS (at my workplace) was completed and the clusters went into production. I wrote a brief post in the past on some major points, you can find it here. With some extra confidence now that the system is serving real traffic, I decided to come back with a more concrete and thorough list of steps and a set of notes I gathered through this journey. Obviously there are several companies out there that have been using Amazon's Kubernetes service, so this post aims to be just another point of reference for EKS migration and adoption use cases.

 

Platform - a web platform

The overall platform is powering a website (e-store); the EKS clusters operate in an active-active mode, meaning they share load and are utilized accordingly based on weighted load-balancing. Cluster load balancing - if we can call it that - is performed at the `edge`, so no kubernetes federation concepts for the time being. The total amount of accumulated compute in terms of CPUs is somewhere between 400-600 cores (depending on the load). The total number of microservices powering the platform is in the range of 20-30, mostly Java payloads and a mix of Node.js based services. The platform is in an expanding state: system entropy is increasing as we add more pieces to the puzzle in order to cover more features or deprecate legacy / older systems.

The website is serving unique page views in the range of half a million daily (accumulated across 15 markets - Europe, UK and APAC); traffic is highly variable due to the nature of the business. On days when artists go on sale or announce new events, traffic spikes contribute somewhere around 50-70% more unique page renders compared to a non-busy day. The platform is also subject to, and a target of, unforeseen (malicious?) traffic, scraping the whole range of public APIs or attacking certain areas.

The infrastructure powering the above site should provide:
  • Elasticity - shrink and grow based on demand; also offer the ability to do that via manual intervention, for cases where we know beforehand that we are going to have surges.
  • Stability - always available, always serving pages and API responses.
  • Tolerance to failures, usually having in mind potential outages of different AWS availability zones or whole regions.
  • Cost effectiveness - reduce the operational cost over time (AWS usage cost).
  • Security.
  • Fairly open to development teams - deploying and understanding kubernetes is a developer team concern, not an exotic operation for a separate team.

 

Kubernetes

Kubernetes had already been our target deployment platform for 2+ years. The only thing that changed over time was the different tools used to spin up new clusters. We already had operational experience and had faced several challenges with different versions and capabilities of kubernetes throughout that time. Despite the challenges, adopting kubernetes is considered a success. We never faced complete outages, and the clusters and the concepts implemented never deviated from what is stated on the manual (we did gain elasticity, stability and control over the deployment process) and, last but not least, adopting kubernetes accelerated the path to production and the delivery of business value.

Never before, in our case, had developers had such a close relationship with the infrastructure. This relationship developed over time and contributed to increased awareness between 2 split concerns: the side that writes software and the side operating and running the code in production. The biggest win was mostly the process of empowering developers to be more infrastructure aware - which slowly leads to potential improvements in the way software is developed. Obviously the same concepts apply to any team and any cloud centric initiative. Abstracting infrastructure concerns lowers the barrier for morphing a traditional developer, who was completely disconnected from operations, into this world. After that, the sky is the limit in terms of digging deeper into the details and obviously understanding more about the infrastructure. This process requires time and people that are willing to shift their mindset.

EKS why?
  
The first obvious answer is: because AWS. If AWS is your main cloud, then you continuously try to leverage the features of your cloud as much as possible, unless you are on a different path (for example you want cloud autonomy, hedging by mixing different solutions, or you think you can develop everything on your own, if you can afford it). The integration of EKS with the AWS world has matured enough that you can enjoy running a fairly vanilla setup of Kubernetes (not bastardised) and behind the scenes take advantage of the integration glue offered by AWS/EKS to the rest of the AWS ecosystem.

The second answer is cluster upgrades and security patches. Before EKS we had to engage with the specifics of the different tools (installers) when new versions came along. In many cases, especially if your cloud setup has custom configuration, trying to fit clusters into environments with custom networking or special VPC semantics was getting more and more challenging. Despite having engaged in cluster updates in the past, the risk involved was getting bigger and bigger, and we soon faced the usual dilemma many people and companies are facing (many don't want to admit it): if you want to upgrade an existing cluster, just ditch it and create a new one. While being a solution, it involved a lot of extra work on our side, re-establishing our platform on top of new clusters. Obviously there is more work for us to do to make the platform migration more automated.

The third answer is the update policies of EKS. If you want to play by the rules of EKS, you will get your masters auto-upgraded on minor revisions and you will be gently pushed to engage in upgrading your clusters to major versions. Despite still having the option to sit back and do nothing, this model encourages and accelerates the development of automation for cluster updates. It's a matter of confidence as well - the more often you upgrade and control the upgrade process, the more confident you become.

 

The team

2 people. The most important thing in this setup is not the size of the team (2) but the mix of skills. Since we want to be as close as possible to the actual needs of the developers, and ultimately serve the business, we realised that changes like this can not happen in a skill vacuum. You can not configure and spin up infrastructure thinking only as a developer, but at the same time you can not build the infrastructure on which developers will evolve and create a platform having in mind only the operational side of things. You need both: when developers are not educated enough on things like infrastructure security, performance or thorough monitoring, Ops skills and expertise will provide all of the above and educate at the same time, so that next time they improve.

On the other side, when the infrastructure is not easily consumed by developers, not accessible, or there is an invisible barrier that disconnects the software maker from their system in production - this is where a developer's point of view can help in finding the middle ground. Iteration and progressive change is an area where software developers often do better compared to other functions.

This is one of the most taboo things in the market currently, where both sides fight for control and influence. I am not sure what the correct definition of DevOps is, but in my mind this journey was a DevOps journey and I hope I will be able to experience it in other places as well throughout my career. Combine skills within the team and encourage the flow of knowledge, instead of introducing organizational barriers or bulkheads.

 

Side concern - EKS worker networking

Since this was our first time adopting EKS, we decided that the safest and most flexible approach was to fully adopt the AWS CNI networking model. This was a big change compared to our previous clusters, which were heavy on overlay networking. Pods are now much easier to troubleshoot, and networking problems easier to identify - since they have routable IPs. See here. Following the vanilla approach will raise concerns about VPC CIDR sizes; we opted for a clean solution, isolating our clusters from shared VPCs and starting fresh with new VPCs with a fairly big range.

For cases where secondary (hot) IPs are starting to run out, or you are limited by the capabilities of your workers (number of ENIs), see here. Also a nice read here.
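To make the ENI limit concrete, this is the formula the AWS VPC CNI uses for the number of schedulable pods per worker, sketched as a tiny helper (the instance figures in the comments are the published ENI limits for those classes at the time of writing):

```python
# Each ENI reserves one primary IP for itself, so only (ips_per_eni - 1)
# secondary IPs per ENI are handed out to pods; the +2 accounts for the
# host-networked pods (aws-node, kube-proxy) that don't consume a secondary IP.
def max_pods(enis: int, ips_per_eni: int) -> int:
    return enis * (ips_per_eni - 1) + 2

print(max_pods(3, 10))  # m5.large: 3 ENIs x 10 IPv4 each -> 29
print(max_pods(3, 6))   # t3.medium: 3 ENIs x 6 IPv4 each -> 17
```

These match the values AWS publishes in its eni-max-pods list, and explain why a cluster full of small workers burns through a VPC range much faster than you would expect.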

Tools

Our main goal was not to disrupt the workflows and semantics of the existing development teams, and to make our EKS clusters look more or less the same as our existing clusters. This does not mean that our existing setup was perfect or that we did not want to modernise. Again, the number one priority was that the clusters should serve the needs of the teams deploying services on top of them, not our urge to try new technologies all the time. Obviously lots of stuff will be new and different, but config changes and changes of tooling should be introduced iteratively. The basic flow was the following:
  1. Create and establish the clusters
  2. Introduce more or less the same semantics and configs - make it easy for teams to move their payloads (apps)
  3. Stabilize
  4. Gradually educate and start introducing more changes on top of the clusters, whether these are new policies, new ways of deploying or new rules enforced. The first priority is developer productivity, finely balanced with good practices and obviously keeping things simple.
In order to set up, upgrade and configure the clusters we came up with a solution that uses the following tools.
The workflow is the following:
  • Use Packer if you want to bake a new worker AMI (if needed or else skip)
  • Plan and Apply the terraform stack that controls the state of the masters and the workers' auto-scaling groups, IAM and other specifics, so that the cluster is formed. We have our own terraform module, even though the reference EKS module found here is now pretty solid.
  • Start invoking kubectl or helm after the cluster is formed to install some basic services.

 

Installing services on top of the cluster

Once the cluster is up AWS-wise, meaning the masters can talk to the various worker nodes, we deploy and configure the following components on top.
  1. Install helm (Tiller)
  2. Configure aws-auth based on our RBAC / AWS roles to enable access for users - kubectl patch
  3. Install metrics-server (modified helm chart)
  4. Install the aws cluster-autoscaler (helm chart)
  5. Install kubernetes-dashboard (helm chart)
  6. Install prometheus / kube-state-metrics (helm chart)
  7. Install the fluentd-bit daemon set (preconfigured to ship logs to E.S) (helm chart)
  8. Install or modify the correct version of kube-proxy, see here
  9. Install or modify the correct version of aws-cni, see here
  10. Install or modify the correct version of CoreDNS
  11. Scale up CoreDNS
  12. Create or update namespaces
  13. Install the ambassador proxy in certain cases - hybrid Ingress
  14. Populate the cluster and specific namespaces with secrets - already stored in Vault
Overall, the whole orchestration is controlled by Terraform. Structural changes to the cluster, e.g. worker node size, scaling semantics etc., are updated at the terraform level. Some of the helm charts indicated above are dynamically templated by terraform during provisioning - so the helm charts being applied are already in sync and have the correct values. The idea is that terraform vars can be passed as variables to individual kubectl or helm invocations - the power and simplicity of local_exec and the bash provisioner, see here.
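As an illustration of the pattern (the resource names, variables and template paths below are hypothetical, not our actual module), with terraform 0.12's `templatefile` you can render a values file from terraform state and then shell out to helm:

```hcl
# render chart values from terraform variables/state (hypothetical names)
resource "local_file" "autoscaler_values" {
  content = templatefile("${path.module}/templates/autoscaler-values.yaml.tpl", {
    defaultAsgName = aws_autoscaling_group.normal.name
    defaultMinSize = var.normal_min_size
    defaultMaxSize = var.normal_max_size
  })
  filename = "${path.module}/generated/autoscaler-values.yaml"
}

# then invoke helm once the values are rendered
resource "null_resource" "cluster_autoscaler" {
  provisioner "local-exec" {
    command = "helm upgrade --install cluster-autoscaler stable/cluster-autoscaler -f ${local_file.autoscaler_values.filename}"
  }
  depends_on = [local_file.autoscaler_values]
}
```

This keeps the ASG names and sizes in a single source of truth: change them in terraform and the chart values follow on the next apply.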

 

Auto-scaling groups and worker segmentation

Back to the actual cluster setup, and a very important point: the auto-scaling groups spinning up the workers of the clusters. There are several patterns and techniques, and by googling relevant material on the internet you will find different approaches and advice.

We opted for a simple setup where our workers are divided into 2 distinct groups (autoscaling groups / launch templates).

  • system - workers: We install kube-system material on these workers, which will always be of lifecycle type OnDemand or Reserved instances. Payloads like prometheus, the cluster autoscaler, the coredns pods, or sometimes the Ambassador proxy (if we choose to).
  • normal - workers: These host our application pods in the various namespaces. This is the ASG that is expected to grow faster in terms of numbers.

The above setup on terraform has to be reflected and mapped to one of the kubernetes services we defined above - the aws cluster-autoscaler.

  - --namespace=kube-system
  - --skip-nodes-with-local-storage=false
  - --skip-nodes-with-system-pods=true
  - --expander=most-pods
  - --nodes={{.Values.kubesystemMinSize}}:{{.Values.kubesystemMaxSize}}:{{.Values.kubesystemAsgName}}
  - --nodes={{.Values.defaultMinSize}}:{{.Values.defaultMaxSize}}:{{.Values.defaultAsgName}}

The above setup requires a minimal convention in our application helm charts: introduce node affinity or nodeSelector rules. Currently the easier way is through nodeSelector, even though node selectors will eventually be deprecated.
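In an application chart's pod template this can be as small as the following sketch (the `nodegroup` label name is made up here; any label your workers carry will do):

```yaml
# pod template snippet - schedule application pods on the 'normal' workers
nodeSelector:
  nodegroup: normal
```

The kube-system payloads pin to the system group the same way, with `nodegroup: system`.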



Spot instances (bring that cost down!)

By being able to decouple the Kubernetes side of things (through the cluster autoscaler configs) from the AWS side, especially since we are using terraform, we now had the flexibility to experiment with Spot instances. Our main goal was to make the use of spot instances as transparent as possible to the people deploying apps on the cluster, and to make it more of a concern for cluster operators. Obviously, there is still a wider concern / change that all involved parties should be aware of. Increasing the volatility of the cluster workers - meaning running payloads on workers that may die within a 2 minute notice - introduces challenges that people writing services for these clusters should be aware of.

Spot instances can be added to the mix using a setup of 2 auto-scaling groups, assuming you use the correct launch template and mixed instance policies. Many people decide to group their workers in more than 2 ASGs; for example, instead of 2 you could have 5 or 10, where you can have more granular control over the EC2 classes utilized and their lifecycle. Also you could target parts of your pods / apps to specific groups of workers based on their capabilities or lifecycle.
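A rough sketch of what the spot-enabled 'normal' group can look like in terraform (the sizes, resource names and instance types below are made up for illustration, not our production values):

```hcl
resource "aws_autoscaling_group" "normal" {
  # name, vpc_zone_identifier, min/max sizes etc. omitted for brevity
  mixed_instances_policy {
    instances_distribution {
      on_demand_base_capacity                  = 2      # keep a small on-demand floor
      on_demand_percentage_above_base_capacity = 0      # everything above the floor is spot
      spot_allocation_strategy                 = "lowest-price"
      spot_instance_pools                      = 4      # spread across 4 price pools
    }
    launch_template {
      launch_template_specification {
        launch_template_id = aws_launch_template.workers.id
        version            = "$Latest"
      }
      # the wider the class list, the smaller the blast radius of a price spike
      override { instance_type = "m5.xlarge" }
      override { instance_type = "m5a.xlarge" }
      override { instance_type = "m4.xlarge" }
    }
  }
}
```

The on-demand base capacity is the hedge: even if every spot pool dries up at once, the group never drops below those workers.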

In general, the more fine-grained control you want and the more you want to hedge the risk of spot termination, the more you will lean towards the following strategies or choices:
  • Segment your workers into different capability groups (spot / OnDemand / Reserved, single or multiple classes / mixed instance policies).
  • Increase the number of pods in each replica set, so that you hedge the risk of pods of the same replica set (deployment) landing on the same type of workers, which could potentially be killed at the same time.
  • More stateless, less stateful. That way your platform can recover from, or sustain, micro / minor outages of compute / memory. The more you rely on singleton services or centralized resources, the more exposed you are to random outages.
Spot instances mean reduced prices but also termination notifications. When thinking about termination you need to consider 3 factors:
  1. AWS Region (eu-west-1)
  2. AWS availability zone (eu-west-1a, eu-west-1b, ...)
  3. Class (m4.xlarge)
The above triplet is usually the major factor that will affect the spot price of a class in general. The current strategy is that your payloads (pods/containers) obviously need to be spread as effectively as possible:
  • Region: thus more than one cluster.
  • AZ: your ASG should spin workers in ALL the available zones that the region offers.
  • Class: if your ASG is single-class, the chances of this class being subject to random spot termination and affecting your clusters are higher than with a bigger list of classes.
The general idea is to hedge your risk of spot instance termination by running your workloads multi-region / multi-ASG / multi-class. There is still some risk involved - e.g. AWS massively retiring spot resources at the same time, or rapidly changing the prices.

This is a very tricky area, and settings on the ASG can help you hedge a bit more here - for example, if you have hard rules on your price limits the ASG can respect them, rules like 'don't bid beyond this price for a single spot resource'. The stricter you make the ASG / launch template in controlling your cost estimate, the bigger the risk of suffering outages because of this hard limit and a sudden change in price.

The most flexible approach is to let the ASG pick the `lowest-price` for you, so you can be sure that it will do its best to find the next available price combination to feed your cluster with compute and memory.

In terms of spreading your pods around to different workers, I think the simplest advice is not to put all your eggs in a single basket. Pod affinity/anti-affinity rules, plus labels on your nodes, are your number one tool in these cases. You can find a very nice article here.
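As a minimal sketch (the deployment name, labels and image are hypothetical), this is roughly what a deployment that prefers to spread its replicas across availability zones can look like, using the standard zone label that workers of this Kubernetes era carry:

```shell
# Hedged sketch: a deployment whose pods prefer to land in different AZs.
# Name, image and labels are placeholders.
cat <<'EOF' | kubectl apply -f -
apiVersion: apps/v1
kind: Deployment
metadata:
  name: api
spec:
  replicas: 6
  selector:
    matchLabels:
      app: api
  template:
    metadata:
      labels:
        app: api
    spec:
      affinity:
        podAntiAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
          - weight: 100
            podAffinityTerm:
              labelSelector:
                matchLabels:
                  app: api
              # Standard AZ label on workers in this Kubernetes era
              topologyKey: failure-domain.beta.kubernetes.io/zone
      containers:
      - name: api
        image: nginx:1.15
EOF
```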

Last but not least: when terminations of spot instances do happen, it is more than important to be able to react at the cluster level, so that these worker terminations don't make the cluster go crazy. The more concurrent terminations happen, the bigger the risk that you will see big waves of pod movement among workers and AZs. Kubernetes will try to balance and stuff pods into the remaining resources and obviously spin up new resources, but it really depends on how much your platform can tolerate these movements, and on how you control the re-scheduling of pods. Another useful tool available to you in this area is the Kubernetes pod disruption budget, which can act as an extra set of rules that the Kubernetes masters will take into account when resource availability is in flux (meaning workers are coming and going).
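A minimal sketch of such a budget (the name and label selector below are hypothetical): during voluntary disruptions such as node drains, Kubernetes will refuse evictions that would drop the matched pods below the minimum.

```shell
# Hedged sketch: keep at least 4 matching pods running during voluntary
# disruptions such as node drains. Name and labels are placeholders.
cat <<'EOF' | kubectl apply -f -
apiVersion: policy/v1beta1
kind: PodDisruptionBudget
metadata:
  name: api-pdb
spec:
  minAvailable: 4
  selector:
    matchLabels:
      app: api
EOF
```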

On top of that, in order to gracefully handle these terminations - which actually happen with a 2-minute notice - daemonsets like this (spot termination handler) will ease the pain and offer more visibility. Once the spot instance receives the termination event, the daemon will gracefully drain your node, which in turn marks the worker as not ready to receive and schedule workloads, which in turn kicks off a scheduling round where Kubernetes will try to place the pods on other workers, if there is enough space, or spin up new workers. Eventually the system will try to balance and satisfy your setup configs and demands - but it really depends on the number of concurrent terminations you have and how your pods are spread around the workers.
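For reference, what the handler does when the termination notice arrives is essentially a node drain; done by hand (with a hypothetical node name) it would look like:

```shell
# Cordon and drain a worker about to be reclaimed; the node name is a
# hypothetical placeholder. --ignore-daemonsets is needed because daemonset
# pods (log shippers etc.) cannot be evicted the normal way.
kubectl drain ip-10-0-1-23.eu-west-1.compute.internal \
  --ignore-daemonsets --delete-local-data --grace-period=60
```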

The bigger the spread, the smaller the impact. You can also consider a mixed policy where certain workers are always on-demand and the rest are spot, so that you hedge even more against intense spot instance termination events.

 

Cluster upgrade concerns and workflow

Cluster updates require some work in terms of coordination and establishing a process. There are 3 cases:
  • No EKS or Kubernetes version updates - only modifications to the components installed on top of the clusters; for example, you want to update fluent-bit to a newer version.
  • Minor EKS update (auto mode) that needs an EKS AMI update, bringing your workers to the same version state.
  • Major EKS update (a Kubernetes upgrade, for example from 1.12 to 1.13) - that will require an AMI update + some AWS EKS components updated.
The third case is the most challenging one, because not only do you need to bake a new AMI based on the reference provided by AWS, you also need to follow the conventions and versions of components as defined here:
  • core-dns
  • kube-proxy
  • AWS CNI plugin
This means that prior to engaging in updates you need to update your config scripts - in our case the Terraform variables - so that when the new AMI makes it to production and we have the core of the cluster set up, we are able to update or re-install certain components. Always follow this guide. The documentation by AWS is pretty solid.
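As a hedged sketch of what that component pass can look like (the image tags and manifest URL below are illustrative only - always take the exact versions from the AWS upgrade guide):

```shell
# Hedged sketch of the post-upgrade component bumps for a 1.12 -> 1.13 move;
# the image tags are illustrative, the real ones come from the AWS guide.
kubectl set image daemonset.apps/kube-proxy -n kube-system \
  kube-proxy=602401143452.dkr.ecr.eu-west-1.amazonaws.com/eks/kube-proxy:v1.13.12
# CoreDNS image bump (container name "coredns" in the coredns deployment)
kubectl set image deployment.apps/coredns -n kube-system \
  coredns=602401143452.dkr.ecr.eu-west-1.amazonaws.com/eks/coredns:v1.2.6
# AWS CNI plugin update, applied from the released manifest
kubectl apply -f https://raw.githubusercontent.com/aws/amazon-vpc-cni-k8s/release-1.5/config/v1.5/aws-k8s-cni.yaml
```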

 

AWS API throttling and EKS

The AWS masters are a black box for you as an end user, but it is highly recommended that you have their CloudWatch logs enabled by default. This was a huge improvement for us, compared to our previous clusters. Master logs are isolated and easily searchable, so we avoid the noise of filtering or searching big amounts of logs. Also, check this very nice utility that is often referenced in support cases: the EKS logs collector.
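Enabling those master logs on an existing cluster is a single call (the cluster name below is a placeholder):

```shell
# Turn on control-plane logging to CloudWatch for an existing cluster;
# "my-cluster" is a hypothetical placeholder.
aws eks update-cluster-config --name my-cluster \
  --logging '{"clusterLogging":[{"types":["api","audit","authenticator","controllerManager","scheduler"],"enabled":true}]}'
```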

The masters, like every other component of EKS, leverage the AWS API to make things happen - this applies to everything that runs on AWS. What you need to be aware of is that if you are operating on busy, centralized AWS accounts, there is always a quota on the API calls issued by different components (EC2 etc.). Your EKS masters are chatty as well, and the API calls they issue are counted and billed like the rest of the calls on your account (they are not free and they contribute to the quota). This means that when AWS API throttling happens on your account, your EKS clusters can be affected as well, so make sure you have appropriate monitoring in place to catch it when it happens. The longer throttling lasts, the bigger the risk that internal components of EKS fail to sync or talk to each other - which means the cluster may start to report random errors that sometimes cannot be correlated. This is a tricky one, and I really hope AWS changes the policy here and shields the EKS masters from API throttling that may happen on the account. The other solution is to 'box' your clusters into specific accounts, and not put all your stuff in a single account with a single API quota.

AWS recently introduced this service - I guess it is going to help a lot with visibility and with proactively avoiding problems.

 

Overall

Migrating to and using EKS in production can be considered a huge success. Obviously our platform is still in flux, and changes occur and will keep happening over time. The same applies to EKS as a product: over time you see changes and updates from AWS - a very positive sign, since you can see that AWS is invested in this product - and with every major Kubernetes update, EKS evolves as well. Another positive thing is the quality of support from AWS; there were several times where we had to double-check cases with AWS support staff, and I have to admit the resolutions and answers provided were very thorough.

As I have said in the past, I think that at some point AWS will decide to complete the integration journey for its users and provide a turnkey solution where configuration of the cluster will be automated end to end (masters, workers, plugins and setup). Let's see.

 

 


Saturday, March 09, 2019

Oh-my-bash MacOSX iTerm and stuff

I decided to up my command line game, so this is a 'remember how to do stuff' post for me. I do acknowledge that there are tons of different ways of doing things, especially when you have to deal with the command line. So don't shoot the pianist.

Step 0: I do use brew
I use brew to manage a lot of command line tools + GUI apps. You can find it here. I also occasionally use CakeBrew (just to check on deps).

Step 1 : Upgrade my bash on macOSX
Mostly I do follow the instructions as posted here.

# Install the latest bash via Homebrew (it lands in /usr/local/bin/bash)
brew install bash
# Add the new shell to the list of allowed shells
sudo bash -c 'echo /usr/local/bin/bash >> /etc/shells'
# Change to the new shell
chsh -s /usr/local/bin/bash


Step 2: Install Oh-my-bash
I do like Oh-my-bash (a set of extensions and plug-ins). You can install it by doing the following:

bash -c "$(curl -fsSL https://raw.githubusercontent.com/ohmybash/oh-my-bash/master/tools/install.sh)"

Step 3: Activate oh-my-bash in your bash_profile
Since ~/.bash_profile is picked up first, you should make sure that ~/.bashrc is also sourced. I added this line at the end of my ~/.bash_profile:

source ~/.bashrc

Step 4: Activate some handy plugins using oh-my-bash
Edit ~/.bashrc and find the section with the extensions. Currently I have the following activated:

completions=(
  awscli
  git
  composer
  ssh
)

 
plugins=(
  aws
  git
  bashmarks
)


Friday, December 07, 2018

Testing and using AWS EKS #kubernetes - findings





Context

I have been working in a team where we use Kubernetes in production (not the nginx example - the real shit) for 2 years now. I have configured and used Kubernetes clusters from version 1.4.x, with tools like kube-aws, to 1.6-1.7, configured with kops. Amazon's EKS is the third breed of Kubernetes provisioning solutions that I have had the chance to try, and this post is about my recent experience spending a week trying to bring a production-level EKS cluster to life and checking if it would cut it for our production needs.

This post would not have been possible without the contribution and hard work of my colleague JV - thanks!!! 

EKS basics


For those who seek an executive summary of EKS: it's an AWS managed service (like, for example, Amazon ElastiCache). Amazon provisions, updates and patches the brains of your cluster, aka the control plane + etcd. There is a flat rate (price) for the masters + standard EC2 billing for your worker fleet. AWS also provides a custom networking layer, eliminating the need for additional overlay network solutions, like the ones you would use if you created the cluster on your own. You are responsible for provisioning and attaching the worker nodes; AWS provides templates (CloudFormation) with pre-configured workers. You are also responsible for installing, on top of the cluster, all the other services or applications needed by your platform, e.g. how to collect logs, how to scrape metrics, other specific daemons etc. Also note that once the cluster is up, there is nothing AWS-specific: you get a vanilla experience (the exception is the networking plugin).

How do I start?

There are a couple of options for spinning up an EKS cluster:
  1. The infamous click-click on the dashboard (good if you want to play, but not production-ready, meaning not repeatable if you want to re-provision and test).
  2. Go through the official guide of EKS installation using command-line tools like aws eks etc. It's a good option, especially if you love the AWS command line tooling.
  3. Use third-party command line tools that offer extra functionality behind the scenes, namely things like eksctl. It's a very promising tool, by the way.
  4. Terraform all the things!
    1. Follow the official guide here.
    2. Or use samples like this interesting module, see here.
Despite being slightly unrelated to the above 4 points, don't forget to bookmark and read the eksworkshop. One of the best written getting-started guides I have seen lately - many thanks to

We started the PoC with option 4.1, so we used the official Terraform guide (thank you HashiCorp), and then the worker provisioning was terraformed as well - we did not keep the standard CloudFormation template from AWS. As you can understand, the tool of choice is sometimes dictated by the available levels of skill and experience within the team. In general we love Terraform (especially us developers).

Other things to consider before I start?


So, as we discovered - and of course it is very well documented - an EKS cluster, due to the networking features that it brings (more on this later), really shines when it occupies its own VPC! It's not that you cannot spin up an EKS cluster on your existing VPCs, but make sure you have enough free IPs and ranges available, since by default the cluster - and specifically the workers - will start eating your IPs. No, this is not a bug, it's a feature, and it actually makes real sense. It is one of the things that I really loved about EKS.

 
First milestone - spin the masters and attach workers

The first and most important step is to spin up your masters and then provision your workers. Once the workers are accepted and join the cluster, you more or less have the core ready. Spinning up just the masters (as many articles out there do) is like 50% of the work; once you can create an auto-scaling group where your workers are created and then added to the cluster, you are very close to the real thing.

Coming back to the Pod Networking feature

If you have ever provisioned a Kubernetes cluster on AWS using tools like kops or kube-aws, then you have most probably already installed, or even configured, the overlay network plugin that provides pod networking in your cluster. As you know, pods have IPs; overlay networks on a Kubernetes cluster provide this abstraction (see Calico, Flannel etc.). On an EKS cluster, by default you don't get this overlay layer. Amazon has actually managed to bridge the pod networking world (Kubernetes networking) with its native AWS networking: in plain words, your pods (apps) within a cluster get a real VPC IP. When I heard about this almost a year ago, I have to admit I was not sure at all; but after some challenges and failures, I started to appreciate simplicity in the networking layer of any Kubernetes cluster on top of AWS. In other words, if you can remove one layer of abstraction because your cloud can natively take it over, why keep an extra layer of networking and hops when you can have the real thing?

But the workers pre-allocate so many IPs

In order to optimize pod placement on the worker, EKS uses the underlying EC2 worker capabilities to reserve IPs on its ENIs. So when you spin up a worker, even if there are no pods or daemons allocated to it yet, you can see on the dashboard that it has already pre-allocated a pool of IPs - 10 or more, depending on the class size. If you happen to operate your cluster on a VPC with other 'residents', your EKS cluster can be considered a threat! One way to keep the benefits of AWS CNI networking but make some room on VPCs that are running out of free IPs is to configure - after bringing up the masters - the 'aws-node' daemon set. This is an AWS-specific daemon, part of the EKS magic that makes all this happen. See here for a similar issue. So just

kubectl edit daemonset aws-node -n kube-system

and set the `WARM_IP_TARGET` to something smaller.

Note that, as we discovered, setting the WARM_IP_TARGET to something smaller does not limit the capacity of your worker to host more pods. If your worker does not have warm IPs to offer to newly created and allocated pods, it will request new ones from the networking pool.
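Instead of editing the daemonset by hand, the same change can be applied in one shot - the value below is just an example, tune it to how starved your VPC actually is:

```shell
# Lower the number of warm IPs each worker pre-allocates; the value 2 is an
# example, pick one that matches your VPC's free-IP pressure.
kubectl set env daemonset aws-node -n kube-system WARM_IP_TARGET=2
```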

In case even this workaround is not enough, there is always the option to switch on Calico on top of the cluster - see here. Personally, after seeing CNI in action, I would prefer to stick with it; after 2 years with cases of networking errors, I think I trust AWS networking more. There is also the maintenance and troubleshooting side of things. Overlay networking is not rocket science, but at the same time it is not something you want to be spending time and energy troubleshooting, especially if you are not full of people with these skills! Also, the more complex your AWS networking setup is, the harder it becomes to find issues when packets jump from the Kubernetes world to your AWS layer and vice versa. It is always up to the team and the people making the decisions to choose the support model that they think fits their team, and to assess the capacity of the team to provide real support on challenging occasions.

What else did you like? - the aws-iam-authenticator

Apart from appreciating the simplicity of CNI, I found the integration of EKS with the existing IAM infrastructure very straightforward. You can use the corporate (even SAML-based) roles/users of your AWS account to give or restrict access to your EKS cluster(s). This is a BIG pain point for many companies out there, especially if you are an AWS shop. EKS, as just another AWS managed service, follows the same principles and provides a bridge between IAM and Kubernetes RBAC! People doing Kubernetes on AWS already know that in the early days, access to the cluster and distribution of kube configs was - and still is - a very manual and tricky job, since AWS users and roles mean nothing to the Kubernetes master(s). Heptio has done a very good job with this.

What actually happens is that you install the aws-iam-authenticator and attach it to kubectl through ~/.kube/config. Every time you issue a kubectl command, it is proxied by the aws-iam-authenticator, which reads your AWS credentials (~/.aws/credentials) and maps them to Kubernetes RBAC rules. So you can map AWS IAM roles or users to Kubernetes RBAC roles, or create your own RBAC rules and map them. It was the first time I used this tool, and it actually works extremely well! Of course, if you run an old Kubernetes cluster with no RBAC it won't be useful, but in the EKS case RBAC is enabled by default! In your ~/.kube/config the entry will look like this:

- name: arn:aws:eks:eu-west-1:
  user:
    exec:
      apiVersion: client.authentication.k8s.io/v1alpha1
      args:
      - token
      - -i
      -
      env:
      - name: AWS_PROFILE
        value:
      command: aws-iam-authenticator


Note that, from the EKS admin side, you will need to map the IAM role or user on your cluster:

kubectl edit configmap aws-auth -n kube-system

- rolearn:
  username: some-other-role
  groups:
  - system:masters # to SOME KUBERNETES RBAC ROLE
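A quick way to sanity-check the setup from the client side (the cluster name below is a placeholder): ask the authenticator for a token, then check your effective RBAC rights.

```shell
# Generate a token for the cluster with the authenticator, then verify what
# the mapped RBAC identity is allowed to do. "my-cluster" is a placeholder.
aws-iam-authenticator token -i my-cluster
kubectl auth can-i '*' '*'   # answers "yes" only for cluster-admin-level mappings
```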


What about all the other things that you have to install?

Once the cluster is ready - so you have masters and workers running - the next steps are the following, and they can be done by any admin user with appropriate `kubectl` rights:


  • Install and configure Helm
  • Install and configure the aws-cluster-autoscaler, which is more or less straightforward; see here and here for references.
  • Install and configure fluentd to push logs, e.g. to Elasticsearch.
  • Install and configure Prometheus.
  • And of course..all the things that you need or have as dependencies on your platform.
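A sketch of what a couple of these installs looked like in the Helm 2 era (release names and values are hypothetical; the `stable` chart repository of the time is assumed):

```shell
# Hedged sketch, Helm 2 style as used back then; names and values are examples.
helm init --service-account tiller
helm install stable/cluster-autoscaler --name autoscaler --namespace kube-system \
  --set autoDiscovery.clusterName=my-cluster \
  --set awsRegion=eu-west-1
helm install stable/prometheus --name prometheus --namespace monitoring
```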

Should I use EKS?


  • If you are an AWS user and you have no plans on moving away, I think it is the way to go!
  • If you are a company/team that wants to focus on business delivery and not spend a lot of energy keeping different Kubernetes clusters alive, then YES, by all means. EKS reduces your maintenance nightmares and challenges by 60-70%, based on my experience.
  • If you want to get patches and upgrades (on your masters) for free and transparently - see the latest Kubernetes security exploit, and ask your friends around how many of them were pushed to ditch old clusters and start over this week (it was fun in the early days, but it is not fun any more). I am dreaming of easily patched clusters and auto-upgrades as a user, not cases like 'let's evacuate the cluster, we will build a new one!'
  • Is it locking you into a specific flavour? No - the end result is vanilla Kubernetes, and even though you might be leveraging the custom networking, this is more or less the case when you use the similar, more advanced offering from Google (which is a more complete, ready-made offering).
  • If you have second thoughts about region availability, then you should wait until Amazon offers EKS in a broad range of regions; I think this is the only limiting factor now for many potential users.
  • If you already have a big organization tightly coupled with AWS and the IAM system, EKS is a perfect fit in terms of securing your clusters and making them available to the development teams!
Overall it was a very challenging and at the same time interesting week. Trying to bring up an EKS cluster kind of pushed me to read and investigate things in the AWS ecosystem that I was ignoring in the past.