Living in 2020, we cannot deny that demands and needs in the development sector have changed compared to previous times (perhaps decades). Today there are projects, platforms, solutions and applications that often need to grow exponentially without advance notice. Assuming that we work on a platform with agile methodologies and current architectures such as micro-services, this type of need leads us to think about “being prepared” to support it.
In my case, and as a result of professional experience, being prepared means approaching the problem from the perspective of infrastructure, with particular emphasis on the concept of availability.
What do you mean by availability? you might ask yourself…
When I talk about availability, I mean the ability of a platform to provide service without interruptions, in a homogeneous way, distributed across several geographical regions if necessary. I can imagine the first thing that comes to mind… costs! Although it is expensive to build a platform that treats availability as a non-negotiable characteristic, it is an aspect we can manage by using resources wisely and adopting recommended strategies. One could also object: availability without scalability makes no sense, and so on…
What does scalability mean?
Let’s imagine that we have under our tutelage a worldwide e-commerce platform (nice, isn’t it?) and that the event widely known as “Black Friday” arrives (a widely used and quite representative example). “Black Friday” is a date on which we expect a significant increase in the use of our platform, whether that means serving the web, resolving searches on the site or ensuring the reliability of our transactions; our services have to be prepared for the increase in requests against our infrastructure. So, scalability can be understood as the ability of our solution to absorb that increase in traffic and grow, at the infrastructure level, to offer a more robust platform that guarantees availability. Bear in mind that the peak of transactions will take place at a different time in each region, so trying to anticipate it by scaling everything manually would be an unnecessary expense. So far everything is very nice, but when the transactions decrease, is there enough elasticity on the platform to return to a “minimum” scheme?
What is elasticity?
Of course, we are referring to elasticity in terms of infrastructure. We could understand elasticity as the ability of our platform to return to its initial state as circumstances warrant. We could borrow (and slightly misrepresent) the physical definition of elasticity, which fortunately is very accurate:
The term elasticity designates the mechanical property of certain materials to undergo reversible deformations when they are subjected to the action of external forces, and to recover their original shape once those external forces are removed.
By now you must be wondering: EKS? What is that?
EKS (Elastic Kubernetes Service) is the solution that AWS offers for our clustering challenges. Kubernetes is a well-known container orchestrator. The general idea of Kubernetes is to bring some “simplicity” to the problem of dealing with different services (such as backend services) that must coexist in the same context. While adjustments may be needed from a development standpoint in order to bring a platform from a traditionally managed scheme to this type of solution, I believe the effort is invaluable added value.
And where does EKS come into all this?
EKS is designed to meet the aforementioned requirements clearly and precisely:
Regarding availability, the service that Amazon offers facilitates the configuration of our cluster across different availability zones in a simple and clear way. It detects whether any of our resources is in a zone with some type of service restriction, such as errors, updates and/or patches, and, automatically and invisibly to us, recreates it in a zone that is not affected.
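To make this concrete, here is a minimal sketch of such a multi-zone cluster definition, assuming we bootstrap it with eksctl, a popular CLI for EKS; the cluster name, region and zones are all hypothetical:

```yaml
# Hypothetical eksctl configuration: a cluster spread across three availability zones.
apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig

metadata:
  name: shop-cluster        # hypothetical cluster name
  region: us-east-1

availabilityZones: ["us-east-1a", "us-east-1b", "us-east-1c"]

nodeGroups:
  - name: workers
    instanceType: m5.large
    desiredCapacity: 3      # one worker per zone to start with
```

Feeding this file to `eksctl create cluster -f cluster.yaml` would give us workers distributed across the three zones from day one.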
In an imaginary scenario, we could think about environments fully managed by us, either on-premise or on cloud instances. If some of these environments needed a patch or an update due to a framework change that forced us to restart our services, it would mean a service outage of a few minutes. We would be entering territory that invites errors: while a service is stopped, intrinsically related pieces, such as a queue or asynchronous events, may sit waiting for a response from the server that is inside its maintenance window. And here another big factor appears: the “maintenance window” itself, which we must notify to the customer and perhaps to the users, turning authorizations into a bottleneck for highly necessary tasks.
These types of considerations are resolved by the concepts of availability that EKS and Kubernetes handle separately. If our platform ran its services in containers, switching between the active version and the one being retired would be managed by Kubernetes, according to the preferences we define for each service in our Kubernetes manifests. This allows us to choose between a scheme that raises a new service in parallel, waits for the health checks to succeed and then replaces the current one; a service-expiration scheme, where Kubernetes waits for the container to finish processing the information it is handling, stops the service and creates a new one; and, finally, a more binary scheme that destroys the existing container and recreates a new one from scratch. In turn, if it were an update at the Kubernetes version level, EKS would take care of all the “wiring” between one version and the other, completely transparently to us: we would subscribe nodes with the new version to our cluster and start draining the workers running the version we no longer need. The added value here, in addition, is the fact that EKS takes care of creating the instances we want, handling the subscription to the master node and installing everything necessary for each instance to act as an agent of the Kubernetes cluster.
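As an illustration, here is a minimal sketch of a Deployment manifest that chooses the first of those schemes; the service name, image and health-check endpoint are hypothetical:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: search-api                        # hypothetical backend service
spec:
  replicas: 3
  selector:
    matchLabels:
      app: search-api
  strategy:
    type: RollingUpdate                   # raise new pods in parallel and replace the old ones
    rollingUpdate:
      maxSurge: 1                         # at most one extra pod during the switch
      maxUnavailable: 0                   # never drop below the desired replica count
  template:
    metadata:
      labels:
        app: search-api
    spec:
      terminationGracePeriodSeconds: 60   # let the old container finish what it is processing
      containers:
        - name: search-api
          image: registry.example.com/search-api:2.0.0   # hypothetical image
          readinessProbe:                 # the health check that gates the replacement
            httpGet:
              path: /healthz
              port: 8080
            initialDelaySeconds: 5
            periodSeconds: 10
```

Switching `strategy.type` to `Recreate` would give us the more binary scheme, while `terminationGracePeriodSeconds` covers the graceful expiration of a container that is still processing work.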
From the scalability point of view, we take advantage of this same EKS functionality. The simplicity with which it handles the subscription of new nodes to the cluster makes it easy to scale our nodes horizontally and, in turn, to subscribe even more robust nodes to our cluster. Once again, we could propose a theoretical scenario where the need to provide greater computing capacity to our application arises unexpectedly. Whether the application is based on a micro-services architecture (which would be ideal in this infrastructure context) or is more monolith-oriented, we could almost immediately add nodes with higher compute capacity to the cluster and move these services over without losing service.
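A sketch of what subscribing those more robust nodes could look like, again assuming eksctl and reusing the hypothetical cluster from before:

```yaml
# Hypothetical node group with more compute capacity for the existing cluster.
apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig

metadata:
  name: shop-cluster          # same hypothetical cluster as above
  region: us-east-1

nodeGroups:
  - name: workers-xl
    instanceType: m5.2xlarge  # considerably more CPU and memory than the first workers
    desiredCapacity: 3
```

Applying it with `eksctl create nodegroup -f nodegroup.yaml` adds the new workers next to the existing ones, so services can be moved over without an outage.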
To this scenario we add the matter of elasticity. Although the concept of elasticity may seem covered by scalability, in this context we draw the aforementioned distinction, and the difference lies in the automation of these tasks.
The need to grow or shrink can be resolved both from the Kubernetes side and from the EKS side. In the first case, we could add to the Kubernetes administration layer (the kube-system namespace) a service provided as an add-on, the Cluster Autoscaler, which runs inside the cluster and actively verifies the state of the containers, whether because of a computing need or because no more containers can be created on a given instance; once this is verified, it makes the decision to scale, both up and down, creating instances of the type we indicate in its policies. EKS, on the other hand, solves this in another way. Using EKS means living in the Amazon context, so we can configure CloudWatch (a monitoring service provided by Amazon by default) to verify certain metrics of the nodes that are part of the cluster, and then configure policies that make scaling decisions based on those metrics.
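On the Kubernetes side, a minimal sketch of the relevant excerpt of the Cluster Autoscaler’s Deployment could look like this; the node-group name and scaling bounds are hypothetical:

```yaml
# Excerpt: the flags that tell the Cluster Autoscaler what to watch and how far to scale.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: cluster-autoscaler
  namespace: kube-system
spec:
  replicas: 1
  selector:
    matchLabels:
      app: cluster-autoscaler
  template:
    metadata:
      labels:
        app: cluster-autoscaler
    spec:
      containers:
        - name: cluster-autoscaler
          image: k8s.gcr.io/autoscaling/cluster-autoscaler:v1.17.3
          command:
            - ./cluster-autoscaler
            - --cloud-provider=aws
            - --nodes=2:10:workers        # min:max:node-group to grow and shrink between
            - --scale-down-enabled=true   # also return to the "minimum" scheme
```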
So, if our application suffered a lack of performance and found itself short on resources, these services would immediately take charge of the decision to scale up or down.
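And on the EKS side, a minimal sketch of one such metric-based policy, expressed as a CloudFormation snippet; the Auto Scaling group of workers is assumed to be defined elsewhere in the same template:

```yaml
# Hypothetical target-tracking policy: grow or shrink the worker group around a CPU target.
Resources:
  WorkerCpuScalingPolicy:
    Type: AWS::AutoScaling::ScalingPolicy
    Properties:
      AutoScalingGroupName: !Ref WorkerAutoScalingGroup   # assumed to exist in this template
      PolicyType: TargetTrackingScaling
      TargetTrackingConfiguration:
        PredefinedMetricSpecification:
          PredefinedMetricType: ASGAverageCPUUtilization  # CloudWatch metric to track
        TargetValue: 60.0   # add nodes above ~60% average CPU, remove them below
```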
Thanks to the functionalities that EKS and Kubernetes provide us, a large part of “being prepared” is covered in the year 2020.