Scaling from 0 with EKS and Spot Instances
Scaling is a large topic.
If your Kubernetes environment does not scale properly you're either wasting money by running too many nodes, or wasting time fixing issues because deployments are stalled waiting for nodes that don't exist. The cluster autoscaler takes care of this by hooking into your AWS Auto Scaling Groups and managing the scale-up and scale-down events.
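The autoscaler itself discovers those groups through tags. As a minimal sketch, assuming a cluster named my-cluster (a placeholder) and the standard auto-discovery tags, the relevant cluster-autoscaler container arguments look like this:

# Sketch: cluster-autoscaler arguments for ASG auto-discovery.
# "my-cluster" is a placeholder for your EKS cluster name.
command:
  - ./cluster-autoscaler
  - --cloud-provider=aws
  - --node-group-auto-discovery=asg:tag=k8s.io/cluster-autoscaler/enabled,k8s.io/cluster-autoscaler/my-cluster
  - --balance-similar-node-groups  # keep per-AZ node groups balanced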
For effective scaling we need to know what to expect, which is why it is always a good idea to set resource requests for each deployment. Without them the autoscaler has to make a lot of assumptions about the requirements. It is not always easy to know how much CPU or memory a pod will use, so you may need to run it for a while with no requests or limits, monitor the usage, and then make an educated guess.
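For example, a container spec with explicit requests and limits might look like this; the name, image, and numbers are placeholders to be tuned against your monitoring:

containers:
  - name: myapp        # placeholder
    image: myapp:1.0.0 # placeholder
    resources:
      requests:
        cpu: 250m      # what the scheduler and autoscaler plan capacity with
        memory: 256Mi
      limits:
        cpu: 500m      # hard ceiling before CPU throttling
        memory: 512Mi  # hard ceiling before the OOM killer steps in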
Scaling to EC2 Spot Instances⌗
Stateless microservices in an EKS cluster present a great opportunity to save costs by deploying them to cheap EC2 Spot Instances. Spot Instances are spare EC2 capacity that can be requested at around a 60% discount off the on-demand price.
To make use of the available spot capacity you need to create an Auto Scaling Group that lists the CPU/RAM combinations you need. If your cluster is spread across multiple Availability Zones, it is important to create an ASG per AZ so that the cluster autoscaler can scale and balance the load correctly.
For example, the eksctl configuration below will create an ASG from instance types with 2 vCPUs and 8GiB of memory. This ASG can also scale from 0, which means that when there is not enough load all instances will be released.
managedNodeGroups:
  - name: example-stateless-2vcpu-8gb-1c
    spot: true
    tags:
      k8s.io/cluster-autoscaler/node-template/label/nodegroup-type: stateless
      k8s.io/cluster-autoscaler/node-template/label/instance-type: spot
      k8s.io/cluster-autoscaler/node-template/taint/spotInstance: "true:PreferNoSchedule"
    labels:
      nodegroup-type: stateless
      instance-type: spot
    taints:
      - key: spotInstance
        value: "true"
        effect: PreferNoSchedule
    instanceTypes:
      - t3a.large
      - t3.large
      - m5.large
      - m5a.large
      - m5d.large
      - m5ad.large
      - m5n.large
      - m5dn.large
    desiredCapacity: 0
    minSize: 0
    maxSize: 5
    availabilityZones: ["eu-west-1c"]
    iam:
      withAddonPolicies:
        certManager: true
        autoScaler: true
        externalDNS: true
    ssh: # use existing EC2 key
      publicKeyName: my-ssh-key
The tags are critical for the cluster autoscaler to scale from 0.
After creating the ASG you may need to manually add the tags to the ASG in the AWS console. In my experience eksctl does not add the tags to the ASG itself, which means that cluster-autoscaler does not pick them up. It could be an IAM permissions issue, but it is easy enough to add them yourself:
- Open your AWS console
- Navigate to EC2
- Select Auto Scaling Groups from the left hand side
- Select the ASG you created for stateless applications
- Scroll down to the bottom and edit the Tags
- Add the following tags:
k8s.io/cluster-autoscaler/node-template/label/nodegroup-type: stateless
k8s.io/cluster-autoscaler/node-template/label/instance-type: spot
k8s.io/cluster-autoscaler/node-template/taint/spotInstance: true:PreferNoSchedule
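If you prefer the CLI to the console, the same tags can be applied with the AWS CLI. The ASG name below is a placeholder; look up the real one that eksctl generated first:

# List the ASG names to find the one backing the stateless nodegroup.
aws autoscaling describe-auto-scaling-groups \
  --query 'AutoScalingGroups[].AutoScalingGroupName'

# Apply the node-template tags (placeholder ASG name).
aws autoscaling create-or-update-tags --tags \
  "ResourceId=my-stateless-asg,ResourceType=auto-scaling-group,Key=k8s.io/cluster-autoscaler/node-template/label/nodegroup-type,Value=stateless,PropagateAtLaunch=true" \
  "ResourceId=my-stateless-asg,ResourceType=auto-scaling-group,Key=k8s.io/cluster-autoscaler/node-template/label/instance-type,Value=spot,PropagateAtLaunch=true" \
  "ResourceId=my-stateless-asg,ResourceType=auto-scaling-group,Key=k8s.io/cluster-autoscaler/node-template/taint/spotInstance,Value=true:PreferNoSchedule,PropagateAtLaunch=true"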
This must be done if you plan on scaling from 0 as per the setup above. The tags are picked up by the cluster autoscaler and matched against the nodeAffinity of pending pods, which then triggers a scale-up of the ASG.
Once the ASG has been configured you can add the matching node affinity settings and tolerations to your deployments. The setup below will tell the Kubernetes scheduler that your deployment can tolerate the taint and would prefer to be scheduled on a spot instance.
tolerations:
  - key: "spotInstance"
    operator: "Equal"
    value: "true"
    effect: "PreferNoSchedule"
affinity:
  nodeAffinity:
    preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 1
        preference:
          matchExpressions:
            - key: eks.amazonaws.com/capacityType
              operator: In
              values:
                - SPOT
    requiredDuringSchedulingIgnoredDuringExecution:
      nodeSelectorTerms:
        - matchExpressions:
            - key: nodegroup-type
              operator: In
              values:
                - stateless
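Once a spot node has joined the cluster you can confirm the labels that these rules match on:

# Show the custom labels as columns; spot nodes should report
# nodegroup-type=stateless and capacityType=SPOT.
kubectl get nodes -L nodegroup-type,instance-type,eks.amazonaws.com/capacityType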
Planning for interruptions⌗
When selecting workloads for a stateless deployment it is important to plan for disruptions. The best type of workload meets these criteria:
- maintains state in a DB
- does not have persistent storage
- can be interrupted (see the termination sketch after this list)
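On the last point: a spot reclaim gives you roughly a two minute warning before the instance disappears, and your pods receive a SIGTERM when they are evicted, so "can be interrupted" mostly means shutting down cleanly inside that window. A minimal sketch, assuming your process exits cleanly on SIGTERM; the container name, image, and preStop sleep are illustrative assumptions:

spec:
  terminationGracePeriodSeconds: 60  # must fit inside the ~2 minute spot warning
  containers:
    - name: myapp          # placeholder
      image: myapp:1.0.0   # placeholder
      lifecycle:
        preStop:
          exec:
            # Pause before SIGTERM is sent so load balancers can stop
            # routing new requests while in-flight ones drain.
            command: ["sleep", "10"]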
Some workloads should not have more than one replica active at any time. For these you should be sure to use the Recreate strategy in your deployment. This stops the current pod and creates a new one only once the old pod has terminated. This is not the default in Kubernetes, so it needs to be added to any deployment that must operate as the only instance running.
spec:
  strategy:
    type: Recreate
For workloads that are stateless but need to be always available, for example an API, you can make use of the RollingUpdate strategy (the default), which updates pods one by one and ensures no downtime. During a node scale-down the new pod is started while the old pod is terminating.
spec:
  strategy:
    type: RollingUpdate
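You can also tune how the rollout proceeds; the values below are an illustrative sketch, not a recommendation:

spec:
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1        # allow one extra pod during the update
      maxUnavailable: 0  # never drop below the desired replica count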
To really avoid downtime, use the PodDisruptionBudget API object.
apiVersion: policy/v1  # use policy/v1beta1 on clusters older than v1.21
kind: PodDisruptionBudget
metadata:
  name: myapp-pdb
spec:
  minAvailable: 1
  selector:
    matchLabels:
      app: myapp
In the above example, if your deployment has 3 replicas on spot instances that are all being reclaimed, Kubernetes will evict 2 pods and then wait for at least 1 replacement pod to start before evicting the 3rd replica.
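You can check what the budget currently allows with kubectl:

# ALLOWED DISRUPTIONS shows how many pods the eviction API
# may remove right now without violating the budget.
kubectl get poddisruptionbudget myapp-pdb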