Scaling is a large topic.

If your Kubernetes environment does not scale properly you’re either wasting money by running too many nodes, or wasting time fixing stalled deployments because there are not enough nodes. The cluster autoscaler takes care of this by hooking into your AWS Auto Scaling groups and managing scale-up and scale-down events.

For effective scaling we need to know what to expect, which is why it is always a good idea to set resource requests for each deployment. Without them the autoscaler has to make a lot of assumptions about your requirements. It is not always easy to know how much CPU or memory a pod will use, so you may need to run it for a while without requests and limits, monitor the usage, and then make an educated guess.
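
As a minimal sketch, requests and limits are set per container in the pod spec. The name, image, and values below are placeholders, not recommendations:

containers:
  - name: myapp              # placeholder container name
    image: myapp:1.0         # placeholder image
    resources:
      requests:
        cpu: 250m            # what the scheduler and cluster autoscaler plan around
        memory: 256Mi
      limits:
        cpu: 500m            # CPU above this is throttled
        memory: 512Mi        # exceeding this gets the container OOM-killed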

Scaling to EC2 Spot Instances

Stateless microservices in an EKS cluster present a great opportunity to save costs by deploying them to cheap EC2 Spot Instances. Spot Instances are spare EC2 capacity that can be requested at a discount, often around 60% off the standard On-Demand price.

To make use of the available Spot capacity you need to create an Auto Scaling group that lists the CPU/RAM combinations you need. If your cluster is spread across multiple Availability Zones, it is important to create an ASG per AZ so that the cluster autoscaler can scale and balance the load correctly.
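
For this balancing to work, the cluster autoscaler itself should run with the --balance-similar-node-groups flag. As a rough sketch (the image tag and the my-cluster name are placeholders for your own), the relevant part of the cluster-autoscaler container spec looks like this:

containers:
  - name: cluster-autoscaler
    image: registry.k8s.io/autoscaling/cluster-autoscaler:v1.28.2  # match your Kubernetes version
    command:
      - ./cluster-autoscaler
      - --cloud-provider=aws
      - --balance-similar-node-groups   # keep the per-AZ ASGs evenly sized
      - --node-group-auto-discovery=asg:tag=k8s.io/cluster-autoscaler/enabled,k8s.io/cluster-autoscaler/my-cluster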

For example, the eksctl configuration below creates an ASG from 2 vCPU / 8 GiB instance types. This ASG can also scale from 0, which means that when there is not enough load all instances will be released.


managedNodeGroups:
  - name: example-stateless-2vcpu-8gb-1c
    spot: true
    tags:
      k8s.io/cluster-autoscaler/node-template/label/nodegroup-type: stateless
      k8s.io/cluster-autoscaler/node-template/label/instance-type: spot
      k8s.io/cluster-autoscaler/node-template/taint/spotInstance: "true:PreferNoSchedule"
    labels:
      nodegroup-type: stateless
      instance-type: spot
    taints:
      - key: spotInstance
        value: "true"
        effect: PreferNoSchedule
    instanceTypes:
      - t3a.large
      - t3.large
      - m5.large
      - m5a.large
      - m5d.large
      - m5ad.large
      - m5n.large
      - m5dn.large
    desiredCapacity: 0
    minSize: 0
    maxSize: 5
    availabilityZones: ["eu-west-1c"]
    iam:
      withAddonPolicies:
        certManager: true
        autoScaler: true
        externalDNS: true
    ssh: # use existing EC2 key
      publicKeyName: my-ssh-key
The tags are critical for the cluster autoscaler to scale from 0.
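
Assuming the node group lives in your main cluster configuration file, eksctl create nodegroup --config-file=cluster.yaml (substitute your own file name) creates it along with the backing ASG.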

After creating the ASG you may need to manually add the tags to the ASG in the AWS console. In my experience eksctl does not add the tags to the ASG, which means that cluster-autoscaler does not pick them up. It could be an IAM permission setting, but it is easy enough to add them yourself. Open your AWS console:

  • Navigate to EC2
  • Select Auto Scaling Groups from the left hand side
  • Select the ASG you created for stateless applications
  • Scroll down to the bottom and edit the Tags
  • Add the following tags
    • k8s.io/cluster-autoscaler/node-template/label/nodegroup-type: stateless
    • k8s.io/cluster-autoscaler/node-template/label/instance-type: spot
    • k8s.io/cluster-autoscaler/node-template/taint/spotInstance: true:PreferNoSchedule

This must be done if you plan on scaling from 0 as per the setup above. The tags are picked up by the cluster autoscaler and matched against the deployment’s nodeAffinity, which then triggers a scale-up of the ASG.

Once the ASG has been configured you can add the matching node affinity settings and tolerations to your deployments (under the pod template’s spec). The setup below tells the Kubernetes scheduler that your pods can tolerate the taint, must run on a stateless node group, and prefer to be scheduled on a Spot instance.

tolerations:
  - key: "spotInstance"
    operator: "Equal"
    value: "true"
    effect: "PreferNoSchedule"

affinity:
  nodeAffinity:
    preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 1
        preference:
          matchExpressions:
            - key: eks.amazonaws.com/capacityType
              operator: In
              values:
                - SPOT
    requiredDuringSchedulingIgnoredDuringExecution:
      nodeSelectorTerms:
        - matchExpressions:
            - key: nodegroup-type
              operator: In
              values:
                - stateless

Planning for interruptions

Spot instances can be reclaimed by AWS at any time

When selecting workloads for a stateless deployment it is important to plan for disruptions. The best type of workload meets these criteria:

  • maintains state in a DB
  • does not have persistent storage
  • can be interrupted

Some workloads should not have more than one replica active at any time. For these, use the Recreate strategy in your deployment: it terminates the current pod and only creates the new one once the old one has gone, at the cost of a brief gap in availability during updates. This is not the Kubernetes default, so it needs to be added to any deployment that must run as a singleton.

spec:
  strategy:
    type: Recreate

For workloads that are stateless but need to be always available, for example an API, you can use the RollingUpdate strategy (the default), which replaces pods one by one and avoids downtime. During a node scale-down the new pod is started while the old pod is terminating.

spec:
  strategy:
    type: RollingUpdate
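
If you need tighter control over the rollout, the rollingUpdate parameters let you state this explicitly; a sketch with placeholder values:

spec:
  replicas: 3
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 0   # never remove a pod before its replacement is ready
      maxSurge: 1         # allow one extra pod while the update is in progress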

To further protect against downtime during voluntary disruptions, such as the node drains that happen on scale-down, use the PodDisruptionBudget API object.


apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: myapp-pdb
spec:
  minAvailable: 1
  selector:
    matchLabels:
      app: myapp

In the above example, if your deployment has 3 replicas on Spot instances that are all being reclaimed and drained, the eviction API will remove two pods and then wait for at least one replacement pod to become ready before evicting the third.