Kubernetes: Horizontal Pod Autoscaling (HPA)
Let’s assume you run a website that lists the best places to visit in any city.
You choose Kubernetes (K8s) to host the application.
Whenever you see a spike in your application's QPS (queries per second), you have two options: horizontal scaling or vertical scaling.
Horizontal scaling means increasing the number of pod replicas in a scalable resource, e.g. a ReplicaSet, ReplicationController, or Deployment.
Alternatively, pods can be scaled vertically by increasing their resource requests and limits, although a pod can never grow beyond the capacity of the node it runs on.
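Done by hand, both kinds of scaling come down to one kubectl command each. This is a minimal sketch, assuming a Deployment with the hypothetical name test-app-deployment; the replica count and the CPU/memory values are placeholders, not recommendations.

```shell
# Horizontal scaling: set the replica count of the (hypothetical) Deployment.
kubectl scale deployment/test-app-deployment --replicas=5

# Vertical scaling: raise the requests and limits of the Deployment's containers.
# This changes the pod template, so the Deployment rolls out replacement pods.
kubectl set resources deployment/test-app-deployment \
  --requests=cpu=250m,memory=256Mi \
  --limits=cpu=500m,memory=512Mi
```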
After a few days, you notice you are spending a lot of effort monitoring resource utilization and manually updating the replica count in response. If that could be automated, you could focus on other parts of the business.
Horizontal Pod Autoscaler (HPA)
HPA is a Kubernetes component that automatically scales the number of pods. The controller responsible for it is the horizontal pod autoscaler controller, which runs inside kube-controller-manager.
The HPA controller scales pods using the following process:
- Fetch the relevant metrics for the pods.
- Compute the target number of replicas by comparing the fetched metric values with the target metric values.
- Update the replica count in the scalable resource, e.g. a Deployment.
Let’s look at each step one by one.
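Fetch metrics
The kubelet embeds cAdvisor, which collects resource usage for the containers running on each node. In older clusters, Heapster aggregated these metrics and the HPA fetched them from Heapster via its REST API. Heapster has since been deprecated and removed; current clusters use metrics-server, which exposes the same resource metrics through the Metrics API (metrics.k8s.io) that the HPA queries. Application metrics such as QPS are served through the custom metrics API (custom.metrics.k8s.io) by a separate adapter.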
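To check that resource metrics are actually available before relying on an HPA, you can query them yourself. A minimal sketch, assuming metrics-server is installed and the pods live in the default namespace:

```shell
# Per-pod CPU and memory usage, as reported by metrics-server.
kubectl top pods

# The same data read directly from the Metrics API that the HPA controller queries.
kubectl get --raw "/apis/metrics.k8s.io/v1beta1/namespaces/default/pods"
```

If these commands return an error, the HPA will not be able to compute a replica count from CPU or memory either.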
Compute the target pod count
This step takes the metrics fetched in the previous step and computes the target number of pods.
SINGLE METRIC COMPUTATION
Let’s assume the HPA is defined on CPU utilization only, and the Deployment currently runs three pods.
These pods have CPU utilization of 60%, 90%, and 50%. The target CPU utilization is 50%.
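The target number of pods is the sum of the per-pod utilization divided by the target utilization, rounded up:
Number of pods = ceil((60 + 90 + 50) / 50) = ceil(200 / 50) = 4
Hence the target number of pods is 4. The Kubernetes documentation writes the same rule as desiredReplicas = ceil(currentReplicas × currentMetricValue / desiredMetricValue), where currentMetricValue is the average across the pods; currentReplicas × average is exactly the sum used above.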
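The arithmetic above is small enough to sketch directly. This is an illustration of the formula only, not the controller's code: desired_replicas is a hypothetical helper, utilizations are assumed to be whole percentages, and details the real controller handles (unready pods, missing metrics) are ignored.

```shell
#!/bin/sh
# Sketch of the single-metric HPA calculation (hypothetical helper name):
#   desired = ceil(sum of per-pod utilization / target utilization)
# Utilization values are whole percentages; integer arithmetic only.

# desired_replicas TARGET UTIL [UTIL...]
desired_replicas() {
  local target="$1" sum=0 u
  shift
  for u in "$@"; do
    sum=$((sum + u))
  done
  # Integer ceiling division: ceil(sum / target)
  echo $(( (sum + target - 1) / target ))
}

# Three pods at 60%, 90% and 50% CPU with a 50% target.
desired_replicas 50 60 90 50   # prints 4
```

Rounding up matters: if the third pod were at 51% instead of 50%, the sum would be 201 and the result would be 5 rather than 4.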
An HPA can also be defined on multiple metrics. Let’s say we define one on CPU utilization and queries per second (QPS).
Computing the target pod count from multiple metrics is still simple: first compute the target pod count for each metric independently, then take the largest of those counts.
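A sketch of the multi-metric rule, reusing the CPU numbers from the single-metric example. The QPS values (150, 120 and 180 requests per second across the three pods) and the target of 100 requests per second per pod are hypothetical, chosen only to illustrate the "take the maximum" step; the helper names are also made up for this example.

```shell
#!/bin/sh
# Sketch of the multi-metric rule: compute a replica count per metric,
# then keep the largest. The QPS figures are hypothetical.

# ceil_div NUMERATOR DENOMINATOR -> ceil(N / D), integers only
ceil_div() {
  echo $(( ($1 + $2 - 1) / $2 ))
}

# sum_values V [V...] -> sum of the per-pod values
sum_values() {
  local total=0 v
  for v in "$@"; do
    total=$((total + v))
  done
  echo "$total"
}

# max_of N [N...] -> largest argument
max_of() {
  local best="$1" n
  shift
  for n in "$@"; do
    if [ "$n" -gt "$best" ]; then
      best="$n"
    fi
  done
  echo "$best"
}

# CPU: pods at 60%, 90%, 50% with a 50% target -> 4 replicas
cpu_replicas=$(ceil_div "$(sum_values 60 90 50)" 50)
# QPS: pods serving 150, 120, 180 req/s with a target of 100 req/s per pod -> 5 replicas
qps_replicas=$(ceil_div "$(sum_values 150 120 180)" 100)

max_of "$cpu_replicas" "$qps_replicas"   # prints 5
```

Taking the maximum means the workload is sized for whichever metric is under the most pressure, so no single metric is left above its target.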
Scalable Resource Update
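We have now computed the number of pods required. To turn that number into actual running pods, the HPA updates the replica count of the scalable resource (e.g. a Deployment or ReplicaSet) through its scale subresource; the workload's own controller then creates or deletes pods to match.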
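Before the count is written, it is kept within the HPA's minReplicas and maxReplicas bounds (the --min and --max flags of the kubectl autoscale command in the next section). A minimal sketch of that clamping step; clamp_replicas is a hypothetical helper, and the controller's optional scaling-rate policies are left out.

```shell
#!/bin/sh
# Sketch: clamp the computed replica count to the HPA's minReplicas/maxReplicas.
# clamp_replicas DESIRED MIN MAX  (hypothetical helper name)
clamp_replicas() {
  local desired="$1" min="$2" max="$3"
  if [ "$desired" -lt "$min" ]; then
    desired="$min"
  elif [ "$desired" -gt "$max" ]; then
    desired="$max"
  fi
  echo "$desired"
}

clamp_replicas 4 1 5   # prints 4
clamp_replicas 8 1 5   # prints 5
clamp_replicas 0 1 5   # prints 1
```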
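Putting it all together, the HPA controller repeats these three steps in a control loop (by default every 15 seconds): fetch the metrics, compute the target replica count, and update the scalable resource.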
Create an HPA for a Deployment. CPU utilization is measured against the containers' CPU requests, so the Deployment's pods must declare them.
kubectl autoscale deployment test-app-deployment --name=test-app-hpa --cpu-percent=30 --min=1 --max=5
List the HPAs in all namespaces
kubectl get hpa -A
Describe an HPA resource
kubectl describe hpa test-app-hpa
View the YAML for an HPA resource
kubectl get hpa test-app-hpa -o yaml
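The same HPA can also be created declaratively. This is a sketch of a roughly equivalent manifest, assuming the autoscaling/v2 API (available since Kubernetes 1.23) and the placeholder names test-app-deployment and test-app-hpa; adjust them to your own resources.

```shell
# Apply an HPA manifest from a heredoc (names are placeholders).
kubectl apply -f - <<'EOF'
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: test-app-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: test-app-deployment
  minReplicas: 1
  maxReplicas: 5
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 30   # same target as --cpu-percent=30
EOF
```

Keeping the manifest in version control makes the scaling policy reviewable, and the metrics list is where additional metrics (such as QPS served through a custom metrics adapter) would be added.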