Kubernetes: Horizontal Pod Autoscaling (HPA)

Ajay Yadav
Jan 22, 2022


Let’s assume you run a website that lists the best places to visit in any city.

You choose Kubernetes (K8s) to host your application.

Manual Scaling

Whenever you see a spike in your application’s QPS (queries per second), you have two options: horizontal scaling or vertical scaling.

Horizontal scaling is done by increasing the number of replicas in the scalable resource, e.g. a ReplicaSet, ReplicationController, or Deployment.

Alternatively, pods can be scaled vertically by increasing their resource requests and limits, although a pod can never grow beyond the capacity of the node it runs on.

After a few days, you realize you are spending extra effort monitoring resource utilization and updating the replica count in response. If that could be automated, you could focus on other business work.

Horizontal Pod Auto-scaler (HPA)

HPA is a Kubernetes feature that automatically scales the number of pods. The K8s controller responsible for autoscaling is known as the Horizontal controller; it runs as part of kube-controller-manager.

The Horizontal controller scales pods using the following process:

  • Fetch the relevant metrics for the pods.
  • Compute the target number of replicas by comparing the fetched metric values with the target metric value.
  • Update the replica count in the scalable resource, e.g. a Deployment.

Let’s look at each step one by one.

Pod Metrics

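The kubelet collects resource usage metrics (such as CPU and memory) for the containers on its node through its embedded cAdvisor. In older clusters, Heapster aggregated these metrics and the HPA fetched them from Heapster’s REST API. Heapster has since been retired: today metrics-server aggregates the metrics and serves them through the resource Metrics API (metrics.k8s.io), which is where the HPA reads CPU and memory usage. Application metrics such as QPS come through the custom metrics API (custom.metrics.k8s.io), served by a metrics adapter.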
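For CPU, the percentages the HPA works with are utilization relative to the pods’ CPU requests, not absolute usage. The sketch below assumes each pod reports a single usage value and declares a CPU request, both in millicores; the function name and the pod values are hypothetical. It divides total usage by total requests across the pods, which, for pods with equal requests, equals the average of the per-pod percentages.

```python
from __future__ import annotations


def cpu_utilization_percent(pods: list[tuple[float, float]]) -> float:
    """Return CPU utilization (%) across pods.

    Each pod is a (usage_millicores, request_millicores) pair. Utilization is
    total usage divided by total requests, so pods must declare a CPU request.
    """
    if not pods:
        raise ValueError("need at least one pod with metrics")
    total_usage = sum(usage for usage, _ in pods)
    total_request = sum(request for _, request in pods)
    if total_request <= 0:
        raise ValueError("pods must declare a positive CPU request")
    return total_usage * 100 / total_request


# Hypothetical pods, each requesting 200m CPU and using 120m, 180m, and 100m,
# i.e. 60%, 90%, and 50% of their requests.
pods = [(120, 200), (180, 200), (100, 200)]
print(round(cpu_utilization_percent(pods), 2))  # 66.67
```

A pod without a CPU request has no denominator, which is why CPU-based autoscaling on utilization needs requests set on the pods.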

Compute targeted pod count

This step takes the metrics fetched in the previous step and computes the target number of pods.


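Let’s assume the HPA is defined on CPU utilization only, and the deployment currently runs three pods.

These pods have CPU utilization of 60%, 90%, and 50% (each relative to the pod’s CPU request). The target CPU utilization is set to 50%.

The target number of pods is computed as follows, rounding up to a whole pod:

Number of pods = ceil((60 + 90 + 50) / 50) = ceil(4) = 4

Hence the target number of pods is 4.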
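The same calculation can be sketched in Python. The HPA’s formula is desired = ceil(currentReplicas × currentValue / targetValue), where currentValue is the average across pods, so it reduces to ceil(sum / target) as in the example. The sketch also models the HPA’s tolerance: by default, if the ratio of current to target value is within 10% of 1.0, the replica count is left unchanged (configurable with kube-controller-manager’s --horizontal-pod-autoscaler-tolerance flag). The function name is hypothetical, and the sketch ignores details such as unready pods or pods with missing metrics.

```python
from __future__ import annotations

import math


def desired_replicas(pod_values: list[float], target: float, tolerance: float = 0.1) -> int:
    """Target replica count for a single metric.

    Follows desired = ceil(current_replicas * current_average / target).
    """
    if not pod_values:
        raise ValueError("need at least one pod with metrics")
    current_replicas = len(pod_values)
    # Usage ratio: average metric value divided by the target value.
    ratio = sum(pod_values) / (target * current_replicas)
    # Inside the tolerance band, keep the current count to avoid flapping.
    if abs(1.0 - ratio) <= tolerance:
        return current_replicas
    # current_replicas * (sum / current_replicas) / target == sum / target
    return math.ceil(sum(pod_values) / target)


print(desired_replicas([60, 90, 50], target=50))  # 4
print(desired_replicas([52, 48, 51], target=50))  # 3 (within 10% of target)
```

Computing ceil(sum / target) directly, rather than multiplying a rounded ratio back by the replica count, keeps floating-point error from pushing an exact result like 4.0 up to 5.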


An HPA can also be defined on multiple metrics. Let’s say we define an HPA on both CPU utilization and queries per second (QPS).

Computing the target pod count from multiple metrics is also simple: first compute the target pod count for each metric independently, then select the maximum.
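A minimal sketch of the multi-metric rule: one proposal per metric, then the maximum wins. The QPS readings and the target of 50 QPS per pod are hypothetical values chosen for illustration, the function names are hypothetical, and the tolerance band is omitted for brevity.

```python
from __future__ import annotations

import math


def replicas_for_metric(pod_values: list[float], target: float) -> int:
    """Replica count needed to bring one metric's per-pod average to its target."""
    return math.ceil(sum(pod_values) / target)


def desired_replicas_multi(
    metrics: dict[str, tuple[list[float], float]],
) -> tuple[int, dict[str, int]]:
    """Compute a proposal per metric independently, then take the maximum."""
    proposals = {
        name: replicas_for_metric(values, target)
        for name, (values, target) in metrics.items()
    }
    return max(proposals.values()), proposals


# Hypothetical readings: CPU % per pod (target 50%) and QPS per pod (target 50).
desired, proposals = desired_replicas_multi({
    "cpu": ([60, 90, 50], 50),
    "qps": ([120, 80, 100], 50),
})
print(proposals)  # {'cpu': 4, 'qps': 6}
print(desired)    # 6
```

Taking the maximum means no metric is left above its target: scaling to 6 pods satisfies the QPS target and, with it, the less demanding CPU target.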

Scalable Resource Update

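Now we have computed the number of pods required. To turn that number into actual running pods, the HPA updates the replicas field of the scalable resource (e.g. a Deployment or ReplicaSet) through its scale subresource, and that resource’s own controller then creates or deletes pods to match.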
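Before writing the count, the HPA keeps it within the minReplicas and maxReplicas bounds defined on the HPA. The sketch below uses a plain dict as a hypothetical stand-in for a Deployment object; a real controller updates the object through the Kubernetes API rather than in memory, and the helper names are hypothetical.

```python
from __future__ import annotations


def clamp_replicas(desired: int, min_replicas: int, max_replicas: int) -> int:
    """Keep the computed count within the HPA's minReplicas/maxReplicas bounds."""
    return max(min_replicas, min(desired, max_replicas))


def apply_scale(resource: dict, desired: int, min_replicas: int, max_replicas: int) -> bool:
    """Write the clamped count into spec.replicas; return True if it changed.

    `resource` is a plain dict standing in for a Deployment object.
    """
    new_replicas = clamp_replicas(desired, min_replicas, max_replicas)
    if resource["spec"]["replicas"] == new_replicas:
        return False  # nothing to update
    resource["spec"]["replicas"] = new_replicas
    return True


deployment = {"metadata": {"name": "test-app-deployment"}, "spec": {"replicas": 3}}
changed = apply_scale(deployment, desired=6, min_replicas=1, max_replicas=5)
print(changed, deployment["spec"]["replicas"])  # True 5
```

Here a metric asking for 6 pods is capped at 5, the kind of ceiling set with `--max=5` when creating an HPA with `kubectl autoscale`.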

Auto-Scaling Process

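Combining all the above steps, the Horizontal controller runs them as a loop: fetch the metrics, compute the target replica count, and update the scalable resource. By default, this loop runs every 15 seconds.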
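The whole loop can be sketched as one reconcile function that is called repeatedly. This is a simplified model, not the controller’s actual code: the metric source and the replica getter and setter are hypothetical callables injected by the caller, the sleep function is injectable so the demo runs instantly, and 15 seconds mirrors the controller’s default sync period.

```python
import math
import time
from typing import Callable, Dict, List, Tuple

# Hypothetical metric snapshot: metric name -> (per-pod values, target value).
Metrics = Dict[str, Tuple[List[float], float]]


def reconcile_once(
    fetch_metrics: Callable[[], Metrics],
    get_replicas: Callable[[], int],
    set_replicas: Callable[[int], None],
    min_replicas: int,
    max_replicas: int,
    tolerance: float = 0.1,
) -> int:
    """One pass: fetch metrics, compute the target count, update the resource."""
    current = get_replicas()
    proposals = []
    # Step 1: fetch metrics. Step 2: one proposal per metric.
    for values, target in fetch_metrics().values():
        ratio = sum(values) / (target * len(values))
        if abs(1.0 - ratio) <= tolerance:
            proposals.append(current)  # close enough to target: no change
        else:
            proposals.append(math.ceil(sum(values) / target))
    # The highest proposal wins, clamped to the HPA's bounds.
    desired = max(proposals, default=current)
    desired = max(min_replicas, min(desired, max_replicas))
    # Step 3: update the scalable resource only when the count changes.
    if desired != current:
        set_replicas(desired)
    return desired


def run_loop(
    fetch_metrics: Callable[[], Metrics],
    get_replicas: Callable[[], int],
    set_replicas: Callable[[int], None],
    min_replicas: int,
    max_replicas: int,
    iterations: int,
    sync_period: float = 15.0,
    sleep: Callable[[float], None] = time.sleep,
) -> None:
    """Run reconcile_once every sync_period seconds for a fixed number of passes."""
    for _ in range(iterations):
        reconcile_once(fetch_metrics, get_replicas, set_replicas, min_replicas, max_replicas)
        sleep(sync_period)


state = {"replicas": 3}
run_loop(
    fetch_metrics=lambda: {"cpu": ([60, 90, 50], 50)},
    get_replicas=lambda: state["replicas"],
    set_replicas=lambda n: state.update(replicas=n),
    min_replicas=1,
    max_replicas=5,
    iterations=2,
    sleep=lambda seconds: None,  # skip real waiting in this demo
)
print(state["replicas"])  # 4
```

Injecting the metric source and the replica setter keeps the decision logic separate from any API client, so the same function can be exercised with fake data as above.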

Kubectl commands

Create an HPA resource for a deployment (Kubernetes object names must be lowercase, and the HPA’s name is set with --name)

kubectl autoscale deployment test-app-deployment --name=test-app-hpa --cpu-percent=30 --min=1 --max=5

Get all the HPAs across all namespaces

kubectl get hpa -A

Describe an HPA resource

kubectl describe hpa test-app-hpa

View the YAML for an HPA resource

kubectl get hpa test-app-hpa -o yaml


