The process of automatically scaling resources in and out is called autoscaling. There are three different types of autoscalers in Kubernetes: the Cluster Autoscaler, the Horizontal Pod Autoscaler, and the Vertical Pod Autoscaler. In this article, we're going to look at the Horizontal Pod Autoscaler.
A running application workload can be scaled manually by changing the replicas field in the workload's manifest file. Manual scaling is fine when you can anticipate load spikes in advance or when the load changes gradually over long periods of time, but requiring manual intervention to handle sudden, unpredictable traffic increases isn't ideal.
To solve this problem, Kubernetes provides a resource called the Horizontal Pod Autoscaler, which monitors pods and scales them automatically as soon as it detects an increase in CPU or memory usage (based on a defined metric). Horizontal pod autoscaling is the process of automatically adjusting the number of pod replicas managed by a controller, based on the usage of the defined metric, so that capacity matches demand.
How does a HorizontalPodAutoscaler work?
A HorizontalPodAutoscaler (HPA) in Kubernetes is a tool that automatically adjusts the number of pod replicas in a deployment, replica set, or stateful set based on observed CPU utilization (or other select metrics). Here's a simple breakdown of how it works:
- Monitoring Metrics: The HPA continuously monitors the resource usage (like CPU or memory) of the pods. This is typically done using the Kubernetes metrics server, which collects data from the nodes and pods.
- Defining Targets: You set a target resource utilization level. For example, you might want your pods to use an average of 50% of their allocated CPU.
- Calculating Desired Replicas: The HPA calculates the desired number of replicas based on the current resource usage and the target utilization. If the average usage is above the target, it will increase the number of replicas; if it's below, it will decrease them.
- Scaling the Pods: Based on its calculations, the HPA adjusts the number of replicas in the deployment, adding or removing pods as needed to meet the target utilization.
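Behind these steps sits a simple proportional formula (this is the algorithm documented for the HPA; the numbers below are only an illustration):

desiredReplicas = ceil(currentReplicas * currentMetricValue / desiredMetricValue)

For example, with 2 replicas averaging 90% CPU against a 50% target, the HPA computes ceil(2 * 90 / 50) = ceil(3.6) = 4 replicas.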
Set Up a Minikube Cluster
These steps are needed before we can use the autoscaling features. By following them, we start a Minikube cluster and deploy a demo application into it.
Step 1: Install Minikube.
Step 2: Start your cluster.
$ minikube start

Step 3: Enable the metrics-server addon so that resource metrics are collected.
$ minikube addons enable metrics-server
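Before moving on, it's worth a quick sanity check that metrics-server came up (this assumes the addon carries its usual k8s-app=metrics-server label):
$ kubectl -n kube-system get pods -l k8s-app=metrics-server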

Step 4: Edit the metrics-server deployment and add the --kubelet-insecure-tls argument, which lets metrics-server scrape the kubelet without verifying its TLS certificate (acceptable on a local Minikube cluster).
$ kubectl -n kube-system edit deploy metrics-server

containers:
- args:
  - --cert-dir=/tmp
  - --secure-port=8448
  - --kubelet-preferred-address-types=InternalIP,ExternalIP,Hostname
  - --kubelet-insecure-tls
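Once the edited deployment rolls out, metrics should start flowing within a minute or so. If the following returns CPU and memory numbers instead of an error, metrics-server is working:
$ kubectl top nodes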
Step 5: Let's create a deployment for our demo purposes. I chose Nginx as our application with 1 replica. This deployment requests 100 millicores of CPU per pod.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: webserver
  labels:
    app: backend
spec:
  replicas: 1
  selector:
    matchLabels:
      app: backend
  template:
    metadata:
      labels:
        app: backend
    spec:
      containers:
      - name: nginx
        image: nginx:1.23-alpine
        imagePullPolicy: IfNotPresent
        resources:
          limits:
            cpu: 200m
            memory: 20Mi
          requests:
            cpu: 100m
            memory: 10Mi
$ kubectl create -f nginx-deploy.yaml
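As a quick check that the deployment is up, list it and its pod (the app=backend label comes from the manifest above):
$ kubectl get deploy webserver
$ kubectl get pods -l app=backend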

Scaling Based on CPU Usage
CPU usage is one of the most common metrics used to drive autoscaling. Suppose the CPU usage of the processes running inside your pods reaches 100%: they can no longer keep up with demand. To solve this, you can either increase the amount of CPU a pod can use (vertical scaling) or increase the number of pods (horizontal scaling) so that the average CPU usage comes down. Enough talking; let's create a Horizontal Pod Autoscaler resource based on CPU usage and see it in action.
Step 1: Create a Horizontal Pod Autoscaler resource for our deployment.
apiVersion: autoscaling/v1
kind: HorizontalPodAutoscaler
metadata:
  name: webserver-cpu-hpa
spec:
  maxReplicas: 5
  minReplicas: 1
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: webserver
  targetCPUUtilizationPercentage: 30
Let's understand these attributes:
- maxReplicas - The maximum number of replicas to scale out to.
- minReplicas - The minimum number of replicas to scale in to.
- scaleTargetRef - The target resource to act upon; in our case, the webserver deployment.
- targetCPUUtilizationPercentage - The average CPU utilization the autoscaler maintains: it adjusts the number of pods so that each utilizes 30% of its requested CPU.
Now create the resource:
$ kubectl create -f nginx-deploy-cpu-hpa.yaml
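You can confirm the autoscaler exists and is reading metrics. The TARGETS column may show <unknown>/30% for the first minute, until metrics-server reports usage:
$ kubectl get hpa webserver-cpu-hpa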

Let's put some load on our deployment so that we can see scaling in action.
Step 2: First, expose the application as a NodePort service so that we have an address to load test against.
$ kubectl expose deploy webserver \
--type=NodePort --port=8080 \
--target-port=80
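The address used in the siege command below is the NodePort URL of this service; on Minikube, an easy way to get it is:
$ minikube service webserver --url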

Step 3: Now comes the interesting part: load testing. For load testing, I'm using the siege tool. Here, 250 concurrent users simulate the load for 2 minutes; you can change these values accordingly.
$ siege -c 250 -t 2m http://127.0.0.1:58421
(Replace http://127.0.0.1:58421 with the address of your NodePort service.)

Open another terminal and watch the resources; you will see the number of pods increase. Keep an eye on the pod count: as soon as the HPA detects that CPU usage exceeds the target, it creates more pods to handle the load.
$ watch -n 1 kubectl get po,hpa

Since the load pushes CPU usage past the target, the HPA increases the number of replicas from 1 to 2.

Once the load test ends and CPU usage drops back toward zero, the HPA scales the replicas back down to the minimum (1) defined in the HPA manifest file. Note that scale-down is not instant: by default, the HPA waits out a 5-minute stabilization window before removing pods.
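To see the scaling decisions the HPA actually made, including the scale-up and the delayed scale-down, describe the autoscaler and read its Events section:
$ kubectl describe hpa webserver-cpu-hpa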

Scaling Based on Memory Usage
This time, we'll configure the HPA based on memory usage.
Step 1: Create a Horizontal Pod Autoscaler resource based on memory usage.
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: webserver-mem-hpa
spec:
  maxReplicas: 5
  minReplicas: 1
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: webserver
  metrics:
  - type: Resource
    resource:
      name: memory
      target:
        type: AverageValue
        averageValue: 2Mi
Step 2: Create the resource. Note that we set averageValue to just 2Mi because the nginx deployment is very lightweight; with a realistic target, memory usage would never reach it and we wouldn't see memory-based scaling happen.
$ kubectl create -f nginx-deploy-mem-hpa.yaml
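As before, you can confirm the autoscaler is tracking its metric; the TARGETS column shows the current average memory usage against the 2Mi target:
$ kubectl get hpa webserver-mem-hpa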

Step 3: Run the load test again and watch the resources in another terminal.

Again, memory usage exceeds the target, so the HPA spins up new pod replicas.
Scaling workloads manually
The kubectl scale command can be used to manually scale Kubernetes workloads by changing the desired number of replicas in a Deployment or StatefulSet spec. This gives you direct control over how many pods serve a workload at any given time.
List Deployments: This command displays the current replicas for each deployment in your cluster.
kubectl get deployments
Scale Deployment: Set the --replicas parameter to the desired number of replicas and replace my-deployment with the name of your deployment.
kubectl scale deployment my-deployment --replicas=5
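You can verify the result right away; the READY and UP-TO-DATE columns should converge on the new count (my-deployment is a placeholder name here):
kubectl get deployment my-deployment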
Autoscaling during rolling update
A Deployment manages its underlying ReplicaSets when performing a rolling update. When autoscaling has been set up for a Deployment, a HorizontalPodAutoscaler (HPA) is attached to it, and the HPA controls the number of replicas for the Deployment through its replicas field, which it modifies based on resource use.
During a rolling update:
- The Deployment controller makes sure that the total number of pods across all old and new ReplicaSets involved matches the planned count.
- The HPA keeps an eye on the metrics and adjusts the overall number of replicas as needed, as the sketch below shows.
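One way to watch this interplay yourself, reusing the webserver deployment and CPU HPA from earlier (nginx:1.25-alpine is just an example of a newer image tag), is to trigger a rolling update and watch the ReplicaSets and the HPA side by side:
$ kubectl set image deployment/webserver nginx=nginx:1.25-alpine
$ kubectl rollout status deployment/webserver
$ watch -n 1 kubectl get rs,hpa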
The scenario differs substantially for StatefulSets. A StatefulSet keeps its pods directly, without a ReplicaSet or similar intermediate resource. When performing a rolling update on an autoscaled StatefulSet:
- Each pod is handled directly by the StatefulSet controller.
- The HPA's changes to the StatefulSet's replica count directly affect the number of pods the StatefulSet maintains.
Container resource metrics
Kubernetes uses container resource metrics to track and control how much of each resource every container in a cluster consumes. These metrics help ensure that resources are used efficiently and that applications perform well. CPU and memory usage are the most significant metrics and are commonly used for autoscaling and performance monitoring. A summary of Kubernetes' container resource metrics is given below:
Key Metrics
- CPU Usage:
  - Measured in millicores (m): 1000m = 1 CPU core.
  - Usage: The amount of CPU time the container consumes.
  - Limit: The maximum amount of CPU the container may use.
  - Request: The minimum amount of CPU the container is guaranteed to get.
- Memory Usage:
  - Measured in bytes, typically expressed in binary units such as kibibytes (Ki) and mebibytes (Mi).
  - Usage: The amount of memory the container is currently using.
  - Limit: The maximum amount of memory the container is allowed to use.
  - Request: The minimum amount of memory the container is guaranteed.
- Resource Requests and Limits: You can set resource requests and limits when defining a container in a pod specification to make sure the container gets the resources it needs and to prevent it from using more than it should. This improves the cluster's ability to allocate resources and maintain quality of service.
- Collecting Metrics: The Kubernetes Metrics Server is a cluster-wide aggregator of resource usage data. It collects metrics from each node's kubelet and makes them accessible through the Kubernetes API.
- Using Metrics for Autoscaling: The HorizontalPodAutoscaler (HPA) uses these metrics to automatically adjust the number of pod replicas based on observed resource utilization.
- Viewing Metrics: You can view metrics with the Kubernetes Dashboard or the kubectl top commands.
- View Node Metrics:
kubectl top nodes
- View Pod Metrics:
kubectl top pods
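kubectl top can also break usage down per container within a pod, which is handy when pods run sidecars:
kubectl top pods --containers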
Support for HorizontalPodAutoscaler in kubectl
Like every other Kubernetes resource, the HorizontalPodAutoscaler (HPA), which scales pods automatically based on resource usage metrics such as CPU or memory, is fully supported by kubectl. Below is an overview of the main commands:
- Create Autoscaler: kubectl create can be used to create a new HPA from a manifest in exactly the same way as any other resource. This enables a fast and efficient autoscaler setup.
- List HPAs: Use kubectl get hpa to list all of your current HPAs. With this command, you can view the names, target resources, and scaling settings of all of your autoscalers.
- Describe HPA: Use kubectl describe hpa to get more information about a specific HPA. This command prints full details about the autoscaler, including events, metrics, and scaling history.
- Specialized Autoscale Command: In addition, Kubernetes offers kubectl autoscale, a dedicated command created specifically for HPA creation. This command lets you specify scaling parameters directly and streamlines the procedure. For example, kubectl autoscale deployment my-app --min=2 --max=5 --cpu-percent=80 creates an autoscaler for the my-app deployment with between two and five replicas and a target CPU utilization of 80%.
- Delete Autoscaler: Finally, you can run kubectl delete hpa to remove an autoscaler once you no longer need it, keeping your cluster free of unused resources; a typical end-to-end session is sketched below.
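Putting these together, a typical HPA lifecycle from the command line looks something like this (my-app is a placeholder deployment name):
kubectl autoscale deployment my-app --min=2 --max=5 --cpu-percent=80
kubectl get hpa
kubectl describe hpa my-app
kubectl delete hpa my-app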