Horizontal Pod Autoscaling in Kubernetes based on External Metrics, using Prometheus Adapter

Orchestrators manage modern microservice architectures. Kubernetes is one of them, offering resource optimization, deployments with minimal or zero downtime, reliability and auto-scaling, to name a few benefits. Auto-scaling solutions are feedback loops driven by specific metrics, typically traffic throughput or the resource utilization (CPU/memory) of the services. These metrics originate inside the cluster and are monitored to make auto-scaling decisions, but what about external metrics? This blog covers both kinds of metrics for deploying an auto-scaling solution, which we used in production for a client.

One of our clients was using a Redis server that ran outside the Kubernetes cluster. We had to collect metrics from the Redis queues and auto-scale the pods based on a threshold.

What is Horizontal Pod Autoscaling (HPA)?

Kubernetes is inherently scalable: it provides a number of tools that allow applications as well as infrastructure to scale up and down depending on demand, efficiency and a number of other metrics. In this article, I am going to discuss one such feature, which allows the user to horizontally scale Pods based on metrics that are either provided by Kubernetes itself or generated by the user as custom metrics.

The Horizontal Pod Autoscaler automatically scales the number of Pods in a replication controller, deployment, replica set or stateful set based on some metrics. It is implemented as a Kubernetes API resource and a controller.

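The HPA controller retrieves metrics from a series of APIs, which include metrics.k8s.io (resource metrics such as CPU and memory), custom.metrics.k8s.io (custom metrics tied to Kubernetes objects) and external.metrics.k8s.io (metrics from sources outside the cluster).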

How are we going to implement HPA?

For this article, we will use the Prometheus Adapter to expose a Prometheus metric to the Kubernetes cluster as an external metric.

The following steps outline how HPA can be implemented in the cluster:

As can be seen above, the HPA configuration has been applied to the cluster and the HPA controller is able to access the external metric correctly. It compares the value of the external metric against the threshold, and when the value crosses the threshold, it triggers a scale-up action. Similarly, when the external metric's value drops below the threshold, the HPA controller triggers a scale-down action.
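For reference, an HPA object that scales on an external metric could look roughly like the sketch below. It builds an autoscaling/v2 manifest as a Python dictionary and prints it as JSON, which kubectl accepts as well as YAML. The HPA name prod-hpa, the Deployment name worker, the threshold of 1 and the replica bounds are assumptions made for illustration only; the metric name trigger_prod_hpa is the one used later in this article.

```python
import json


def build_external_hpa(name, target_deployment, metric_name, threshold,
                       min_replicas=1, max_replicas=5):
    """Return an autoscaling/v2 HorizontalPodAutoscaler manifest as a dict."""
    return {
        "apiVersion": "autoscaling/v2",
        "kind": "HorizontalPodAutoscaler",
        "metadata": {"name": name},
        "spec": {
            # The workload whose replica count the HPA controls.
            "scaleTargetRef": {
                "apiVersion": "apps/v1",
                "kind": "Deployment",
                "name": target_deployment,
            },
            "minReplicas": min_replicas,
            "maxReplicas": max_replicas,
            "metrics": [
                {
                    # "External" means the metric is served by
                    # external.metrics.k8s.io (here, via the adapter).
                    "type": "External",
                    "external": {
                        "metric": {"name": metric_name},
                        # Kubernetes quantities are passed as strings.
                        "target": {"type": "Value", "value": str(threshold)},
                    },
                }
            ],
        },
    }


# Hypothetical names and values, chosen only for this example.
manifest = build_external_hpa("prod-hpa", "worker", "trigger_prod_hpa", 1)
print(json.dumps(manifest, indent=2))
```

Saving the printed JSON to a file and running kubectl apply -f on it would create the HPA, assuming the target Deployment exists and the adapter already serves the metric.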

The HPA controller keeps track of the desired number of Pods based on the following formula:

  desiredReplicas = ceil[currentReplicas * ( currentMetricValue / desiredMetricValue )]
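To make the formula concrete, here is a minimal Python sketch of the same calculation. The function name and the sample numbers are illustrative assumptions. The real controller additionally applies a tolerance and clamps the result to the configured minimum and maximum replica counts, which this sketch leaves out.

```python
import math


def desired_replicas(current_replicas, current_metric_value, desired_metric_value):
    """Apply ceil(currentReplicas * (currentMetricValue / desiredMetricValue))."""
    ratio = current_metric_value / desired_metric_value
    return math.ceil(current_replicas * ratio)


# Metric at twice the threshold: 2 replicas become 4.
print(desired_replicas(2, 200, 100))
# Metric at half the threshold: 4 replicas become 2.
print(desired_replicas(4, 50, 100))
# Fractional results are rounded up: 3 * 1.5 = 4.5, so 5 replicas.
print(desired_replicas(3, 150, 100))
```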

Test the HPA

In order to test our HPA configuration and make sure that the scaling up/down occurs correctly, we will update the value of the trigger_prod_hpa metric to a value above the threshold.
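While testing, it helps to confirm which value the HPA controller actually sees. External metrics are served under the external.metrics.k8s.io API group, so the sketch below builds the request path for a given metric. The default namespace and the v1beta1 API version are assumptions here and should be adjusted to match the cluster.

```python
def external_metric_path(metric_name, namespace="default", api_version="v1beta1"):
    """Build the external metrics API path for a namespaced metric."""
    return (
        f"/apis/external.metrics.k8s.io/{api_version}"
        f"/namespaces/{namespace}/{metric_name}"
    )


# Prints: /apis/external.metrics.k8s.io/v1beta1/namespaces/default/trigger_prod_hpa
print(external_metric_path("trigger_prod_hpa"))
```

The printed path can be passed to kubectl get --raw to read the current value of the metric as reported by the adapter.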

