Comparative Analysis of HAProxy
1 Introduction
1.1 Load Balancing
In a web system, the load balancer is usually the first component of the ar-
chitecture to interact with incoming user requests. Load balancers are used to
distribute user requests to application servers that can compute and generate
the response. Two major goals of the load balancer in a web system are: A) To
maximize the overall resource utilization; and B) To minimize the time it takes
for each user to receive a response to their request. These two goals directly
affect the QoS of a web application. To attain these goals, the load balancer uses load balancing methods (algorithms) to distribute the user requests to compute machines. One study conducted by Google suggests that after a request's first three seconds, the probability that a user will leave the web application is 32% [6]. In the case where the compute servers simply do not have enough resources to handle the incoming load of requests, the system should be scaled out to accommodate these load surges such that the application QoS is not violated. Cloud providers [4], [5] offer services to automatically scale system resources under developer-defined conditions, such as when the CPU utilization threshold is exceeded.
1.2 HAProxy
The idea of using a load balancer in distributed systems has a long history and has been studied in various contexts, such as those for Grid computing [21,24] and cloud [20], or for different applications [7]. All these load balancers can be broadly categorized as Network Load Balancers (NLB) and Application Load Balancers (ALB).
(ALB). The load balancing software used in this paper’s study is HAProxy.
HAProxy is an open-source load balancer meant to be as stateless as possi-
ble while maintaining high throughput of messages per second. Additionally,
HAProxy can be configured to be used in both Network Load Balancing and
Application Load Balancing contexts.
Network Load Balancer (NLB): NLBs describe load balancers that operate solely
at the transport level (i.e., TCP). More specifically, NLBs operate at the Open
Systems Interconnection (OSI) Layer 4, highlighted in Fig. 1. This figure high-
lights the separate levels of information contained in the common communication
packet exchanged between two agents. Network load balancing is not concerned
with the intricacies of the messages it is handling, such as their content, head-
ers, etc. Furthermore, NLBs do not consider the behavior of backend servers in their decision making. Instead, NLBs only consider transport-related information when routing messages, which is why they are often faster than ALB algorithms.
Application Load Balancer (ALB): In contrast to NLBs, ALBs do consider the contents of the messages they route when making decisions. From a technical standpoint, this means that ALBs operate at OSI Layer 7 (e.g., HTTP), as shown in Fig. 1. In HTTP scenarios, this means that an ALB routing algorithm may
consider application layer fields such as the request method or the URL re-
quested. For example, an ALB might route requests whose messages contain
certain header values to one specific backend server. Additionally, ALBs can po-
tentially include the state information of backend servers in their logic, hence,
are often more efficient than their NLB counterparts.
In terms of HAProxy, the mechanism for enabling either NLB or ALB routing
is dependent on the load balancing algorithm chosen before runtime. HAProxy
comes with several built-in load balancing algorithms that are commonly used in production, such as round-robin, least-connection-based, and random. It is important for solution architects to choose the appropriate load balancing algorithm (method) based on the characteristics of the system they are deploying, as each possesses different behaviors depending on the workload. The algorithms that HAProxy provides can operate in either an ALB or NLB context. For example, HAProxy supports URL hashing to ensure that specific paths on one's website are always directed to the same server(s). This algorithm can be considered ALB-based, as its logic examines the intricacies of the message itself as well as the state of the backend servers to make decisions. On the other hand, the traditional round-robin algorithm that HAProxy provides chooses backend servers in order and, hence, is considered NLB-based.
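For illustration, the snippet below is a minimal sketch of an HAProxy backend configuration; the backend name and server addresses are hypothetical placeholders. Switching between ALB-style and NLB-style routing amounts to changing the balance directive:

    backend web_servers
        # ALB-style: hash the request URI so a given path always
        # maps to the same server
        balance uri
        # NLB-style alternative: plain rotation over the server list
        # balance roundrobin
        server app1 10.0.0.11:80
        server app2 10.0.0.12:80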
Given the complexity and prevalence of load balancing and HAProxy, the
purpose of this paper is to take a deep dive into HAProxy and provide insight
on its inner workings. In a production environment, a minor improvement in
HAProxy can have substantial impacts on user satisfaction and the incurred
cost of deploying an application.
In summary, the contributions of this work are as follows:
Fig. 2. Architecture of general three-tier web system. The different shapes passing into
and out of the load balancer represent different task types.
In web application deployments, the system can be broken down into three
separate tiers: load balancing, application, and database. An overview of such an
architecture can be observed in Fig. 2. The load balancing tier accepts incoming
user requests. With the load balancing algorithm, an appropriate backend server is determined from a list of possible servers, and the request is then dispatched
to the application. The application tier’s purpose is to satisfy the computational
workload that the request brings. The application machines rely on data that is
present in the database tier. With incoming requests, the application tier queries
the database tier for information used to handle its workload. Once a request has
completed its execution in the application tier, the server sends a response to the
load balancer to ultimately be returned to the client, completing the transaction.
Web requests are often user-facing, in that there is some deadline the response must meet. The deadline can be considered a concept established between the client and the company. It has been shown that as the web response time grows, the satisfaction of clients begins to drop linearly [17]. In the circumstance of website hosting, this may result in a loss of traffic and, in turn, company profits. In other situations in which communication is considered mission-critical, such as healthcare environments, slow response times may result in catastrophic failure. Therefore, there is a call for ensuring the efficient load balancing of requests to minimize response times and make sure that requests are served within their deadlines.
Fig. 3. HAProxy internals, generalizing the main mechanisms used to load balance
incoming user requests to application servers.
The algorithm used in load balancing is critical to the behavior of the load
balancer itself. Each algorithm may exhibit significantly different performances
in terms of response times and error rates. Additionally, choosing the correct
algorithm may prove difficult for non-technical users. As such, it is important to have a general understanding of which scenarios call for particular algorithms.
The load balancing algorithms supported by HAProxy can fall into two cat-
egories: ALB or NLB. This categorization can be viewed in Fig. 4. We will
discuss each algorithm in due order but first, we will make note of an important
mechanism that HAProxy utilizes in a few of its algorithms.
Under some scenarios, such as heterogeneous environments/workloads, it may be desirable to prioritize certain application servers or to weigh them more heavily in load balancing decisions. As such, HAProxy makes use of a weighting mechanism
for each server. What this tool provides is a way for the load balancing algorithm
to make conditional decisions based upon a server’s priority in relation to the
other servers. For example, a server more heavily weighted generally signifies that
the algorithm prefers to dispatch requests to this particular server in comparison
to other servers that are not weighted as high. By default, HAProxy sets the
weight of all servers to the same static value of 1.
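As a minimal sketch (server names and addresses are hypothetical), weights are assigned per server in the backend definition; under round-robin, app1 below would receive roughly three requests for every one sent to app2:

    backend web_servers
        balance roundrobin
        # app1 is preferred 3:1 over app2; unspecified weights default to 1
        server app1 10.0.0.11:80 weight 3
        server app2 10.0.0.12:80 weight 1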
Fig. 4. All algorithms supported by HAProxy, divided into NLB-based and ALB-based
logic.
4.1 Random
Random, otherwise known as Power of Two [22], randomly pulls two servers from
the list of possible servers. From these two servers, the algorithm chooses the
server with the least current load (connections). This algorithm can further be
adjusted to support Power of N, where N is any positive integer. One can expect the balance of load across servers to improve as N grows, at the cost of inspecting more servers per decision [22].
4.2 First
With the first algorithm, an incoming request is dispatched to the first server
possessing an available connection slot. The selection of possible backend servers is treated as a list or pool. This list is sorted based upon each server's id, which is
is some value designated by a system administrator. Upon a routing decision,
HAProxy will select the previously used server from the list (in the case of the
first request during the application’s startup, this is the first server in the list).
HAProxy will continue to route incoming requests to this same server until the
server’s designated max connection value is reached. From here, HAProxy will
then send requests to the server that is next in line.
This algorithm may prove useful for utilizing the smallest number of servers possible, maintaining low operational costs. However, in times in which a server
is approaching its max connection value, the tasks in execution are likely to
suffer. This is due to the numerous tasks competing for system resources.
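A minimal sketch of this behavior follows (server names, addresses, and limits are hypothetical); with balance first, app1 absorbs connections until its maxconn is reached, after which requests spill over to app2:

    backend web_servers
        balance first
        # servers are filled in ascending id order, each up to maxconn
        server app1 10.0.0.11:80 id 1 maxconn 100
        server app2 10.0.0.12:80 id 2 maxconn 100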
4.3 Least Connection
HAProxy's least connection algorithm is based upon the connection state of each
server. In dispatching an incoming request r, out of the list of possible servers,
r will be routed to the server with the least current number of connections. If two or more servers possess the same number of connections, the round-robin algorithm is used to decide among this subgroup.
4.4 Source
Under the source algorithm, the client IP of the incoming request is hashed using the sum of the weights of all running servers. Utilizing this hash, HAProxy will dispatch the request accordingly. The consideration of the sum weight means
that future requests from the same clients will always be routed to the same
servers. However, these mappings change if a server joins or leaves the backend; consequently, most clients would then be routed to a different server.
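As a side note, HAProxy's hash-type directive can mitigate this remapping: with consistent hashing, most client-to-server mappings survive a server joining or leaving the pool. A minimal sketch (hypothetical names and addresses):

    backend web_servers
        balance source
        # consistent hashing keeps most existing client-to-server
        # mappings intact when the pool changes
        hash-type consistent
        server app1 10.0.0.11:80
        server app2 10.0.0.12:80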
4.5 Round-robin
Under round-robin, incoming requests are dispatched to the backend servers in rotating order, with each server used in turn according to its weight.
4.6 Static RR
Static round-robin behaves like round-robin, except that server weights are treated as static; changing a server's weight at runtime has no effect.
4.7 URI
The URI algorithm hashes the URI of each request and assigns the result to a server, ensuring that specific paths on one's website are always directed to the same server(s).
4.8 Header
The header algorithm hashes the value of a designated HTTP header in each incoming request to select a server; if the header is absent, the algorithm falls back to round-robin.

4.9 RDP Cookie
The RDP Cookie algorithm examines the value of the named RDP cookie for its load balancing decisions. This value is hashed and assigned to a corresponding server.
This method ensures that returning clients will continuously be assigned to the
same server.
4.10 URL Parameter
With the URL Parameter algorithm, the query string of each request is used
for the hashing algorithm. If no query was found in the request, this algorithm
resorts to the round-robin algorithm.
This algorithm may prove useful for ensuring that returning clients will be
routed to the same server, given that a server has not left or entered the pool of
possible servers since their last request.
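A minimal sketch of these two modes follows; the parameter name userid is a hypothetical placeholder, while mstshash is the cookie name conventionally sent by RDP clients:

    backend rdp_servers
        # hash the value of the named RDP cookie
        balance rdp-cookie(mstshash)
        server app1 10.0.0.11:3389
        server app2 10.0.0.12:3389

    backend web_servers
        # hash the named query-string parameter; requests without it
        # fall back to round-robin
        balance url_param userid
        server app3 10.0.0.13:80
        server app4 10.0.0.14:80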
5 Fine-Tuning HAProxy
While utilizing HAProxy out-of-the-box may prove adequate under certain scenarios, it is important in most cases to maximize the performance of the load balancer. In the context of resource scaling, additional resources are initiated or spawned when some performance metric degrades, such as excessive response times of requests or high CPU utilization of compute servers. This scaling increases performance in times of increased load but comes with increased operational costs. To minimize the need to scale out, maximizing the efficiency of one's current system is imperative.
Parameter      Note
nbproc         Number of processes
nbthread       Number of processing threads
cpu-map        Designate specific CPU cores for specific threads to process on
maxconn        Maximum number of concurrent connections HAProxy will allow
busy-polling   Prevents processor from sleeping during idle periods
compression    Compresses HTTP messages
spread-checks  Spread out health checks to servers instead of sending all at once
Allowing multiple processes or threads to each handle requests simultaneously can significantly decrease the overall computational time required to load balance client requests. However, there are certain drawbacks that
should be considered upon implementing these forms of parallelization. For one,
the nbproc directive does not support data sharing between processes. To com-
bat this issue, nbthread could be used instead. Additionally, HAProxy uses health
checks to obtain state information on the backend servers. This means that a
dummy request is periodically sent to the backend servers. With nbproc, each
process will send its own health checks, resulting in increased network traffic.
Lastly, increasing the thread count beyond reason is detrimental to performance. If the threads in execution represent a pool of workers in an environment that cannot adequately provide enough CPU time, the resulting contention (CPU thrashing) will leave each thread with less time to compute. Therefore, there exists a balance in allocating the proper number of processes/threads.
To have even more granular control over HAProxy's processing, one can utilize cpu-map. This directive allows users to control which process executes on
which CPU core. Essentially, the designated process(es) will always execute on
the designated CPU core.
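As a minimal sketch of these directives in the global section (the values are illustrative, not recommendations):

    global
        nbthread 4
        # pin threads 1-4 of the first process to CPU cores 0-3
        cpu-map auto:1/1-4 0-3
        maxconn 10000
        # randomize health checks by up to 5% of their interval
        spread-checks 5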
There exist many other such tuning parameters, such as tune.bufsize, which alters the amount of memory each process is allocated, or nosplice, which disables the kernel's ability to perform TCP socket splicing. However, HAProxy's
documentation suggests that enabling/changing these parameters may cause
buggy behavior or even result in the communication of corrupted data. As such,
HAProxy also recommends that these parameters not be touched outside of
their own core development team or under very specific scenarios. As our exper-
iments are meant to remain representative of common use-case environments,
these specialized parameters will not be explored in this work.
Fig. 5. Design of our experimental system [3], [9], [1], [13], [11].
Our workload is composed of only two task types: GET and POST. In the context of HTTP
web requests, GET tasks are used by users to fetch information from the web
service, such as a web page or image. POST tasks are used by users to send
information to the web service, such as posting a comment or image to the
web page. Because a long sequence of POST requests would cause our web pages to become bloated or change over time, subsequent requests would be affected. Therefore, we have implemented an instrument to immediately remove POST request content from the web page as soon as it is processed, leaving the web page unchanged from the time of its initial creation.
The scenarios used for testing vary in the request rate that is provided to the
system. The first scenario’s request rate was approximately 16.67 users/second
over a period of 60 seconds for a total of 1,000 requests. We then ran similar tests with differing request rates over the same period, yielding total request counts from 5,000 to 40,000 in increments of 5,000. The total amount of requests can be partitioned between both task types; however, request instances of both task types were sent simultaneously. The input workload scenario
configurations can be observed in Table 3.
As discussed earlier, getting the most out of our current load balancer is useful
for decreasing system costs and maximizing system performance. For this exper-
iment, we measure the performance of various configurations of HAProxy. Because HAProxy exposes many possible parameters, each with a large number of possible values, the combinatorial search space to examine all configurations is time prohibitive. Hence, we examine the number of processing threads (nbthread) as a representative tuning parameter.
Fig. 6. Performance results from tuning processing threads parameter. The dashed red
line indicates the deadline that tasks should meet.
Fig. 7. Performance results from tuning the load balancing algorithm. Due to trend
overlaps, the following algorithms are stacked: [first, source], [random, leastconn, static-rr, roundrobin], [uri].
From the performance results, we can observe that most algorithms appear
to be impartial to the task types in our given workload. URI, however, exhibits
different performance for each task type. For example, URI shows the best per-
formance for GET requests while exhibiting the worst performance for POST
requests. This behavior is likely due to the URI algorithm directing all GET requests to the same servers while sending POST requests to others.
In this scenario, URI creates a task-to-server partitioning scheme that results in a loss of overall performance. Alternatively, the other algorithms send a mixture of both task types to all servers. It can be speculated that if our environment consisted of a set of servers that better accommodate the resources necessary to respond to the POST requests, the URI algorithm would showcase the best performance for both task types. However, because our backend is homogeneous, the potential of URI's logic cannot be fully utilized. In addition to this notable behavior, the first and source algorithms seem to perform worse for GET tasks. Comparatively, the static-rr, random, roundrobin, and leastconn
algorithms are more robust to task types as these algorithms do not exhibit
much of a performance change as the number of requests increase between task
types.
From the results of the load balancing algorithm experiment, it is evident that some ALB algorithms may require further system tuning to realize the full
benefits they bring in terms of request response time. However, while choosing
an NLB might appear to be a safer, more general alternative, there are still
significant performance discrepancies between them.
Best Practice Choose random, leastconn, roundrobin, or static-rr for load bal-
ancing under general circumstances. Choose URI for potential performance in-
crease at the cost of system profiling/further resource tuning.
Load balancing algorithms source and URI both use information from incoming tasks to partition workloads across application servers in a manner that could exploit server heterogeneity. However, from these results, it can be claimed that URI can potentially show better performance at the cost of reliability in terms of task type, while source shows better robustness to task type at the cost of response time. One potential explanation for this dynamic may lie within the differences in the task characteristics that NLB and ALB algorithms observe. NLB algorithms only examine network-level information about a given task, such as the user's IP address. This kind of information is not particular to a task and fails to provide important features that describe an incoming workload. On the other hand, ALB algorithms examine more granular information about a task, such as the page requested. In the context of an e-commerce store, particular pages may be more object-rich. From this example, an ALB algorithm will be able to determine the "weight" of an incoming task and better act accordingly, as compared to NLB algorithms.
Utilizing the available options HAProxy provides may prove suitable for many general use cases. However, when further tuning of HAProxy's performance is desired, custom load balancing algorithms can be designed to cater to specific use cases. Because HAProxy is open-source, we can easily dive into its source code to edit the way an algorithm behaves or to add our own custom algorithm. In this
section, we will describe an example scenario for customizing a specific algorithm
while highlighting important aspects of HAProxy’s code. We will be referring to
files and directories that can be found on HAProxy’s GitHub repository [8].
In our example, we will be customizing the load balancing algorithm random.
We would like to inject additional information into its logic to consider for load
balancing incoming requests. The current method provides ample task dispersion
so as to not “overcrowd” one specific server. For the sake of illustration, in our example, we desire to further equalize the partitioning of resources across servers.
The load balancing algorithm to be edited is called from the file backend.c in HAProxy's src/ directory. The assign_server function is used to parse which algorithm the user has selected in their configuration file and to call the algorithm itself. In addition, the HTTP request is passed as an object possessing various characteristics, such as URL paths and headers. In our case, the get_server_rnd function is called; located in this function is the core of the load balancing logic.
For our example, we capture CPU utilization information from our appli-
cation servers using the libvirt library [10]. We implement a custom library to
capture this information remotely and embed it into HAProxy’s code. The cur-
rent CPU utilization of each server is recorded in real-time and called from
get_server_rnd before making its final decisions. Specifically, the random al-
gorithm will only pull from a list of possible servers whose CPU utilization lies
below a user-specified threshold. From here, we let the default random algorithm
take over and return the selected server.
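The following C sketch illustrates the filtering step described above; it is not HAProxy's actual code, and get_cpu_util is a hypothetical wrapper around our libvirt-based library:

    /* Sketch of the CPU-aware filtering step, not HAProxy's actual code.
     * get_cpu_util() is a hypothetical helper backed by libvirt that
     * returns a server's current CPU utilization in [0.0, 1.0]. */
    #include <stddef.h>

    #define CPU_THRESHOLD 0.80 /* user-specified utilization cutoff */

    extern double get_cpu_util(const char *server_name);

    /* Keep only servers below the threshold; the default power-of-two
     * draw then selects among the eligible set. */
    size_t filter_by_cpu(const char **servers, size_t n, const char **eligible)
    {
        size_t kept = 0;
        for (size_t i = 0; i < n; i++) {
            if (get_cpu_util(servers[i]) < CPU_THRESHOLD)
                eligible[kept++] = servers[i];
        }
        /* If every server is saturated, fall back to the full list so
         * that requests are never dropped at this stage. */
        if (kept == 0) {
            for (size_t i = 0; i < n; i++)
                eligible[i] = servers[i];
            kept = n;
        }
        return kept;
    }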
When adding custom libraries, it is necessary to include them in the Makefile.
Specifically, the OBJS variable must include the new object file to be built from the new source file.
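For instance, assuming the custom library lives in src/cpu_util.c (a hypothetical file name), the addition would look like:

    OBJS += src/cpu_util.o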
HAProxy’s code is complex and should be respected when adding addi-
tional content. To avoid any potentially unwanted behaviors or errors, leaving
HAProxy’s code as default as possible is a good measure. In our example, we seg-
regate our custom logic as much as possible from HAProxy’s original logic. We
attempt to minimize the association that our code holds with HAProxy and let
it perform its own actions where it can. Customizing with this idea in mind will prevent any unforeseen consequences should an unassuming variable or block of code be interacted with, causing a snowball effect somewhere else in HAProxy's
dense architecture.
Once the code has been altered and it is time to compile, there is a list of
options to choose from, such as the Lua version, multithreading support, and the target compiler. A list of these options can be found in the Makefile. In our exam-
ple case, we include the additional libvirt library to support the CPU utilization
readings. To compile additional libraries that HAProxy would otherwise be unfa-
miliar with, the compile command must include the options ADDINC and ADDLIB
for acknowledging the library path and adding the library to the list of libraries
to be compiled, respectively. Here is our full command to compile with the in-
cluded libvirt library:
    make -j $(nproc) TARGET=linux-glibc USE_LUA=1 \
        LUA_INC=/opt/lua-5.4.3/src/ LUA_LIB=/opt/lua-5.4.3/src/ \
        ADDINC=-L/opt/libvirt ADDLIB=-lvirt
From here, one can simply run make install and then launch the customized HAProxy
program.
A potential future direction is to make the load balancer deadline-aware such that it can drop task requests that are unlikely to meet their deadlines. Such a method can have at least two benefits: (A) coping with oversubscribed situations and keeping the system busy with tasks that have a higher chance of meeting their deadlines; and (B) handling Denial of Service (DoS) attacks by proactively dropping (ignoring) tasks that are artificially generated to make the system unresponsive.
References
19. Xiangbo Li, Mohsen Amini Salehi, and Magdy Bayoumi. VLSC: Video live streaming
based on cloud services. In Proc. of Big Data & Cloud Applications Workshop, as
part of the 6th IEEE International Conference on Big Data and Cloud Computing
BDCloud, volume 16, 2016.
20. Xiangbo Li, Mohsen Amini Salehi, Magdy Bayoumi, Nian-Feng Tzeng, and Rajku-
mar Buyya. Cost-efficient and robust on-demand video stream transcoding using
heterogeneous cloud services. IEEE Transactions on Parallel and Distributed Sys-
tems (TPDS), 29(3):556–571, Mar. 2018.
21. Mohsen Amini Salehi and Hossain Deldari. Grid load balancing using an echo-system of intelligent ants. In Proceedings of the 24th IASTED International Multi-Conference on Parallel and Distributed Computing and Networks, 2006.
22. Andrea W. Richa, Michael Mitzenmacher, and Ramesh Sitaraman. The power of two random choices: A survey of techniques and results. Combinatorial Optimization, 9:255–304, 2001.
23. Mohsen Salehi and Rajkumar Buyya. Adapting market-oriented scheduling poli-
cies for cloud computing. In Algorithms and Architectures for Parallel Processing,
volume 6081 of ICA3PP’ 10, pages 351–362. Springer Berlin / Heidelberg, 2010.
24. Mohsen Salehi, Hossain Deldari, and Bahare Dorri. MLBLM: A Multi-level Load
Balancing Mechanism in Agent-Based Grid. In Soma Chaudhuri, Samir Das, Hi-
madri Paul, and Srikanta Tirthapura, editors, International Conference on Distributed
Computing and Networking, volume 4308 of ICDCN ’06, pages 157–162. Springer
Berlin / Heidelberg, 2006.