Unit-5 CC
Syllabus: Unit-4
Cloud Resource Management and Scheduling: Policies and mechanisms for resource
management, Applications of control theory to task scheduling on a cloud, Stability of a two-
level resource allocation architecture, Feedback control based on dynamic thresholds,
Coordination of specialized autonomic performance managers, A utility-based model for cloud-
based web services, Resource bundling, combinatorial auctions for cloud resources, Scheduling
algorithms for computing clouds, fair queuing, Start time fair queuing, Cloud scheduling subject
to deadlines.
A policy typically refers to the principles guiding decisions, whereas mechanisms represent the
means to implement policies. Separation of policies from mechanisms is a guiding principle in
computer science.
Cloud resource management policies can be loosely grouped into five classes:
1. Admission control.
2. Capacity allocation.
3. Load balancing.
4. Energy optimization.
5. Quality-of-service (QoS) guarantees.
The explicit goal of an admission control policy is to prevent the system from accepting
workloads in violation of high-level system policies; for example, a system may not accept an
additional workload that would prevent it from completing work already in progress or
contracted.
Limiting the workload requires some knowledge of the global state of the system. In a dynamic
system such knowledge, when available, is at best obsolete. Capacity allocation means allocating
resources for individual instances; an instance is an activation of a service. Locating
resources subject to multiple global optimization constraints requires a search of a very large
search space when the state of individual systems changes rapidly.
Load balancing and energy optimization can be done locally, but global load-balancing and
energy optimization policies encounter the same difficulties as the ones we have already
discussed.
Load balancing and energy optimization are correlated and affect the cost of providing the
services. Indeed, it was predicted that by 2012 up to 40% of the budget for IT enterprise
infrastructure would be spent on energy.
The common meaning of the term load balancing is that of evenly distributing the load to a set of
servers. For example, consider the case of four identical servers, A, B, C, and D, whose relative
loads are 80%, 60%, 40%, and 20% of their capacity, respectively. As a result of perfect load
balancing, all four servers would end up with the same load: 50% of each server's capacity.
In cloud computing a critical goal is minimizing the cost of providing the service and, in
particular, minimizing the energy consumption. This leads to a different meaning of the term
load balancing: instead of having the load evenly distributed among all servers, we want to
concentrate it and use the smallest number of servers while switching the others to standby
mode, a state in which a server uses less energy.
In our example, the load from D will migrate to A and the load from C will migrate to B; thus, A
and B will be loaded at full capacity, whereas C and D will be switched to standby mode.
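The contrast between the two interpretations of load balancing can be illustrated with a short Python sketch; the server names and loads are the ones from the example above, and the assumption that each server's capacity is 100% is made only for illustration.

# Toy comparison of even load balancing vs. energy-aware consolidation.
# Loads are percentages of each server's capacity (from the example above).
loads = {"A": 80, "B": 60, "C": 40, "D": 20}

# Classical load balancing: spread the total load evenly over all servers.
even_share = sum(loads.values()) / len(loads)             # 50% per server
print("even distribution:", {s: even_share for s in loads})

# Energy-aware consolidation: pack the load onto as few servers as possible
# (each can hold 100%) and switch the rest to standby mode.
total = sum(loads.values())                               # 200%
servers_needed = -(-total // 100)                         # ceil(200 / 100) = 2
print("servers kept active:", servers_needed, "(A and B); C and D go to standby")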
Quality of service is that aspect of resource management that is probably the most difficult to
address and, at the same time, possibly the most critical to the future of cloud computing.
Allocation techniques in computer clouds must be based on a disciplined approach rather than
ad hoc methods.
The four basic mechanisms for the implementation of resource management policies are:
1. Control theory: Control theory uses the feedback to guarantee system stability and predict
transient behavior but can be used only to predict local rather than global behavior.
2. Machine learning: A major advantage of machine learning techniques is that they do not
need a performance model of the system. This technique could be applied to coordination
of several autonomic system managers.
3. Utility-based: Utility-based approaches require a performance model and a mechanism to
correlate user-level performance with cost.
4. Market-oriented/economic mechanisms: Such mechanisms do not require a model of the
system, e.g., combinatorial auctions for bundles of resources.
Control theory has been used to design adaptive resource management for many classes of
applications, including power management, task scheduling, QoS adaptation in Web servers, and
load balancing.
The classical feedback control methods are used in all these cases to regulate the key operating
parameters of the system based on measurement of the system output; the feedback control in
these methods assumes a linear time-invariant system model and a closed-loop controller.
This controller is based on an open-loop system transfer function that satisfies stability and
sensitivity constraints.
The technique allows multiple QoS objectives and operating constraints to be expressed as a cost
function and can be applied to stand-alone or distributed Web servers, database servers, high-
performance application servers, and even mobile/embedded systems.
A server can be modeled as a closed-loop control system, and control theory principles can be
applied to resource allocation.
A resource allocation architecture based on control theory concepts can be designed for the entire
cloud. The automatic resource management is based on two levels of controllers, one for the
service provider and one for the application; see Figure 6.2.
The main components of a control system are the inputs, the control system components, and the
outputs. The inputs in such models are the offered workload and the policies for admission
control, capacity allocation, load balancing, energy optimization, and QoS guarantees in the
cloud.
The system components are sensors used to estimate relevant measures of performance and
controllers that implement various policies; the outputs are the resource allocations to the
individual applications.
The controllers use the feedback provided by sensors to stabilize the system; stability is related
to the change of the output. If the change is too large, the system may become unstable.
The elements involved in a control system are sensors, monitors, and actuators. The sensors
measure the parameter(s) of interest and transmit the measured values to a monitor, which
determines whether the system behavior must be changed and, if so, requests that the actuators
carry out the necessary actions.
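A minimal sketch of such a feedback loop is given below; the sensor, monitor, and actuator functions, the 70% utilization set point, and the VM-based actuation are illustrative assumptions, not part of any actual cloud API.

import random

TARGET_UTILIZATION = 0.70        # assumed set point for CPU utilization

def sensor():
    # Measure the parameter of interest (here, a simulated CPU utilization).
    return random.uniform(0.3, 1.0)

def monitor(measurement, allocated_vms):
    # Decide whether the system behavior must be changed.
    error = measurement - TARGET_UTILIZATION
    if error > 0.1:
        return allocated_vms + 1                 # under-provisioned: add a VM
    if error < -0.1 and allocated_vms > 1:
        return allocated_vms - 1                 # over-provisioned: release a VM
    return allocated_vms

def actuator(new_allocation):
    # Carry out the action requested by the monitor.
    print("allocation set to", new_allocation, "VMs")

vms = 2
for _ in range(5):                               # a few iterations of the closed loop
    vms = monitor(sensor(), vms)
    actuator(vms)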
Applying such a control scheme to an entire cloud is challenging for at least two reasons:
• First, due to the very large number of servers and to the fact that the load changes rapidly
in time, the estimation of the current system load is likely to be inaccurate.
• Second, the ratio of average to maximal resource requirements of individual users
specified in a service-level agreement is typically very high.
Thresholds: A threshold is the value of a parameter related to the state of a system that triggers a
change in the system behavior. Thresholds are used in control theory to keep critical parameters
of a system in a predefined range.
The threshold could be static, defined once and for all, or it could be dynamic. A dynamic
threshold could be based on an average of measurements carried out over a time interval, a so-
called integral control.
The dynamic threshold could also be a function of the values of multiple parameters at a given
time or a mix of the two. To maintain the system parameters in a given range, a high and a low
threshold are often defined.
The two thresholds determine different actions; for example, a high threshold could force the
system to limit its activities and a low threshold could encourage additional activities.
Proportional Thresholding
The questions examined in connection with proportional thresholding are: Is it beneficial to have
two types of controllers, (1) application controllers that determine whether additional resources
are needed and (2) cloud controllers that arbitrate requests for resources and allocate the physical
resources? (3) Is it feasible to consider fine control? (4) Are dynamic thresholds based on time
averages better than static ones? (5) Is it better to have a high and a low threshold, or is it
sufficient to define only a high threshold?
The strategy adopted is as follows:
1. Compute the integral value of the high and the low thresholds as averages of the
maximum and, respectively, the minimum of the processor utilization over the process
history.
2. Request additional VMs when the average value of the CPU utilization over the current
time slice exceeds the high threshold.
3. Release a VM when the average value of the CPU utilization over the current time slice
falls below the low threshold.
The conclusions reached based on experiments with three VMs are that dynamic thresholds
perform better than static ones and that two thresholds are better than one; a sketch of the
control loop described above follows.
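This is a minimal sketch of the loop; the utilization trace, the time slices, and the way the thresholds are derived from the history are assumptions made only for illustration.

def dynamic_thresholds(history):
    # "Integral" thresholds: averages of the per-slice maxima and minima
    # over the process history (step 1 above).
    high = sum(max(s) for s in history) / len(history)
    low = sum(min(s) for s in history) / len(history)
    return high, low

def control_step(history, current_slice, vms):
    history.append(current_slice)
    high, low = dynamic_thresholds(history)
    average = sum(current_slice) / len(current_slice)
    if average > high:
        return vms + 1                   # step 2: request an additional VM
    if average < low and vms > 1:
        return vms - 1                   # step 3: release a VM
    return vms

# Example: three time slices of CPU utilization samples for one application.
vms, history = 1, []
for time_slice in ([0.2, 0.4], [0.7, 0.9], [0.95, 0.99]):
    vms = control_step(history, time_slice, vms)
print("VMs allocated:", vms)             # ends with 3 VMs for this trace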
Virtually all modern processors support dynamic voltage scaling (DVS) as a mechanism for
energy saving. Indeed, the energy dissipation scales quadratically with the supply voltage.
Power management controls the CPU frequency and, thus, the rate of instruction execution. For
some compute-intensive workloads the performance decreases linearly with the CPU clock
frequency, whereas for others the effect of lower clock frequency is less noticeable or
nonexistent. The clock frequency of individual blades/servers is controlled by a power manager,
typically implemented in the firmware; it adjusts the clock frequency several times a second.
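A back-of-the-envelope illustration of why voltage and frequency scaling saves energy follows; it uses the standard approximation that dynamic CMOS power is proportional to C·V²·f, and the capacitance, voltage, and frequency values are made up.

def dynamic_power(capacitance, voltage, frequency):
    # Dynamic power of CMOS logic scales with C * V^2 * f.
    return capacitance * voltage ** 2 * frequency

nominal = dynamic_power(1e-9, 1.2, 3.0e9)   # full voltage and clock rate
scaled = dynamic_power(1e-9, 0.9, 2.0e9)    # lower voltage and clock rate
print("power saved: %.0f%%" % (100 * (1 - scaled / nominal)))   # roughly 62%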
The approach to coordinating power and performance management is based on several ideas, one
of which is the use of a utility function.
A utility function relates the “benefits” of an activity or service with the “cost” to provide the
service. For example, the benefit could be revenue and the cost could be the power consumption.
A service-level agreement (SLA) often specifies the rewards as well as the penalties associated
with specific performance metrics. Sometimes the quality of service translates into average
response time; this is the case for cloud-based Web services, where the SLA often explicitly
specifies this requirement.
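A sketch of such a utility function for a cloud-based Web service is shown below; the SLA target, the reward and penalty values, and the power cost are invented numbers used only to illustrate the benefit-minus-cost idea.

def utility(avg_response_time_ms, power_cost):
    # Revenue depends on meeting the SLA response-time target; cost is power.
    sla_target_ms, reward, penalty = 200, 100.0, 150.0
    revenue = reward if avg_response_time_ms <= sla_target_ms else -penalty
    return revenue - power_cost

print(utility(150, 40))    # SLA met:    100 - 40  = 60
print(utility(350, 25))    # SLA missed: -150 - 25 = -175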
Resources in a cloud are allocated in bundles, allowing users to get maximum benefit from a
specific combination of resources. Indeed, along with CPU cycles, an application needs specific
amounts of main memory, disk space, network bandwidth, and so on.
Resource bundling complicates traditional resource allocation models and has generated interest
in economic models and, in particular, auction algorithms.
Two recent combinatorial auction algorithms are the simultaneous clock auction and clock
proxy auction.
Pricing and Allocation Algorithms: A pricing and allocation algorithm partitions the set of users
into two disjoint sets, winners and losers, denoted as W and L, respectively. The allocation is
subject to several constraints:
(a) The first one states that a user either gets one of the bundles it has opted for or nothing; no
partial allocation is acceptable.
(b) The second constraint expresses the fact that the system awards only available resources;
only offered resources can be allocated.
(c) The third constraint is that the bid of the winners exceeds the final price.
(d) The fourth constraint states that the winners get the least expensive bundles in their
indifference set.
(e) The fifth constraint states that losers bid below the final price.
(f) The last constraint states that all prices are positive numbers.
In the ASCA (ascending clock auction) algorithm the participants in the auction specify the
resource and the quantities of that resource offered or desired at the price listed for that time
slot. Then the excess demand vector, the difference between the total demand and the total offer
for each resource, is computed; if the excess demand for some resource is positive, its price is
raised and a new round follows, until there is no excess demand.
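A rough sketch of the ascending clock mechanism follows; the supply vector, the aggregate demand function, and the 10% price increment are assumptions made for illustration only.

supply = {"cpu": 100, "memory": 200}          # units offered by the cloud
prices = {"cpu": 1.0, "memory": 0.5}

def demand_at(prices):
    # Aggregate demand of all participants at the current prices (assumed form).
    return {"cpu": 120 / prices["cpu"], "memory": 80 / prices["memory"]}

for _ in range(100):                          # clock ticks
    demand = demand_at(prices)
    excess = {r: demand[r] - supply[r] for r in supply}    # the excess vector
    if all(v <= 0 for v in excess.values()):
        break                                 # no excess demand: auction stops
    for r, v in excess.items():
        if v > 0:
            prices[r] *= 1.1                  # raise the price of scarce resources
print("final prices:", prices)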
An auctioning algorithm is very appealing because it supports resource bundling and does not
require a model of the system. At the same time, a practical implementation of such algorithms is
challenging. First, requests for service arrive at random times, whereas in an auction all
participants must react to a bid at the same time. Periodic auctions must then be organized, but
this adds to the delay of the response. Second, there is an incompatibility between cloud
elasticity, which guarantees that the demand for resources of an existing application will be
satisfied immediately, and the idea of periodic auctions.
A server can be shared among several virtual machines, each virtual machine could support
several applications, and each application may consist of multiple threads.
CPU scheduling supports the virtualization of a processor, the individual threads acting as virtual
processors; a communication link can be multiplexed among a number of virtual channels, one
for each flow.
A scheduling algorithm should be efficient, fair, and starvation-free. The objectives of a scheduler
for a batch system are to maximize the throughput and to minimize the turnaround time, the time
between a job's submission and its completion.
Schedulers for systems supporting a mix of tasks – some with hard real-time constraints, others
with soft, or no timing constraints – are often subject to contradictory requirements. Some
schedulers are preemptive, allowing a high-priority task to interrupt the execution of a lower-
priority one; others are nonpreemptive.
Figure 6.7 identifies several broad classes of resource allocation requirements in the space
defined by these two dimensions: best-effort, soft requirements, and hard requirements. Hard
real-time systems are the most challenging because they require strict timing and precise amounts
of resources.
Round-robin, FCFS, shortest-job-first (SJF), and priority algorithms are among the most
common scheduling algorithms for best-effort applications. Each thread is given control of the
CPU for a definite period of time, called a time slice, in a circular fashion in the case of round-
robin scheduling.
The round-robin algorithm is fair and starvation-free. The threads are allowed to use the CPU in
the order in which they arrive in the case of the FCFS algorithm and in the order of their running
time in the case of the SJF algorithm. Earliest deadline first (EDF) and rate monotonic algorithms (RMA) are
used for real-time applications.
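The sketch below illustrates three of the best-effort policies just mentioned; the thread names and CPU burst times are arbitrary example values.

from collections import deque

jobs = [("t1", 8), ("t2", 2), ("t3", 5)]      # (thread, CPU burst in time units)

# FCFS: threads use the CPU in the order in which they arrive.
fcfs_order = [name for name, _ in jobs]

# SJF: threads use the CPU in the order of their running time.
sjf_order = [name for name, _ in sorted(jobs, key=lambda j: j[1])]

# Round robin with a 3-unit time slice: each thread gets the CPU in turn.
quantum, queue, rr_order = 3, deque(jobs), []
while queue:
    name, remaining = queue.popleft()
    rr_order.append(name)
    if remaining > quantum:
        queue.append((name, remaining - quantum))

print("FCFS:", fcfs_order)    # ['t1', 't2', 't3']
print("SJF: ", sjf_order)     # ['t2', 't3', 't1']
print("RR:  ", rr_order)      # ['t1', 't2', 't3', 't1', 't3', 't1']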
Fair queuing
Interconnection networks allow cloud servers to communicate with one another and with users.
These networks consist of communication links of limited bandwidth and
switches/routers/gateways of limited capacity. When the load exceeds its capacity, a switch starts
dropping packets because it has limited input buffers for the switching fabric and for the
outgoing links, as well as limited CPU cycles.
A fair queuing algorithm proposed in the literature requires that separate queues, one per flow, be maintained
by a switch and that the queues be serviced in a round-robin manner. This algorithm guarantees
the fairness of buffer space management, but does not guarantee fairness of bandwidth
allocation. Indeed, a flow transporting large packets will benefit from a larger bandwidth (see
Figure 6.8 ).
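The weakness can be seen in the short sketch below, in which the switch serves one packet per flow per round; the flows and packet sizes are arbitrary example values.

# Per-flow queues serviced round robin, one packet per flow per turn.
flows = {"f1": [1500, 1500, 1500], "f2": [100, 100, 100]}   # packet sizes (bytes)
bytes_sent = {"f1": 0, "f2": 0}
while any(flows.values()):
    for f in flows:
        if flows[f]:
            bytes_sent[f] += flows[f].pop(0)
print(bytes_sent)   # {'f1': 4500, 'f2': 300}: the large-packet flow gets 15x the bandwidth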
A hierarchical CPU scheduler for multimedia operating systems has been proposed based on
start-time fair queuing (SFQ). The basic idea of the SFQ algorithm is to organize the consumers
of the CPU bandwidth in a tree structure; the root node is the processor and the leaves of this tree
are the threads of each application. A scheduler acts at each level of the hierarchy, and the
fraction of the processor bandwidth allocated to an intermediate node or a leaf is proportional to
its weight.
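A minimal sketch of the SFQ tag computation at one level of the hierarchy is given below; the node names, weights, and request costs are invented, and the virtual-time bookkeeping is simplified to the essentials.

import heapq

weights = {"app1": 3, "app2": 1}          # share of the CPU bandwidth per node
finish = {f: 0.0 for f in weights}        # finish tag of each node's previous request
virtual_time = 0.0
ready = []                                # (start tag, node, cost)

def submit(node, cost):
    # Start tag S = max(v, F_prev); finish tag F = S + cost / weight.
    start = max(virtual_time, finish[node])
    finish[node] = start + cost / weights[node]
    heapq.heappush(ready, (start, node, cost))

for node, cost in [("app1", 10), ("app2", 10), ("app1", 10), ("app2", 10)]:
    submit(node, cost)

while ready:                              # serve in increasing order of start tags
    start, node, cost = heapq.heappop(ready)
    virtual_time = start                  # virtual time tracks the request in service
    print("serve", node, "start tag", round(start, 2))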
An SLA specifies the time when the results of computations done on the cloud should be
available. This motivates us to examine cloud scheduling subject to deadlines, a topic drawing on
a vast body of literature devoted to real-time applications.
Two types of deadlines can be distinguished:
1. Hard deadlines.
2. Soft deadlines.
In the first case, if the task is not completed by the deadline, other tasks that depend on it may be
affected and there are penalties; a hard deadline is strict and expressed precisely as milliseconds
or possibly seconds. Soft deadlines play more of a guideline role and, in general, there are no
penalties.
Soft deadlines can be missed by fractions of the units used to express them, e.g., minutes if the
deadline is expressed in hours, or hours if the deadline is expressed in days. The scheduling of
tasks on a cloud is generally subject to soft deadlines, though occasionally applications with hard
deadlines may be encountered.
Two workload partitioning rules are used for such scheduling: the optimal partitioning rule (OPR)
and the equal partitioning rule (EPR).
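The sketch below contrasts the two rules under a simple divisible-load model in which the input data is delivered sequentially over one link, so a worker can start only after its share (and the shares of the workers before it) has arrived; the model, the workload, and the transmission and processing rates are all assumptions made for illustration.

sigma, n = 1200.0, 4        # total workload and number of identical workers
t_c, t_p = 0.01, 0.1        # per-unit transmission and processing times (assumed)

# Equal partitioning rule (EPR): every worker receives sigma / n.
epr_shares = [sigma / n] * n

# Optimal partitioning rule (OPR): shares chosen so that all workers finish at
# the same time, which under this model yields geometrically decreasing shares.
q = t_p / (t_c + t_p)
opr_shares = [sigma * (1 - q) / (1 - q ** n) * q ** i for i in range(n)]

def makespan(shares):
    # Completion time of the last-finishing worker; each share must be
    # transmitted before its worker can start processing.
    transmitted, latest = 0.0, 0.0
    for share in shares:
        transmitted += share * t_c
        latest = max(latest, transmitted + share * t_p)
    return latest

print("EPR makespan:", round(makespan(epr_shares), 1))   # about 42.0
print("OPR makespan:", round(makespan(opr_shares), 1))   # about 37.9

Against a deadline of, say, 40 time units, the equal partitioning rule misses while the optimal partitioning rule meets it, which is why the choice of rule matters for deadline scheduling.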