Hit-and-Run Tactics Enable Guerrilla Capacity Planning
time to market.
Neil J. Gunther
new awareness, I discovered some unusual reasons managers had for avoiding capacity planning. Two of the most compelling have to do with misperceptions about risk.
time! As this article tries to explain, application development doesn't have to be like this.
Performance homunculus
A homunculus is the sensate representation of the human body, in which each body part is drawn in proportion to the amount of sensory information it supplies. Reflecting this sensory weighting, the homunculus's hands and mouth are huge, while its torso and head appear relatively small, as Figure 1 shows. This is because humans receive vastly more sensory information via fingers and tongues than via the skin on our torsos, for example. Just as the homunculus emphasizes the disproportionate weighting of our senses across our bodies, so too performance management (which draws on many skills and disciplines) should receive a disproportionately large part of system management investment. This is not, however, conventional wisdom. Performance management, most often, is seen as just another system management activity. In other words, the resources required to carry out performance management are assumed to be roughly the same as those required for managing the distribution of software, backup and recovery, chargeback, and security. But this is another misconception like the one involving risk. Almost without exception, it's possible to accommodate most of the latter activities by purchasing appropriate COTS (commercial off-the-shelf) packages, which require little support in terms of human infrastructure. Because performance management in general (and capacity planning in particular) is like a homunculus, it requires disproportionately more resources, more human infrastructure, and more training than almost any other system management activity. Nonetheless, the plethora of COTS performance tools on the market continues to provide false assurance that performance management is just another limb on the body of system management, rather than the huge helping hands of the homunculus that it could be.
Capacity planning is more like the hands of the homunculus than simply part of the torso (systems management).
Given these observations, I began to think about explaining why managers were avoiding capacity planning in today's computer environments (Neil Gunther, "Shooting the RAPPIDs: Swift Performance Techniques for Turbulent Times," CMG 97 Proc., Computer Measurement Group, Turnersville, N.J., 1997, pp. 602-613). I realized that
CAPACITY
PLANNING
evil for mainframe and data network procurement (Frank Huebner, "Performance and Capacity Evaluations of IP Networks and Systems," IT Professional, Nov.-Dec. 2001, pp. 38-43). The motivation is simple: Hardware components have always been expensive, and budgets have always been limited. But today, hardware (even mainframe hardware) has become relatively cheap.
In launching a new application and system, however, IT professionals must consider a less-obvious caution: Bottlenecks are more likely to arise in the application software than in the hardware. So throwing more hardware at a performance problem might not necessarily help. In this sense, capacity planning has not gone away. Time is money, even if you have all the hardware in the world. The new emphasis is on software scalability, and that emphasis affects the way you do capacity planning. Today's demands on time and scheduling constraints no longer accommodate the traditional detailed approach to capacity planning based on procurement for the monolithic mainframe. Moreover, today's computer systems employ distributed computing architectures. Distributed computing comprises many software pieces running in many hardware places.
Little standardization or instrumentation
The Universal Measurement Architecture (UMA) standard from the Open Group (http://www.opengroup.org/pubs/catalog/c427.htm) could have helped surmount some of these difficulties. This standard offers a means to normalize collection and management tools for performance data (especially Unix data). But vendors and other competing parties had no strong incentive to adopt UMA. To provide performance data in UMA format, platform vendors would incur a cost burden without an obvious return on investment. Tool vendors thought a standard format made it easier for new competitors to enter the market, threatening their revenue streams. Similarly, ARM (Application Response Measurement), a standard primarily promoted by Hewlett-Packard and Tivoli, has not really caught on either. So, today's IT professionals are building more complex architectures with less instrumentation available to manage them. I'm glad Boeing doesn't build aircraft this way!
GETTING STARTED WITH CAPACITY PLANNING
By now it should be clear that capacity planning is as important as ever, but capacity planners today must surmount two new hurdles that traditional planners did not:
• We're trying to do more with less. Distributed systems are ubiquitous and more complex than monolithic mainframes, but we have fewer standardized tools to manage performance and plan for growth.
• We have to do more in less time. Whatever performance management and capacity planning we do, it must be done in less time so as not to inflate product/project schedules.
Taken together, these constraints seem impossible to meet. Is it any wonder that managers give up and ignore capacity planning? But, as you will see, there is a way out. First, it's important to realize that performance management has three major levels: monitoring, analysis, and planning. Managers usually pay the most attention to performance monitoring, because it is generally the easiest area to address. If you want to manage performance and capacity, you must measure it. Naturally, this is the activity that the majority of commercial tool vendors target. (As a manager, if you spend $250,000 on tools, you inevitably feel like you must have accomplished something.) If you can't afford these prices, you can turn to your Unix or Windows NT system administrator. They are usually very good at writing scripts to collect all sorts of performance data and presenting it to you in your Web browser.
But data collection just generates data. The next level is analysis: uncovering the information hidden in the data. These days, unfortunately, the usual motivation for performance analysis is to firefight an unforeseen performance problem that is delaying a release schedule or reducing deployed functionality. This is performance analysis after the horse has bolted. With a little more investment in infrastructure, managers can plan ahead, minimizing the outbreak of those unforeseen fires. But, as was explained earlier, managers usually skip this third level of planning for fear of inflating project schedules. How can managers cut this Gordian knot?
In my view, managers need a more opportunistic approach to capacity planning; enter guerrilla capacity planning. The notion of planning tactically might seem contradictory. At the risk of mixing metaphors, think of traditional capacity planning as the 800-pound gorilla. That gorilla needs to go on a diet to produce a leaner approach to capacity planning that is adapted to today's business environment. By lean, I don't mean skinny. Skinny means remaining stuck in the rut of simply monitoring everything that moves with the false hope that capacity issues will never arise, thereby avoiding the planning level altogether. Monitoring requires that someone watch the meter needles wiggle. Inherent in this approach is the notion that action isn't necessary unless the meter redlines. But performance meters only convey the system's current state. Such a purely reactive approach does not provide any means for forecasting what lies ahead.
You can't forecast the weather by listening to leaves rustle. The irony is that a lot of predictive information is likely contained in the collected monitoring data. But, like panning for gold, it takes additional processing to reveal the hidden gems about the future.
It's not a model railway
The goal of capacity planning is prediction, and that requires a consistent framework in which to couch any assumptions. Capacity planners call that framework a model. The word model, however, is one of the most overloaded terms in English; it can mean everything from a fashion model to a model railway set. For example, the best model train set is usually the one that includes not just a scale model of the locomotive, not just a model of an engineer driving the scaled locomotive, but the one that includes the pupil painted on the eyeball of the engineer driving the scaled locomotive. In other words, the more detail the better. This is precisely what a capacity planning model is not. For capacity planning, the goal is to discard as much detail as possible while still retaining the essence of the system's performance characteristics. This goal tends to argue against the construction and use of detailed simulation models, in favor of spreadsheets or even automated forecasting. The skill lies in finding the correct balance. Linear trend models can be too simple in many cases, while event-based simulation models can be overkill. To paraphrase Einstein: Keep the model as simple as possible, but no simpler.
Guerrilla Resources
Books
Computer Performance Analysis with Mathematica, A. Allen, Academic Press, San Diego, 1994: Contains a very readable overview of capacity planning issues and techniques written by a master of the subject. Also, it is the only book I know of that shows you how to apply Mathematica to solve performance problems.
Guerrilla Capacity Planning: Hit and Run Tactics for Sizing UNIX, Windows and Web Applications, N.J. Gunther: This book is in preparation; see http://www.perfdynamics.com for information.
Sun Blueprints: Capacity Planning for Internet Services, A. Cockcroft and W. Walker, Prentice Hall, Upper Saddle River, N.J., 2000: One of these authors has adopted and extended many of the concepts I present in this article. The book also outlines guerrilla-style management processes and scenario planning; some chapters are Sun/Solaris specific.
The Web Testing Handbook, Steven Splaine and Stefan P. Jaskiel, STQE Publishing, Orange Park, Fla., 2001: This book covers the interface between functional testing and performance measurement. Chapter 8 covers application scalability and, in particular, the retrograde throughput effect discussed in Example 1 of this article.
Classes
Guerrilla Planning (http://www.perfdynamics.com/PitV/guerilla.html): I teach a five-day class on these techniques.
Guerrilla Tools (http://www.perfdynamics.com/PitV/gtk.html): This five-day class is a lab in which students participate in the construction of performance and capacity models.
July August 2002 IT Pro 43
No compass required
Traditional capacity planning has required relatively high precision because each significant digit of the calculation had many thousands of dollars attached to it. In today's lower-cost climate, however, managers often just want a sense of direction rather than the actual compass bearing. In this sense, the precision of traditional capacity predictions has become less important than their accuracy. Managers often see little virtue in spending two months debugging and verifying a full-blown simulation if the accuracy of a simpler spreadsheet model would suffice.
At the level of performance data collection, there is little support for high-precision measurements in open systems. Take Unix, for example: It's basically an experiment that escaped from the lab around 1975 and has been mutating ever since. What little performance instrumentation existed back then in the Unix kernel was for the benefit of early developers, not the grand purpose of capacity planning in today's distributed systems. Nonetheless, every capacity planning tool in existence relies primarily on those same kernel counters with little modification. And since the PC revolution of the 1980s fueled the move to distributed systems, performance management has become ad hoc, at best.
Guerrilla capacity planning, on the other hand, tries to facilitate rapid forecasting of capacity requirements based on available performance data without inflating schedules. Consistent investment in human infrastructure is also important. People must have the training and know-how to use the methods and tools at hand. Another key point is to keep ahead in the procurement cycle. This was never truer than it is for exponentially growing Web sites. If you don't accurately forecast capacity, user traffic can disappear to competitor sites. Or latent and unanticipated demand can consume newly procured capacity the instant you install new servers. Let's look at two brief examples of guerrilla capacity planning in action.
As a first example of how to use guerrilla capacity planning, here is a tactical method for quantitatively determining application scalability. It's noteworthy that scalability, particularly application scalability, is a perennial hot button that involves notions of performance and planning, yet few people can quantify the concept. Scalability has to do with the laws of diminishing returns, so a single number cannot represent it. Scalability is a function. Figure 2 shows an example of actual load test throughput plotted as number of scripts S executed per hour on the y-axis as a function of number of virtual users N on the x-axis. Superimposed on this data is the corresponding scaling function predicted by a simple capacity model that does not involve queueing theory or simulations. This means that sizing server capacity can be relatively quick using a spreadsheet.
Figure 2. Measured and modeled throughput: No. of scripts per hour (y-axis) versus No. of virtual users (x-axis).
The formula for the simple capacity model is
$$S(\sigma, \lambda, N) = \frac{N}{1 + \sigma\left[(N - 1) + \lambda N (N - 1)\right]}$$
It involves just two parameters, $\sigma$ and $\lambda$ (N.J. Gunther, The Practical Performance Analyst, iUniverse Inc., Lincoln, Neb., 2000). You identify $\sigma$ with contention delays, for example, time spent waiting on a database lock. $\lambda$ is associated with additional delays due to pair-wise coherency mismatches, such as time to fetch a cache miss. These delays can be in hardware, software, or (most likely) a combination of both.
You can easily enter this function in a Microsoft Excel spreadsheet. If the number of virtual users resides in column N, and regression parameters $\sigma$ and $\lambda$ reside in cells A1 and B1, the equation becomes the Excel cell formula
=Nn/(1+A1*((Nn-1)+B1*Nn*(Nn-1)))
where Nn is the value in the cell at column N and row n. You can now determine scalability parameters $\sigma$ and $\lambda$ using the linear-regression tools built into Excel (see, for example, D. Levine, M. Berenson, and D. Stephan, Statistics for Managers Using Microsoft EXCEL, Prentice-Hall, Upper Saddle River, N.J., 1999). In summary, the basic steps of this guerrilla technique are as follows:
• Measure throughput as a function of load N using tools like LoadRunner.
• Collect a sparse data sample; having at least four load points is usually sufficient.
• Calculate $\sigma$ and $\lambda$ by performing a regression fit in Excel against transformed versions of N and S.
• Use those values to predict the complete application scalability function using the Excel formula discussed earlier.
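The regression step leaves the transformation of N and S unstated. One choice that turns the two-parameter model into a polynomial regression is worked through below; it is offered as an assumption, not as the article's exact procedure. It normalizes measured throughput by its single-user value, $C(N) = S(N)/S(1)$, and rearranges the model so that a quadratic with no constant term can be fitted. The last line, obtained by setting $dC/dN = 0$, is a derived quantity not stated in the text: the load at which the model's throughput peaks.

```latex
\begin{aligned}
C(N) &\equiv \frac{S(N)}{S(1)} = \frac{N}{1 + \sigma\left[(N - 1) + \lambda N (N - 1)\right]} \\
Y &\equiv \frac{N}{C(N)} - 1 = \sigma (N - 1) + \sigma\lambda N (N - 1) \\
  &= \sigma\lambda\, X^{2} + \sigma(1 + \lambda)\, X \equiv aX^{2} + bX, \qquad X \equiv N - 1 \\
\sigma &= b - a, \qquad \lambda = \frac{a}{b - a} \\
N^{*} &= \sqrt{\frac{1 - \sigma}{\sigma\lambda}} \quad \text{(where } dC/dN = 0\text{)}
\end{aligned}
```

In a spreadsheet, a and b could come from, for example, a second-order polynomial trendline of Y against X with the intercept forced to zero. Beyond $N^{*}$ the model predicts retrograde throughput of the kind plotted in Figure 2.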
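For readers who prefer code to spreadsheet cells, here is a minimal Python sketch of the regression step. It rests on assumptions that are not from the article: the load sample includes a single-user run, so throughput can be normalized as C(N) = S(N)/S(1); each point is transformed to X = N - 1 and Y = N/C(N) - 1; and Y = aX^2 + bX is fitted with no intercept, which gives sigma = b - a and lambda = a/sigma. The function names, the peak-load helper, and the sample data are hypothetical.

```python
import math


def relative_capacity(n, sigma, lam):
    """Scaling function C(N) = N / {1 + sigma[(N - 1) + lam*N*(N - 1)]}."""
    return n / (1.0 + sigma * ((n - 1) + lam * n * (n - 1)))


def fit_sigma_lambda(loads, throughputs):
    """Estimate (sigma, lam) from measured (N, S) pairs.

    Assumption: loads[0] == 1, so throughputs[0] is the single-user
    throughput S(1) used to normalize S(N) into C(N) = S(N) / S(1).
    """
    if loads[0] != 1:
        raise ValueError("sketch assumes the first measurement is at N = 1")
    s1 = throughputs[0]
    # Transform each remaining point: X = N - 1, Y = N / C(N) - 1.
    xs = [n - 1 for n in loads[1:]]
    ys = [n / (s / s1) - 1 for n, s in zip(loads[1:], throughputs[1:])]
    # Least-squares fit of Y = a*X^2 + b*X (no intercept) via 2x2 normal equations.
    sx2 = sum(x ** 2 for x in xs)
    sx3 = sum(x ** 3 for x in xs)
    sx4 = sum(x ** 4 for x in xs)
    sxy = sum(x * y for x, y in zip(xs, ys))
    sx2y = sum(x ** 2 * y for x, y in zip(xs, ys))
    det = sx4 * sx2 - sx3 ** 2
    a = (sx2y * sx2 - sx3 * sxy) / det
    b = (sx4 * sxy - sx3 * sx2y) / det
    # a = sigma*lam and b = sigma*(1 + lam), hence:
    sigma = b - a
    lam = a / sigma
    return sigma, lam


def peak_load(sigma, lam):
    """Load N* where C(N) peaks; the model predicts retrograde throughput beyond it."""
    return math.sqrt((1.0 - sigma) / (sigma * lam))


# Hypothetical, synthetic measurements (not the Figure 2 data).
loads = [1, 18, 36, 72, 144]
s1 = 20.0  # scripts per hour at N = 1 (made-up value)
measured = [s1 * relative_capacity(n, 0.02, 0.005) for n in loads]

sigma, lam = fit_sigma_lambda(loads, measured)
print(f"sigma = {sigma:.4f}, lambda = {lam:.4f}")
print(f"throughput peaks near N = {peak_load(sigma, lam):.0f}")
for n in (50, 100, 200):
    print(n, round(s1 * relative_capacity(n, sigma, lam), 1))
```

Generating the sample from known parameters, as above, is a quick way to check that the fit recovers them before trusting it on real load-test data. Solving the 2x2 normal equations directly keeps the sketch dependency-free; a general routine such as numpy.polyfit would also work, but it fits an intercept term that this model does not have.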
You can find a more detailed account of this procedure at http://www.teamquest.com/html/gunther/fitting.shtml.
An essential feature of this simple model is that it can predict retrograde throughputs (where the amount of completed computation decreases as the system load increases) like that in Figure 2. This effect is not easily modeled using conventional performance modeling tools based on queueing theory without specialized load-dependent servers.
What are the benefits of this guerrilla sizing methodology? Primarily, it avoids the need for more complicated queueing theory or simulation models. However, the most significant benefit is not the model's technical merits but the fact that it creates a framework against which to check the consistency of load measurements. If the data does not fit the model in the first equation, there is very likely a problem with the measurement process that may be worth more detailed investigation. Moreover, because each of the model's terms has a real physical interpretation, engineers from disparate groups quickly recognize which parts of the application or platform need further tuning to improve scalability. In this way, the spreadsheet model is a capacity planning tool that forecasts scalability without inflating the release schedule. There are many other tools that can help in such a guerrilla analysis; the Guerrilla Tools sidebar lists a few.
Guerrilla Tools
CSIM (Mesquite Software, http://www.mesquite.com): Moderately priced simulation package.
Excel (Microsoft, http://www.microsoft.com): Spreadsheet in Microsoft Office suite.
MathCad (MathSoft Engineering & Education Inc., http://www.mathsoft.com): Commercial symbolic-computation program.
Mathematica (Wolfram Research, http://www.wolfram.com): Commercial, general-purpose program for symbolic computation.
Minitab statistical package (http://www.minitab.com): Next level above Excel.
PDQ (Performance Dynamics Consulting, http://www.perfdynamics.com/Tools/PDQ.html): An open-source performance analyzer.
R (http://www.r-project.org): Public-domain version of S+ statistical-analysis product.
SPE*ED (Performance Engineering Services, http://www.perfeng.com/sped.htm): Tool for software performance engineering (SPE).
TeamQuest performance software (http://www.teamquest.com): Commercial capacity-modeling tools.
Example 2: Estimating capacity doubling time
You can also apply the spreadsheet model to Web site traffic analysis, where the rapid increase in traffic growth demands a more tactical approach to capacity planning. The growth of sites like those of eBay, AOL, and Amazon might not be as extreme as it was last year (A.C. Lear, "Managing e-Commerce Reliability, eBay Style," IT Professional, Mar.-Apr. 2000, pp. 80, 77-79), but server demand will still grow. As already noted, managers of these Web sites know they need capacity; it's the planning part that is culturally unfamiliar.
I have devised a useful metric for high-growth Web sites, the capacity doubling period (N.J. Gunther, "Performance and Scalability Models for a Hypergrowth e-Commerce Web Site," Performance Engineering: State of the Art and Current Trends, Lecture Notes in Computer Science 2047, Reiner Dumke and Claus Rautenstrauch, eds., Springer-Verlag, Heidelberg, Germany, 2001). This period is simply the time until the amount of consumed processing capacity is twice that now being consumed. In some cases, this period can be as short as six months. That's about 10 times faster than typical data processing centers and four times faster than the original version of Moore's law. Such exponential demand for server capacity can lead to a new definition of bankruptcy: if you have to purchase a lot of cheap servers, pretty soon you're talking real money. Such costs force the need to plan for capacity well in advance of the procurement cycle.
Once again, you can determine the capacity doubling period by using elementary tools, like spreadsheets. If, for example, you measure processor utilization U at regularly scheduled intervals, you can estimate long-term consumption by assuming an exponential trend
$$U_{\text{future}} = U_{\text{now}}\, e^{\alpha W}$$
where the growth rate $\alpha$ is determined by using the Add Trendline facility in Excel, and W is the number of weeks over which to fit the data. Given this relationship, the doubling time (in weeks) is
$$T_{\text{double}} = \frac{\ln 2}{\alpha}$$
I chose an exponential-growth model because it is the
simplest function that captures the notion of compounded growth. It also reflects superlinear revenue growth models. If you decide to use statistical models, you might want to consider using more robust tools like Minitab or R, mentioned in the Guerrilla Tools sidebar. The next task is to translate these trends into procurement requirements. Because trend lines pertain only to measurements from the current system configuration, you need a way to extrapolate to other possible configurations. For this purpose, I used the scalability function discussed in Example 1.
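A companion sketch for Example 2, in Python, assuming weekly processor-utilization samples expressed as fractions. It estimates the growth rate alpha by least squares on ln U, which is how spreadsheet exponential trendlines are typically computed, and then applies U_future = U_now e^(alpha W) and T_double = (ln 2)/alpha. The helper weeks_until, the 70 percent threshold, and the sample data are hypothetical additions for illustration.

```python
import math


def fit_growth_rate(weeks, utilization):
    """Least-squares fit of ln U = ln U0 + alpha * t; returns (U0, alpha)."""
    n = len(weeks)
    logs = [math.log(u) for u in utilization]
    t_bar = sum(weeks) / n
    l_bar = sum(logs) / n
    s_tt = sum((t - t_bar) ** 2 for t in weeks)
    s_tl = sum((t - t_bar) * (l - l_bar) for t, l in zip(weeks, logs))
    alpha = s_tl / s_tt
    u0 = math.exp(l_bar - alpha * t_bar)
    return u0, alpha


def project(u_now, alpha, weeks_ahead):
    """U_future = U_now * exp(alpha * W)."""
    return u_now * math.exp(alpha * weeks_ahead)


def doubling_time(alpha):
    """T_double = ln 2 / alpha, in the same time unit as alpha (weeks here)."""
    return math.log(2) / alpha


def weeks_until(u_now, u_limit, alpha):
    """Hypothetical helper: weeks until U reaches u_limit under the same trend."""
    return math.log(u_limit / u_now) / alpha


# Hypothetical weekly processor-utilization samples (made-up data).
weeks = list(range(8))
util = [0.30, 0.31, 0.33, 0.34, 0.35, 0.37, 0.38, 0.40]

u0, alpha = fit_growth_rate(weeks, util)
u_now = util[-1]
print(f"alpha = {alpha:.4f} per week")
print(f"doubling time = {doubling_time(alpha):.1f} weeks")
print(f"projected U in 12 weeks = {project(u_now, alpha, 12):.2f}")
print(f"weeks until 70% busy = {weeks_until(u_now, 0.70, alpha):.1f}")
```

Projecting from the latest measured utilization rather than the fitted intercept is a design choice; either is defensible for a rough guerrilla estimate. Once the projection exceeds what the current configuration can deliver, the trend alone says nothing about a larger configuration, which is where the scalability function from Example 1 comes in.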
As I hope these examples demonstrate, guerrilla capacity planning provides an approach to assessing application scalability that matches management's requirement to keep a tight rein on project schedules. In many situations where managers tend to avoid traditional capacity planning, the guerrilla approach can provide a simple framework to bring disparate groups together and reveal unanticipated performance issues. Once revealed, these issues are addressable within the context of existing schedules. In this way, guerrilla capacity planning helps keep projects on schedule and minimizes revisions. Think of it as a way of managing hidden time sinks. It's also a way of replacing risk perceptions with risk management.
Sometimes, the biggest hurdle preventing the introduction of guerrilla capacity planning is simply getting started. Another way of defining guerrilla capacity planning is to see it from the viewpoint of those of us who do it: Management plans; we review. Management decides; we measure. Management dithers; we propose. Management revises; we plan.
Aspects of capacity planning that I have not discussed here include floor space, power, cooling, disk storage, tape storage, and so on. You can address many of these issues with spreadsheet models similar to those presented here. I encourage you to consider adopting guerrilla capacity planning in your organization.
Neil J. Gunther is chief scientist at Performance Dynamics Consulting in Castro Valley, Calif.; http://www.perfdynamics.com. Contact him at [email protected].
http://computer.org/itpro/