Web Performance Testing
Abstract
The fast-growing number of Web sites for mission-critical applications (e-commerce, e-business, content
management and advertisement) makes web site performance and availability a business need and a distinctive
feature for success in the market. Poor performance or low availability may have strong negative consequences
both on the ability of the company to attract and retain its own customers and on its ability to obtain high
revenues from its investment in new technology. Controlling the performance of web sites and back-end systems
(where e-business transactions run) is a key factor for every company.
During recent years, TILAB has progressively gained considerable methodological and practical
experience and know-how in Performance Testing and Measurement of computer systems, including the newest
Web Performance Testing and Measurement techniques.
Web Performance Testing is executed through testing campaigns that stress the web site and back-end
systems with a load simulating the real conditions of the field, or that evaluate whether the site/application
will support the load expected after special situations (e.g. an advertising campaign). This makes it possible to
guarantee system performance under that load and to identify, and help fix, possible issues.
Web Performance Measurement aims at the analysis and fast characterization of system and user behaviour
in order to give fast feedback on any issues. To achieve that, TILAB has developed two tools: WEBSAT
(WEB Server Application Time) and BMPOP. WEBSAT is based on web server log files and complements the
suite of commercial off-the-shelf products used for Web Performance Evaluation/Measurement activities.
BMPOP (BenchMark Point Of Presence) is used for Web Performance Measurement from the end-to-end
perspective.
This paper shows the present status of the TILab approach to Web Performance Testing and Measurement.
The paper also presents two Case Studies in which TILab applied Web Performance Evaluation to the Telecom
Italia SDH Network Management System and to the Web Site of an important Italian telecommunication service
operator. Finally, some conclusions are given in order to clarify the methodological and experimental trends
foreseen by TILab in the fast-growing field of Web Performance Testing and Measurement.
Web Performance Testing and Measurement: a complete approach
1. Introduction
Thanks to the fast evolution of Internet-based technologies during recent years, many companies are
migrating their mission-critical applications towards the WEB and have chosen the WEB as an important source
of earnings. But to be successful, they need effective tools, resources and methodologies to provide the user with
a high service level.
Passing from LAN Client-Server (C/S) applications to Internet Web-based applications, the audience increases
but the risks increase as well, so application performance needs greater attention. In the USA a survey
found that a user waits just 8 seconds for a page to download completely before leaving the site. In Italy this
limit may be higher, but in any case providing high performance is a key factor in the success of a site.
A web application differs from an “old” C/S application in several ways: lack of knowledge of the clients, lack
of knowledge of the users, more risks, and a more complex architecture (firewalls, proxies, DNS, etc.).
To measure web application performance effectively and define what improvements it needs, it is quite
important to take into account its architecture (see Figure 1). This includes:
• Web Browser: the software client on which web applications run. It’s independent of the application.
• Internet Service Providers (ISP): provide different speed and type of Internet access.
• The “Big Internet”: the world-wide communication infrastructure between the browser and the web server.
• Firewalls: the interface between the company intranet and the Internet. They can filter ingoing and outgoing
traffic according to rules defined by the web administrator.
• Other network elements can be found on the boundary between intranet and Internet: proxies and reverse
proxies. Proxies help an intranet client reach an external site and also perform the important function
of caching, keeping in memory the pages most requested by the users of the LAN. Reverse proxies mask
the internal servers of the intranet, exposing to the Internet a sort of virtual address whose requests are
forwarded to the right server.
• Web Server: the application able to serve requests from clients (browsers). It forwards the requests to
the web application, which can run on the same machine or on another server devoted to that purpose
(application server).
• Application server: the machine where the code of the applications runs. This machine can coincide with
the web server, but if a company has many applications it is better to allocate them to different servers,
leaving the web server the function of interface.
• Database: holds the data of the applications. Data access can be heavy and, when a lot of data is managed,
the access time can become too high. For that reason it is better to allocate a machine just for this function
(DB server).
Even with this complex architecture, it is important to assure users a good service level. This level is usually
expressed as the response time they experience in their browser when loading a page or completing a transaction.
A survey of 117 companies with an income of at least $200 million highlighted that 94% of the
companies that scaled correctly had done web performance testing, while just 32% of the companies that did not
scale correctly had done that activity. So web performance testing is important, and it is also important
how early it is done during the development of the application.
This paper shows the present status of the Telecom Italia Lab (TILab) approach to Web Performance Testing
and Measurement. The paper also presents two Case Studies in which TILab applied Web Performance
Evaluation to the Telecom Italia SDH Network Management System and to the Web Site of an important Italian
telecommunication service operator.
Finally, some conclusions are given in order to clarify the methodological and experimental trends
foreseen by TILab in the fast-growing field of Web Performance Testing and Measurement.
2.1. Objectives
There are different objectives achievable with web performance testing and measurement. End-users
take advantage of a better application, system managers use that information to improve the performance of their
systems and, last but not least, management can obtain useful information about the business of their company.
“End-User” objectives:
• To find the average response time of pages and transactions, and the slowest and fastest pages;
• To make sure main pages (e.g. the home page) can be downloaded within an acceptable time (e.g. 10 seconds);
• To find out the maximum number of concurrent users, sessions and transactions that the application is able to
support while still providing a high level of service;
• To find out the maximum number of concurrent users, sessions and transactions that the application is able to
support without a system crash;
• To characterize the most frequent user paths and the most used entry and exit pages;
• To identify the main reasons for site abandonment.
“System” objectives:
• To correlate system resource utilization with load;
• To find out current hardware bottlenecks and prevent new ones (capacity planning);
• To tune all the web application components to support as much load as possible with the current hardware;
• To find out how the application behaves when overloaded.
“Management” objectives:
• To provide an objective measure of the usage of the site (e.g. for an e-commerce site, it could be the number
of electronic carts and the number of objects sold);
• To provide a “business view” of the previous data (e.g. how performance issues have affected the business).
2.2. Components
To achieve the objectives just described, a complete approach is needed. This approach should be based
on sound methodology, a set of tools and a high level of know-how. The Telecom Italia Lab approach is based
on the following components:
• Web Performance Objectives – to identify, according to the previous classification (end-user, system,
management), what kind of results you need. It is possible to address all three categories or just one
or two;
• Web Performance Testing – to verify, in a test plant, that the application is able to support the entire
expected load and even more (e.g. the load could increase in an unpredictable way after an advertising
campaign). For a more detailed analysis, this activity can be done both inside and outside the proxy and
firewall;
• Web Performance Measurement – to measure the actual performance of the application in the field. This
can be done by identifying user behaviour, back-end system response time and the maximum number of
concurrent users (Web Log Analysis), while monitoring, at the same time, hardware and software resource
utilization (Resource Monitoring) and the end-user experience (End-to-end Monitoring);
• Problem Resolution and Activity Results – to support the client by providing the information required to fix
any bottlenecks;
• Capacity Planning – to ensure that adequate hardware and software resources will be available when
needed in the future. The capacity planning activity is carried out using the information from all the other
components.
All the components just described can be combined according to the needs of the client. Figure 2 shows a
possible sequence utilizing the components.
A test plant is expensive, but it brings many advantages. Besides, all information extracted from the
field can be used afterwards to calibrate activities in the test plant (only by monitoring the field is it possible to
extract correct information about the user profile).
To make sure that the web performance testing and measurement activity provides useful results for an
effective evaluation of performance and bottlenecks, and in order to predict the behaviour of the application as
the load changes, it is important to have the following data:
• Response Time: the time for downloading pages and performing the main transactions, both on the user side
(e.g. downloading a page) and on the back-end side (e.g. accessing a DB);
• Application Data: a measure of the service provided (e.g. for an e-commerce site it could be the number of
electronic carts and the number of objects sold);
• User profile: all the information useful to characterize end-user behaviour (most and least requested pages,
paths followed during navigation, bandwidth, average session time, ...);
• Resources Data: all the monitoring data for the resources of the servers on which the application runs (e.g.
CPU utilization, disk utilization, run queue, etc.);
• System Management Activity: a summary of all the system management activity carried out on all the
components of the application architecture during the testing period (system restarts, tuning of buffers, of
caches, of the hardware and of other system parameters);
• Expected load: the expected growth of the load according to the estimates of the marketing department.
According to the type of load and timing, it is possible to identify different types of tests (a minimal load-
generation sketch is shown after the list):
• Smoke Test – a brief test, just to check whether the application is really ready to be tested (e.g. if it takes 5
minutes to download the home page, it is not worth going on to test other pages).
• Load Test – the application is subjected to a variable, increasing load (until the peak load is reached). It is
useful to understand how the application (software + hardware) will react in the field.
• Stress Test – the application is subjected to a load larger than the one actually expected. It is useful to
evaluate the consequences of an unexpectedly huge load (e.g. after an advertising campaign).
• Spike Testing – the application is subjected to burst loads. It is useful to evaluate applications used by a lot
of users at the same time (high concurrent user rate).
• Stability Testing – the application is subjected to an average load for a long period of time. It is useful to
find out problems like memory leaks.
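As an illustration of a stepwise load test, the minimal sketch below ramps the number of concurrent virtual users
in steps and reports the average response time and the number of failures at each step. The target URL, the ramp
profile and the number of requests per step are assumptions for illustration only, not figures from the activities
described here:

import time
import urllib.request
from concurrent.futures import ThreadPoolExecutor

TARGET_URL = "http://www.example.com/"          # hypothetical page under test

def fetch(url):
    # Download one page and return its response time in seconds (None on failure).
    start = time.time()
    try:
        with urllib.request.urlopen(url, timeout=30) as resp:
            resp.read()
        return time.time() - start
    except Exception:
        return None

def run_step(users, requests_per_user):
    # One load step: a fixed number of concurrent virtual users.
    with ThreadPoolExecutor(max_workers=users) as pool:
        futures = [pool.submit(fetch, TARGET_URL)
                   for _ in range(users * requests_per_user)]
        times = [f.result() for f in futures]
    ok = [t for t in times if t is not None]
    avg = sum(ok) / len(ok) if ok else float("nan")
    print(f"{users:4d} users: average {avg:6.2f} s, {len(times) - len(ok)} failures")

# Increase the load step by step up to the expected peak.
for users in (5, 10, 20, 50):
    run_step(users, requests_per_user=4)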
It is important to create workloads that are as accurate and realistic as possible. That means knowing the
real user profile correctly. The newest Web Log Analysis (WLA) techniques and tools (see next paragraph) can
help to solve this problem, providing information such as user paths, ingoing and outgoing pages, browsers used,
keywords used when searching for the site, operating systems, etc.
Using all that information it is possible to create workloads that really simulate user activities.
TILab noticed that most commercial WLA tools do not use all the information in a web log file. Among
the unused fields there is a value that represents the back-end system response time, i.e. the time the web server
takes to execute a request coming from the browser. This is not the response time experienced by the user, but it
is a useful metric of application performance (e.g. if a CGI script takes 2 minutes to be executed by the back-end
systems, it surely is not working well).
To fill that gap, TILab has created WEBSAT (Web Server Application Time). WEBSAT is a piece of
software that uses web server log files to produce a table containing the IP addresses of the users and, mainly,
the execution time of every object requested by the browser.
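WEBSAT itself is a TILab tool and is not reproduced here; the following is only an illustrative sketch of the same
idea. It assumes an Apache-style access log whose last field is the time taken to serve each request in
microseconds (the "%D" log directive), which is an assumption about the log format rather than a description of
WEBSAT:

import re

# One combined-log line with the service time (microseconds) appended at the end.
LINE = re.compile(r'^(\S+) \S+ \S+ \[([^\]]+)\] "(?:\S+) (\S+)[^"]*" \S+ \S+ .*?(\d+)$')

def parse_log(path):
    # Yield (client_ip, timestamp, requested_object, service_time_seconds).
    with open(path) as log:
        for line in log:
            match = LINE.match(line)
            if match:
                ip, stamp, obj, micros = match.groups()
                yield ip, stamp, obj, int(micros) / 1_000_000

# Example: report the ten objects that took longest to be served by the back end.
rows = sorted(parse_log("access.log"), key=lambda r: r[3], reverse=True)
for ip, stamp, obj, seconds in rows[:10]:
    print(f"{seconds:8.3f} s  {obj}  requested by {ip} at {stamp}")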
BMPOP clients, distributed over different ISPs and locations, periodically execute the same transactions and
activities as real users, so they act as probes on the Internet. This approach is completely “non-intrusive”, but it
needs a distribution of clients that reflects the user distribution.
After the Web Performance Testing and Measurement activities, enough data will be available to attempt
to predict two important things: how much spare capacity the system has and how long before the existing
infrastructure must be upgraded. You should also be able to predict when future load levels are expected to
saturate the site and which hardware and software upgrades will most effectively allow the system to expand and
sustain the load over a specified period of time.
By means of simple models and knowledge of the system workload and architecture, direction can be
given to the Web management team to improve or upgrade one or more components of the system. Furthermore,
actions can be defined to tune the system resources, by adjusting system parameters, in order to maximize
performance in the near future. So Capacity Planning is a natural follow-up of the Performance Testing and
Measurement activities.
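As a minimal example of such a simple model, the sketch below applies the utilization law (utilization =
throughput x service demand) to estimate, under an assumed monthly load growth, how many months remain
before a CPU saturates. The measured figures used here are placeholders, not data from the case studies:

def months_to_saturation(current_util, monthly_growth, max_util=0.75):
    # Months before utilization exceeds max_util, assuming the load (and hence
    # the utilization) grows by the same factor every month.
    months, util = 0, current_util
    while util <= max_util:
        util *= (1 + monthly_growth)
        months += 1
    return months

throughput = 40.0     # measured peak throughput, requests per second (placeholder)
cpu_demand = 0.012    # measured CPU seconds consumed per request (placeholder)
cpu_util = throughput * cpu_demand          # utilization law: U = X * D
print(f"Current CPU utilization: {cpu_util:.0%}")
print("Months before the CPU must be upgraded:",
      months_to_saturation(cpu_util, monthly_growth=0.10))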
One of the most relevant issues related to this configuration is the accessibility of the management data,
through the WEB applications, to users belonging to other companies who connect to the system in order to
get management information on private circuits.
The need to guarantee good performance and availability on client systems is therefore a crucial success
factor, not only inside Telecom Italia itself, where the main target is to support operational processes properly
and to provide management services in an efficient and cost-effective way, but particularly outside the company,
where poor performance of the WEB applications could impact the relationship between customer and service
provider, affecting credibility and image.
The adopted approach is based on:
• Performance evaluation of new software releases before their introduction in the field
(comparative characterization between different software versions) in a Test Plant environment;
• Performance measurement and analysis in the field.
This approach reflects the constraints of a specific domain characterized by the continuous introduction of
new functionality or components into an assessed environment that typically shows, in field conditions, an
adequate level of performance.
The strategic relevance of the system, as well as the need to add new features or upgrades while evaluating
interoperability issues and fixing bugs before the deployment phase, has led to the reproduction of the field
architecture in a testing lab environment, a strategy not always carried out for cost reasons, but in this specific
case necessary in order to adequately support the operational processes implemented by the company and to
assure the business goals related to circuit delivery throughput.
A key factor of the adopted approach is therefore the capacity to anticipate performance-critical issues in a
testing environment, preventing negative impacts on the business of the company before the roll-out phase of the
new features, as well as to monitor the behaviour of the system in the field, taking into account both workload
conditions and physical resource occupation and performance trends.
The main problem encountered, given a system configuration similar to the real one, has been how to
generate realistic workload conditions in the testing lab environment and to stress the system in order to identify
the main performance problems.
4.4. Two different workload sources, two different load generators
In the specific context evaluated, two main workload sources are present: the load coming from the
network, such as incoming notifications from the Element Manager Layer, and the load due to the operations
requested on the client side, i.e. workload related to operations on both Work Stations and WEB clients, causing
several transactions between the system components.
One of the main differences between a generic WEB site and a WEB-based application implementing
management services is, in fact, that the load due to user navigation is just one component to deal with;
moreover, requests made by users typically trigger complex transactions, because the system is not only a data
repository but actually carries out several management functions.
The need to cope with two different workload sources has been addressed using two different load
generators, the first one emulating the load created by the network and the second one emulating the users'
behaviour.
The usage of a commercial tool for user load emulation: the impact of realistic load
testing
User workload, within the specific context, is generated by a commercial toolkit providing user
emulation functions based on capturing real user transactions, recording the communication in editable scripts
and replaying the captured behaviour for several users.
The main goal of this testing component is to detect response time and failure rate under various loading
conditions, analysing HW resource consumption and determining system breakpoints.
In order to implement realistic load conditions, a crucial factor is to take into account a large variety of
user requests; that means spending time and energy defining significant test scripts and properly parameterizing
them. This technique is, in fact, more effective the wider the range of data used during the playback phase and
the closer that data is to the habits of real users.
The development of scripts in the tool environment actually requires familiarity with both the application
under test and the tool itself, particularly in order to deal with capture accuracy as well as parameterisation issues.
Two important issues to take into account are avoiding caching mechanisms during load playback and accurately
designing the user behaviour, introducing, if required, realistic conditions such as “think time” (a sketch of such a
script follows).
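The sketch below illustrates, in generic Python rather than in any specific commercial toolkit, what one
parameterized virtual-user step with think time and cache avoidance could look like; the URL, the search terms
and the think-time range are assumptions for illustration:

import random
import time
import urllib.request

SEARCH_TERMS = ["circuit", "alarm", "report"]       # parameterization data (assumed)

def virtual_user_step(base_url):
    # Use different data on each execution so that server-side caches do not
    # hide the real cost of the transaction.
    term = random.choice(SEARCH_TERMS)
    request = urllib.request.Request(
        f"{base_url}/search?q={term}",               # hypothetical transaction
        headers={"Cache-Control": "no-cache"})       # ask proxies not to cache
    with urllib.request.urlopen(request, timeout=30) as resp:
        resp.read()
    time.sleep(random.uniform(5, 15))                # "think time" between steps

virtual_user_step("http://testplant.example.com")    # hypothetical test-plant URL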
A correct understanding of user behaviour in the field is significant both to define test sessions adequately
and to interpret test session results properly: it gives important inputs in the test design phase and increases
awareness of how stressful a test session has been in comparison with the actual usage of the system in the target
environment.
A significant advantage of this approach is that the implemented scripts can be re-run multiple times,
ensuring a repeatable process. When software upgrades do not affect transactions between client and server
(for WEB applications, that means upgrades that do not impact the HTTP transactions between the browser
client and the web server), the same scripts can be used several times, and the data collected by the tool,
regarding response time and failure percentage, can significantly help in comparing different software versions
or in characterizing possible variations of the system's performance in the field as the amount of managed data
increases.
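A comparison of two runs of the same scripts (for example, two software versions) could be summarized as in
the following sketch; the response-time samples are invented placeholders, and the statistics chosen (average,
95th percentile, failure rate) are just one example of the kind of figures collected by such a tool:

def summarize(samples):
    # samples: response times in seconds, with None marking a failed request.
    ok = sorted(s for s in samples if s is not None)
    return {"avg": sum(ok) / len(ok),
            "p95": ok[int(0.95 * (len(ok) - 1))],
            "failure_rate": 1 - len(ok) / len(samples)}

old_version = [1.2, 1.4, None, 1.3, 1.5, 1.2]        # placeholder samples
new_version = [1.0, 1.1, 1.0, 1.2, None, 1.1]        # placeholder samples
for name, run in (("old version", old_version), ("new version", new_version)):
    stats = summarize(run)
    print(f"{name}: average {stats['avg']:.2f} s, "
          f"95th percentile {stats['p95']:.2f} s, "
          f"failure rate {stats['failure_rate']:.1%}")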
In the specific application context, the modularity of the System Under Test (SUT) allows the definition
of a repository of scripts, building suites oriented to the evaluation of each application element
(such as the main client-server application, the “Clients Terminal Subsystem”, the “Performance Monitoring
Subsystem” and so on). The chosen tool supports the concurrent execution of different test suites, providing a
realistic emulation of user workload on an application with multiple user access points.
The approach used is based on a high level of reuse of the same tools and methodology adopted in the lab
environment. This solution gives the opportunity both to reduce instrumentation costs and to compare results
easily, saving time as well as improving the expertise of the resources involved.
A key point of the analysis of the collected data is, in any case, the correlation between resource consumption and
workload conditions. In the specific context, this is performed either by evaluating the operations and the amount
of data managed by the system during the observation period, extracting metrics from the DB (SQL scripts), or
by characterizing, for the WEB-based elements, user behaviour through WLA.
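As a minimal illustration of correlating resource consumption with workload, the sketch below computes the
linear correlation between the request rate (e.g. from the web logs or DB metrics) and the CPU utilization
sampled over the same intervals; the sample values are placeholders and Python 3.10+ is assumed for
statistics.correlation:

from statistics import correlation          # available from Python 3.10

requests_per_minute = [120, 250, 300, 480, 520, 610]        # from web logs / DB metrics
cpu_utilization     = [0.15, 0.28, 0.35, 0.55, 0.60, 0.72]  # from resource monitoring

r = correlation(requests_per_minute, cpu_utilization)
print(f"Correlation between load and CPU utilization: {r:.2f}")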
When load testing is run in the field as a source of comparative data, it must be executed without any
other workload. Not all applications or sites can be tested under such conditions, because user access is in most
cases unpredictable, but this application context allows the approach, since the SUT is a management system
typically accessed by end users at rather predictable times.
When the lab HW configuration is slightly different from the target one, running the load testing suites
both in the lab and in the field can significantly help to properly scale the lab results for prediction purposes.
This has been experienced with the management system evaluated, as the target configuration has been
progressively upgraded by introducing more powerful servers.
Load testing execution and results analysis in the lab cannot be separated from workload characterization,
understanding of user behaviour and system monitoring in the field.
Conversely, the evaluation of system performance in the field cannot be separated from the experience
gained in the lab concerning test suite assessment, workload analysis and the correlation between HW resource
consumption and operational results.
The experience shared demonstrates that the method for approaching the task of performance evaluation
and improvement necessarily has to take into account the specificity and constraints present in the application
context; knowledge of the application context is in turn a basic starting point of a global process, made up of
operational choices and a combination of different techniques, that as a whole can contribute to achieving the
goal of improved performance.
An important Italian telecommunication service operator had received some complaints from users not
completely satisfied with the performance (i.e. response time) experienced using a web-based chat service. To
cope with this problem, TILab executed a measurement campaign with BMPOP (the TILab End-to-end
Monitoring tool – see par. 3.4) to measure the real download response time of the home page of the service from
different ISPs.
Step 1 checks the availability of the POPs. Step 2 measures the response times experienced by users in
different cities and on different ISPs while loading the home page of the application under test. The next
paragraphs describe and analyse the data obtained with the BMPOP tool.
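The following sketch gives an idea of what such an end-to-end probe does (it is not the actual BMPOP tool): from
a given point of presence it periodically downloads the home page, recording availability and response time. The
URL and the measurement interval are assumptions for illustration:

import time
import urllib.request

HOME_PAGE = "http://chat.example.it/"        # hypothetical home page of the service

def probe_once(url):
    # Return (success, response_time_in_seconds).
    start = time.time()
    try:
        with urllib.request.urlopen(url, timeout=60) as resp:
            resp.read()
        return True, time.time() - start
    except Exception:
        return False, None

while True:
    ok, seconds = probe_once(HOME_PAGE)
    stamp = time.strftime("%Y-%m-%d %H:%M:%S")
    print(stamp, f"OK {seconds:.1f} s" if ok else "FAILED")
    time.sleep(15 * 60)                      # one measurement every 15 minutes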
A user connecting to the Internet through Provider A does not have problems downloading the home page.
Different results are obtained using Provider B: in that case 4.3% of the tests fail.
Figure 11 shows the home page download time using Provider A for the Internet connection. After 6 April,
02:00, response times remain stable with an average of 29 seconds (during the previous period it was 40
seconds). Response times seem to have no relation to the time of day.
Figure 12 shows the home page download time using Provider B for the Internet connection. In that case
response times have a strong correlation with working hours (09:30-13:00 and 14:00-18:00): in those intervals
the average response time is 80 seconds, outside them it is just 31.
Figure 13 shows network access error distribution per provider. It’s easy to note that provider D is the
worst.
Analysing the Web Server log files, it seems that response times have no correlation with the number of
connected users.
• The performance of a web server is affected by caching activity. At the beginning of every test, all
the data in the cache were cleared.
• In this context 30 seconds can be considered an acceptable download time, although it is far from a
good one (less than 10 seconds).
• The home page has a size of 83,388 bytes. Even with a good Internet connection, the minimum
possible response time depends on the modem speed and is no faster than the times shown in
the following table (see also the sketch below).
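That table is not reproduced here; as an illustration only, a theoretical lower bound can be computed as page size
divided by line speed (protocol overhead and network latency are ignored, so the real times are higher). The
access speeds below are common examples, not figures from the measurement campaign:

PAGE_BYTES = 83388                            # size of the home page
for label, kbit_per_s in (("33.6 kbit/s modem", 33.6),
                          ("56 kbit/s modem", 56.0),
                          ("128 kbit/s ISDN", 128.0)):
    seconds = PAGE_BYTES * 8 / (kbit_per_s * 1000)
    print(f"{label:18s}: at least {seconds:5.1f} s")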
To improve the general performance of the chat service, two possible solutions are:
1. Reducing the size of the page (now about 80 KB) and the number of images (81);
2. Reviewing the caching parameters of the proxy server.
To provide a more detailed analysis, a load and stress test activity will be executed to correlate response time
with the amount of load (number of concurrent users).
6. Final Conclusions
Poor performance or low availability may have strong negative consequences both on the ability of the
company to attract and retain its own customers and on its ability to obtain high revenues from its
investment in new technology. Controlling the performance of web sites and back-end systems (where e-business
transactions run) is a key factor for every company.
We think that the capacity of an e-business application is difficult to estimate or simulate because systems
are usually too complex to model a priori. To deliver proper performance you need a load testing activity (on-site
or remote) simulating a real scenario (Web Log Analysis), combined with hardware resource utilization
monitoring and end-to-end monitoring of the web site (using different ISPs).
TILab is deeply involved in the development of methods and tools for Web Performance Testing and
Measurement. Our reports and analyses of the performance of a web-based application allow us to compare
network performance, application performance and service levels as perceived by end users. We are evolving
this approach to include information about load test results (including simple models of hardware resource
utilization) in order to anticipate performance problems and to highlight differences between actual and estimated
behaviour.
Acknowledgment
The authors wish to thank M.P. Maccario, N. Reale and M. Coppola for their valuable contribution to the
BMPOP application (TILAB End-to-End Monitoring Tool).