Ebook Top 5 PHP Performance Metrics Tips and Tricks
Table of Contents
Chapter 1: Getting started with APM
Chapter 2: Challenges in implementing an APM strategy
Chapter 3: Top five performance metrics to capture in enterprise PHP applications
Chapter 4: AppDynamics approach to APM
Chapter 5: PHP APM tips and tricks
Chapter 1: Getting started with APM
Application Performance Management, or APM, is the monitoring and management of the availability and performance of software applications. Different people can interpret this definition differently, so this article attempts to qualify what APM is, what it includes, and why it is important to your business. If you are going to take control of the performance of your applications, then it is important that you understand what you want to measure and how you want to interpret it in the context of your business.

What is Application Performance Management (APM)?
As applications have evolved from stand-alone applications to client-server applications to distributed applications and ultimately to cloud-based elastic applications, application performance management has evolved to follow suit. When we refer to APM we refer to managing the performance of applications such that we can determine when they are behaving normally and when they are behaving abnormally. Furthermore, when something goes wrong and an application is behaving abnormally, we need to identify the root cause of the problem quickly so that we can remedy it.

We might observe things like:

– The stack trace execution of PHP code
– The PHP runtime that is interpreting the application code
– The behavior of the application itself
– The physical hardware upon which the application is running
– The virtual machines or containers in which the application is running
– The correlation of client-side (mobile and browser) code to the server-side PHP code
– Supporting infrastructure, such as databases, caches, queue servers, external web services, and legacy systems

Once we have captured performance metrics from all of these sources, we need to interpret and correlate them with respect to the impact on your business transactions. This is where the magic of APM really kicks in. APM vendors employ experts in different technologies so that they can understand, at a deep level, what performance metrics mean in each individual system and then aggregate those metrics into a holistic view of your application.

The next step is to analyze this holistic view of your application performance against what constitutes normalcy. For example, if key business transactions typically respond in less than 4 seconds on Friday mornings at 9am but they are responding in 8 seconds on this particular Friday morning at 9am, then the question is why? An APM solution needs to identify the paths through your application for those business transactions, including external dependencies and environmental infrastructure, to determine where they are deviating from normal. It then needs to bundle all of that information together into a digestible format and alert you to the problem. You can then view that information, identify the root cause of the performance anomaly, and respond accordingly.

Finally, depending on your application and deployment environment, there may be things that you can tell the APM solution to do to automatically remediate the problem. For example, if your application is running in a cloud-based environment and your application has been architected in an elastic manner, you can configure rules to add additional servers to your infrastructure under certain conditions.

Thus we can refine our definition of APM to include the following activities:

– The collection of performance metrics across an entire application environment
– The interpretation of those metrics in the context of your business applications
– The analysis of those metrics against what constitutes normalcy
– The capture of relevant contextual information when abnormalities are detected
– Alerts informing you about abnormal behavior
– Rules that define how to react and adapt your application environment to remediate performance problems

Why is APM important?
It probably seems obvious to you that APM is important, but you will likely need to answer the question of APM importance to someone like your boss or the company CFO that wants to know why she must pay for it. In order to qualify the importance of APM, let's consider the alternatives to adopting an APM solution and assess the impact in terms of resolution effort and elapsed downtime.

First let's consider how we detect problems. An APM solution alerts you to abnormal application behavior, but if you don't have an APM solution then you have a few options:

– Build synthetic transactions
– Manual instrumentation
– Wait for your users to call customer support!

A synthetic transaction is a transaction that you execute against your application and with which you measure performance. Depending on the complexity of your application, it is not difficult to build a small program that calls a service and validates the response. But what do you do with that program? If it runs on your machine, then what happens when you're out of the office? Furthermore, if you do detect a functional or performance issue, what do you do with that information? Do you connect to an email server and send alerts? How do you know if this is a real problem or a normal slowdown for your application at this hour and day of the week? Finally, detecting the symptom is half the battle; how do you find the root cause of the problem?

The next option is manually instrumenting your application: adding performance monitoring code directly into your application and recording metrics to a database or logs on a file system. Challenges with manual instrumentation include: deciding what parts of the code to instrument and how to analyze the results, how to determine normalcy, how to propagate those problems up to someone to analyze, what contextual information is important, and so forth. Furthermore, you have introduced a new problem: the perpetual maintenance of performance monitoring code in your application that you are now responsible for. Can you dynamically turn it on and off so that your
performance monitoring code does not negatively affect the performance of your application? If you learn more about your application and identify additional metrics you want to capture, do you need to rebuild your application and redeploy it to production? What if your performance monitoring code has bugs?

There are additional technical options, but what I find is that companies are most often alerted to performance problems when their customer service organization receives complaints from users. I do not think I need to go into details about why this is a bad idea!

Next, let us consider how to identify the root cause of a performance problem without an APM solution. Most often I have seen companies do one of two things:

– Review runtime logs
– Attempt to reproduce the problem in a development / test environment

Log files are great sources of information and many times they can identify functional defects in your application (by capturing exception stack traces). However, when experiencing performance issues that do not raise exceptions, logs typically only introduce additional confusion. You may have heard of, or been directly involved in, a production war room. These war rooms are characterized by finger pointing and attempts to indemnify one's own components so that the pressure to resolve the issue falls on someone else. The bottom line is that these meetings are neither fun nor productive. Alternatively, and usually in parallel, the development team is tasked with reproducing the problem in a test environment. The challenge here is that you usually do not have enough context for these attempts to be fruitful. Furthermore, if you are able to reproduce the problem in a test environment, that is only the first step; now you need to identify the root cause of the problem and resolve it!

In summary, APM is important to you so that you can understand the behavior of your application, detect problems before your users are impacted, and rapidly resolve those issues. In business terms, an APM solution is important because it reduces your Mean Time To Resolution (MTTR), which means that performance issues are resolved more swiftly and efficiently so that the impact to your business bottom line is reduced.

Evolution of APM
The APM market has matured substantially over the years, mostly in response to changing application technologies and deployments. When we had very simple applications that directly accessed a database, APM was not much more than a performance analyzer for a database. However, as applications migrated to the web and we witnessed the first wave of application servers, APM solutions really came into their own. At the time, we were very concerned with the performance and behavior of individual moving parts, such as:

– Physical servers and the operating system hosting our applications
– PHP runtime behavior
– Application response time

We captured metrics from all of these sources and stitched them together into a holistic story. We were deeply interested in APC cache behavior, operating system reads and writes, network latency, and so forth. Not to mention, we raised alerts whenever a server went down. Advanced implementations even introduced the ability to trace a request from the web server that received it across tiers to any backend system, such as a database. These were powerful solutions, but then something happened to change our world: the cloud.

The cloud changed our view of the world because no longer did we take a system-level view of the behavior of our applications; rather, we took an application-centric view. The infrastructure upon which an application runs became abstracted, and what became more important is whether or not an application is able to perform optimally. If an individual server goes down, we do not need to worry as long as the application's business transactions are still satisfied. As a matter of fact, cloud-based applications are elastic, which means that we should expect the deployment environment to expand and contract on a regular basis. For example, if you know that your business experiences significant load on Fridays from 5pm-10pm, then you might want to start up additional virtual servers to support that additional load at 4pm and shut them down at 11pm. The former APM monitoring model of raising alerts when servers go down would drive you nuts.

Furthermore, by expanding and contracting your environment, you may find that single server instances only live for a matter of a few hours. You may still find some APM solutions from the old world, but the modern APM vendors have seen these changes in the industry and have designed APM solutions that focus on your application behavior and place far greater importance on the performance and availability of business transactions than on the underlying systems that support them.

Buy versus build
This article has covered a lot of ground and now you are faced with a choice: do you evaluate APM solutions and choose the one that best fits your needs, or do you try to roll your own? I really think this comes down to the same questions that you need to ask yourself in any buy versus build decision: what is your core business, and is it financially worth building your own solution?

If your core business is selling widgets then it probably does not make sense to build your own performance management system. If, on the other hand, your core business is building technology infrastructure and middleware for your clients, then it might make sense (but see the answer to the second question below). You also have to ask yourself where your expertise lies. If you are a rock star at building an eCommerce site but have not invested the years that APM vendors have in analyzing the underlying technologies to understand how to interpret performance metrics, then you run the risk of leaving your domain of expertise and missing something vital.

The next question is: is it economical to build your own solution? This depends on how complex your applications are and how downtime or performance problems affect your business. If your applications leverage a lot of different technologies (e.g. PHP, Java, .NET, Python, web services, databases, NoSQL data stores) then it is going to be a large undertaking to develop performance management code for all of these environments. But if you have a simple PHP CMS that calls a database then it might not be insurmountable.
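If you do go down the build path, the synthetic-transaction option described earlier can start out very small. The sketch below is illustrative only (the URL, timeout, and threshold are assumptions, not values from this ebook): it calls a service, validates the response, and flags slow replies.

```php
<?php
// Minimal synthetic transaction: call a service, validate the response,
// and report whether it met a (hypothetical) response-time threshold.
function runSyntheticTransaction(string $url, float $thresholdSeconds = 2.0): array
{
    $start = microtime(true);  // high-resolution start time
    $context = stream_context_create(['http' => ['timeout' => 10]]);
    $body = @file_get_contents($url, false, $context);
    $elapsed = microtime(true) - $start;

    return [
        'url'     => $url,
        'ok'      => $body !== false && strlen($body) > 0,  // functional check
        'elapsed' => $elapsed,
        'slow'    => $elapsed > $thresholdSeconds,          // performance check
    ];
}

// Example: run from cron every minute and alert (e.g. send an email) on failure.
// $result = runSyntheticTransaction('https://example.com/health');
// if (!$result['ok'] || $result['slow']) { /* send alert */ }
```

Even this toy version surfaces the questions raised above: where does it run, who receives the alert, and how do you distinguish a real problem from a normal slowdown?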
Finally, ask yourself about the impact of downtime or performance issues on your
business. If your company makes its livelihood by selling its products online then
downtime can be disastrous. And in a modern competitive online sales world,
performance issues can impact you more than you might expect. Consider how the
average person completes a purchase: she typically researches the item online to choose
the one she wants. She’ll have a set of trusted vendors (and hopefully you’re in that
honored set) and she’ll choose the one with the lowest price. If the site is slow then
she’ll just move on to the next vendor in her list, which means you just lost the sale.
Additionally, customers place a lot of value on their impression of your web presence.
This is a hard metric to quantify, but if your web site is slow then it may damage
customer impressions of your company and hence lead to a loss in confidence and sales.
All of this is to say that if you have a complex environment and performance issues or
downtime are costly to your business then you are far better off buying an APM solution
that allows you to focus on your core business and not on building the infrastructure to
support your core business.
Conclusion
Application Performance Management involves measuring the performance of your
applications, capturing performance metrics from the individual systems that support
your applications, and then correlating them into a holistic view. The APM solution
observes your application to determine normalcy and, when it detects abnormal
behavior, it captures contextual information about the abnormal behavior and notifies
you of the problem. Advanced implementations even allow you to react to abnormal
behavior by changing your deployment, such as by adding new virtual servers to your
application tier that is under stress. An APM solution is important to your business
because it can help you reduce your mean time to resolution (MTTR) and lessen the
impact of performance issues on your bottom line. If you have a complex application
and performance or downtime issues can negatively affect your business then it is
in your best interest to evaluate APM solutions and choose the best one for your
applications.
This article reviewed APM and helped outline when you should adopt an APM solution.
In the next article, we’ll review the challenges in implementing an APM strategy and dive
much deeper into the features of APM solutions so that you can better understand what
it means to capture, analyze, and react to performance problems as they arise.
Chapter 2: Challenges in implementing an APM strategy
The last article presented an overview of Application Performance Management (APM), described high-level strategies and requirements for implementing APM, presented an overview of the evolution of APM over the past several years, and provided you with some advice about whether you should buy an APM solution or build your own. This article expands upon that foundation by presenting the challenges to effectively implementing an APM strategy. Specifically, this article presents the challenges in:

– Capturing performance data from disparate systems
– Analyzing that performance data
– Automatically, or programmatically, responding to performance problems

Capturing performance data
Modern distributed application environments leverage a wide variety of technologies. For example, you may have a PHP application cluster, a SQL database, one or more NoSQL databases, a caching solution, web services running on alternate platforms, and so forth. Furthermore, we're finding that certain technologies are better at solving certain problems than others, which means that we're adding more technologies into the mix.

In order to effectively manage the performance of your environment, you need to gather performance statistics from each component with which your application interacts. We can categorize these metrics into two raw buckets:

– Business Transaction Components
– Infrastructure Components

Measuring business transaction performance
The previous article emphasized the importance of measuring business transactions as an indicator of the performance of your application because business transactions reflect the actions necessary to fulfill a real-user request. If your users are able to complete their business transactions in the expected amount of time then we can say that the performance of the application is acceptable. But if business transactions are unable to complete or are performing poorly then there is a problem that needs to be addressed.

Therefore, an APM strategy that effectively captures business transactions not only needs to measure the performance of the business transaction as a whole, but also needs to measure the performance of its constituent parts. Practically, this means that you need to define a global business transaction identifier (token) for each request, find creative ways to pass that token to other services, and then access that token on those servers to associate each segment of the business transaction with the holistic business transaction on an APM server. Fortunately, most communication protocols support mechanisms for passing tokens between machines, such as using custom HTTP headers in web requests. The point is that this presents a challenge because you need to account for all of these communication pathways in your APM strategy.

Once you have captured the performance of a business transaction and its constituent tiers, the fun begins. The next section describes analysis in more depth, but assuming that you have identified a performance issue, the next step is to capture a snapshot of the performance trace of the entire business transaction, along with any other relevant contextual information. There are different strategies for capturing performance snapshots, but the most common is PHP-code instrumentation by hooking into the source-code execution stack.

PHP source code is interpreted, unlike code in languages that are compiled. First, the PHP engine converts your application source code into what are known as opcodes; these are similar to Java bytecode. Then, the Zend engine executes the PHP opcodes that were generated and stored in memory. The big caveat to be aware of is that you need to capture performance information without negatively impacting the performance of the PHP code itself. Stated another way, don't make the problem (too much) worse!

Instrumenting PHP code provides a real-user view of the behavior of the business transaction, but it can be a heavyweight solution that can slow down the overall performance of the business transaction. An alternative is to profile individual requests that the PHP application serves. By measuring each individual request, you gain the benefit of capturing vital metrics such as method execution times, CPU and memory impact, and stack traces. However, profiling every request in production adds unnecessary overhead to the execution of your application code, even when the application would otherwise perform perfectly well. The key is to intelligently determine when to profile the execution of the application code so that you reduce the overhead of your APM solution when performance is optimal, yet still provide the necessary diagnostic data when performance is below optimal.
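The token-passing technique described above can be sketched in a few lines of PHP. The header name (`X-BT-Token`) and function names below are illustrative assumptions, not any specific APM product's API: the caller attaches a transaction identifier to an outbound request, and the downstream PHP service reads it back so its segment can be tied to the same business transaction.

```php
<?php
// Propagate a business transaction token between PHP services
// via a custom HTTP header (the header name X-BT-Token is illustrative).

// Caller side: generate a token and attach it to an outbound request.
function callDownstream(string $url, ?string $token = null)
{
    $token = $token ?? bin2hex(random_bytes(8));  // global transaction id
    $ch = curl_init($url);
    curl_setopt_array($ch, [
        CURLOPT_RETURNTRANSFER => true,
        CURLOPT_HTTPHEADER     => ['X-BT-Token: ' . $token],
    ]);
    $response = curl_exec($ch);
    curl_close($ch);
    return $response;
}

// Callee side: recover the token so this tier's timing can be associated
// with the same holistic business transaction on the APM server.
function currentTransactionToken(array $server): ?string
{
    // PHP exposes the custom header as HTTP_X_BT_TOKEN in $_SERVER.
    return $server['HTTP_X_BT_TOKEN'] ?? null;
}
```

In a real deployment the callee would pass `$_SERVER` to `currentTransactionToken()` and report its segment timing, keyed by that token, to the APM server.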
A PHP application runs on a server with a PHP interpreter that runs on an operating system that powers the environment. If you're running multiple PHP applications on a single server, they may be competing for resources on that server. Furthermore, you may have additional components that your PHP application interfaces with in order to serve a request, and those components may impact the average response time (ART) of that request.

In order to effectively manage the performance of your application, you need to gather container metrics such as the following:

– Web Server: CPU load, requests per second, busy workers, memory usage
– Cache: APC, Memcache and Memcached
– Operating System: network usage, I/O rates, system threads
– Hardware: CPU utilization, system memory, network packets

These are just a few of the relevant metrics, but you need to gather information at this level of granularity in order to assess the health of the environment in which your application is running. And as we introduce additional technologies into the technical stack, such as databases, NoSQL databases, internal web services, distributed caches, key/value stores, and so forth, each has its own set of metrics that need to be captured and analyzed. Building readers that capture these metrics and then properly interpreting them can be challenging.

Figure 2: Assembling segments into a business transaction

Analyzing the performance of a business transaction might sound easy on the surface: compare its response time to a service-level agreement (SLA) and if it is slower than the SLA then raise an alert. Unfortunately, in practice it is not that easy. We want instead to determine what constitutes "normal" and identify when behavior deviates from "normal".

We need to capture the response times of individual business transactions as a whole, as well as the response times of each of those business transactions' tiers or segments. For example, we might find that the "Search" business transaction typically responds in 3 seconds, with 2 seconds spent on the database tier and 1 second spent in a web service call. But this introduces the question: what constitutes "typical" behavior in the context of your application?

Different businesses have different usage patterns, so the normal performance of a business transaction for one application at 8am on a Friday might not be normal for another application. In short, we need to identify a baseline of the performance of a business transaction and analyze its performance against that baseline. Baselines can come in the following patterns:

– The average response time for the business transaction, at the granularity of an hour, over some period of time, such as the last 30 days.
– The average response time for the business transaction, at the granularity of an hour, based on the hour of day. For example, we might compare the current response time at 8:15am with the average response time for every day from 8am-9am for the past 30 days.
– The average response time for the business transaction, at the granularity of an hour, based on the hour of the day and the day of the week. In this pattern we compare the response time of a business transaction on Monday at 8:15am with the average response time of the business transaction from 8am-9am on Mondays for the past two months. This pattern works well for applications with hourly variability, such as ecommerce sites that see increased load on the weekends and at certain hours of the day.
– The average response time for the business transaction, at the granularity of an hour, based on the hour of day and the day of the month. In this pattern we compare the response time of a business transaction on the 15th of the month at 8:15am with the average response time of the business transaction from 8am-9am on the 15th of the month for the past 6 months. This pattern works well for applications with date-based variability, such as banking applications in which users deposit their checks on the 15th and 30th of each month.

In addition to analyzing the performance of business transactions, we also need to analyze the performance of the backends and infrastructure in which the PHP application runs. There are aberrant conditions that can negatively impact all business transactions running in an individual environment. For example, if your web server runs out of connection pool capacity then requests will back up; if the OS runs a backup process with heavy I/O then the machine will slow down; and so forth. It is important to correlate business transaction behavior with environment behavior to identify false positives: the application may be fine, but the environment in which it is running is under duress.

Finally, ecosystem metrics can be key indicators that trigger automated responses that dynamically change the environment, which we explore in the next section.

One of the key benefits that elastic applications enable is the ability to automatically scale. When we detect a performance problem, we can respond in two ways:

– Raise an alert so that a human can intervene and resolve the problem
– Change the deployment environment to mitigate the problem

There are certain problems that cannot be resolved by adding more servers. For those cases we need to raise an alert so that someone can analyze performance metrics and snapshots to identify the root cause of the problem. But there are other problems that can be mitigated without human intervention. For example, if the CPU usage on the majority of the servers in a tier is over 80%, then we might be able to resolve the issue by adding more servers to that tier.

Business transaction baselines should include a count of the number of business transactions executed for each hour. If we detect that load is significantly higher than "normal" then we can define rules for how to change the environment to support the load. Furthermore, regardless of business transaction load, if we detect container-based performance issues across a tier, adding servers to that tier might be able to mitigate the issue.

Smart rules that alter the topology of an application at runtime can save you money with your cloud-based hosting provider and can automatically mitigate performance issues before they affect your users.

Conclusion
This article reviewed some of the challenges in implementing an APM strategy. A proper APM strategy requires that you capture the response time of business transactions and their constituent tiers, using techniques like code profiling and backend detection, and that you capture container metrics across your entire application ecosystem. Next, you need to correlate business transaction segments in a management server, identify the baseline that best meets your business needs, and compare current response times to your baselines. Finally, you need to determine whether you can automatically change your environment to mitigate the problem or raise an alert so that someone can analyze and resolve the problem.

In the next article we'll look at the top five performance metrics to measure in an enterprise PHP application and how to interpret them.
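The baseline comparison described in this chapter can be sketched in a few lines of PHP. The function name and the choice of two standard deviations as the threshold are illustrative (the two-standard-deviation rule is one common convention, not the only option); the baseline samples would come from whichever baseline pattern you selected above.

```php
<?php
// Evaluate a response time against a baseline of historical samples:
// flag it as abnormal if it exceeds the mean by more than 2 standard deviations.
function isAbnormal(float $responseTime, array $baselineSamples): bool
{
    $n = count($baselineSamples);
    if ($n < 2) {
        return false;  // not enough history to judge normalcy
    }
    $mean = array_sum($baselineSamples) / $n;
    $variance = 0.0;
    foreach ($baselineSamples as $sample) {
        $variance += ($sample - $mean) ** 2;
    }
    $stdDev = sqrt($variance / $n);
    return $responseTime > $mean + 2 * $stdDev;
}

// Example: hourly samples (in seconds) for a transaction on Mondays 8am-9am.
// $samples = [3.1, 2.9, 3.0, 3.2, 2.8];
// isAbnormal(8.0, $samples) flags the 8-second outlier as abnormal.
```

In practice the samples would be stored per baseline bucket (e.g. hour of day and day of week) and refreshed as each hour completes, as the next chapter describes.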
Chapter 3: Top five performance metrics to capture in enterprise PHP applications
The last couple of articles presented an introduction to Application Performance Management (APM) and identified the challenges in effectively implementing an APM strategy. This article builds on these topics by reviewing five of the top performance metrics to capture to assess the health of your enterprise PHP application. Specifically, this article reviews the following:

– Business Transactions
– Internal Dependencies
– External Calls
– Caching Strategy
– Application Topology

Business transactions
Business Transactions provide insight into real-user behavior: they capture the real-time performance that real users are experiencing as they interact with your application. As mentioned in the previous article, measuring the performance of a business transaction involves capturing the response time of a business transaction holistically as well as measuring the response times of its constituent tiers. These response times can then be compared with the baseline that best meets your business needs to determine normalcy.

If you were to measure only a single aspect of your application, I would encourage you to measure the behavior of your business transactions. While container metrics can provide a wealth of information and can help you determine when to auto-scale your environment, your business transactions determine the performance of your application. Instead of asking for the CPU usage of your application server, you should be asking whether or not your users are able to complete their business transactions and whether those business transactions are behaving normally.

As a little background, business transactions are identified by their entry point, which is the interaction with your application that starts the business transaction. In the case of a PHP application, this is usually the HTTP request. There may be some exceptions, such as PHP CLI, in which case the business transaction could be a PHP script executed by a cron job from the command line. In the case of a PHP worker server, the business transaction could potentially be the job that the PHP application picked up from a queue server and executes. Alternatively, you may choose to define multiple entry points for the same web request based on a URL parameter, or for a service call based on the contents of its body. The point is that the business transaction needs to be related to a function that means something to your business.

Once a business transaction is identified, its performance is measured across your entire application ecosystem. The performance of each individual business transaction is evaluated against its baseline to assess normalcy. For example, we might determine that if the response time of the business transaction is slower than two standard deviations from the average response time for this baseline then it is behaving abnormally, as shown in figure 1.

Figure 1: Evaluating a business transaction's response time against its baseline (response times greater than two standard deviations above the baseline average trigger an alert)

The baseline used to evaluate the business transaction is consistent for the hour in which the business transaction is running, but the baseline itself is refined by each business transaction execution. For example, if you have chosen a baseline that compares business transactions against the average response time for the hour of day and the day of the week, then after the current hour is over, all business transactions executed in that hour will be incorporated into the baseline for next week. Through this mechanism an application can evolve over time without requiring the original baseline to be thrown away and rebuilt; you can think of it as a window moving over time.

In summary, business transactions are the most reflective measurement of the user experience, so they are the most important metric to capture.

Internal dependencies
Your PHP application may be utilizing a backend database, a caching layer, or possibly even a queue server as it offloads I/O-intensive blocking tasks onto worker servers to process in the background. Whatever backend your PHP application interfaces with, the latency of these backend services can affect the performance of your PHP application. The various types of internal exit calls may include:

– SQL databases
– NoSQL servers
– In-memory cache
– Internal services
– Queue servers

In some environments, your PHP application may be interfacing with an obscure backend or messaging/queue server. For example, you may have an old message broker serving as an interface between your PHP application and other applications. While this message broker may be outdated, it is nevertheless part of an older architecture and part of the ecosystem in which your distributed applications communicate. Even if your APM solution does not autodiscover the messaging server, you may build additional
instrumentation to detect the messaging server and measure the latency. If you want to draw correlation so the business transaction does not become fragmented, you may also build additional instrumentation so that the correlation is completed automatically when the request passes through the messaging server to the applications on the other side of the queue.

From a business transaction perspective, we can identify and measure internal dependencies as being in their own tiers. Sometimes we need to configure the monitoring solution to identify methods that really wrap external service calls, but for common protocols, such as HTTP and CLI, internal dependencies can be automatically detected. Similar to business transactions and their constituent application tiers, dependency behavior should be baselined and response times evaluated against those baselines.

However your PHP application communicates with internal services, the latency in waiting for the response can potentially impact the performance of your application and your customer experience. Measuring and optimizing the response time of these communications can help solve these bottlenecks.

External calls

In addition to your internal services and backends, exit calls may also include remote third-party web service APIs that your application calls in real time. For example, suppose your customer is attempting to purchase the items in a shopping cart, and in order for the transaction to complete your application must charge their credit card so that you can display a confirmation or error page. This is an example of a blocking exit call because your entire transaction is now dependent on the response of the call made to the third-party merchant provider that is charging the credit card. If the customer's credit card was charged successfully, the user is presented with a confirmation page and a receipt. If the credit card was declined, the user is presented with an error message to try again. Regardless, the customer is waiting on the application, which is dependent on the third-party merchant provider. The latency of the exit call will have an immediate impact on the performance of this particular transaction instance.

External dependencies can come in various forms and are systems with which your application interacts. We do not necessarily have control over the code running inside external dependencies, but we often have control over their configuration, so it is important to know when they are running well and when they are not. Furthermore, we need to be able to differentiate between problems in our application and problems in dependencies.

Business transactions provide you with the best holistic view of the performance of your application and can help you triage performance issues, but external dependencies can significantly affect your applications in unexpected ways unless you are watching them.

Caching strategy

It is always faster to serve an object from memory than it is to make a network call to retrieve the object from a system like a database; caches provide a mechanism for storing object instances locally to avoid this network round trip. But caches can present their own performance challenges if they are not properly configured. Common caching problems include:

– Loading too much data into the cache
– Not properly sizing the cache

When measuring the performance of a cache, you need to identify the number of objects loaded into the cache and then track the percentage of those objects that are being used. The key metrics to look at are the cache hit ratio and the number of objects that are being ejected from the cache. The cache hit count, or hit ratio, reports the number of object requests that are served from the cache rather than requiring a network trip to retrieve the object. If the cache is huge, the hit ratio is tiny (under 10% or 20%), and you are not seeing many objects ejected from the cache, then this is an indicator that you are loading too much data into the cache. In other words, your cache is large enough that it is not thrashing (see below) and contains a lot of data that is not being used.

The other aspect to consider when measuring cache performance is the cache size. Is the cache too large, as in the previous example? Is the cache too small? Or is the cache sized appropriately?

A common problem when sizing a cache is not properly anticipating user behavior and how the cache will be used. Let's consider an APC cache configured to use 32M of memory. If 32M is not enough for APC to be used as an opcode cache, then you run the risk of exchanging data in memory for data on the disk. Considering disk I/O is much slower than reading from memory, this defeats the purpose of your in-memory cache and significantly slows down your application. The result is that we're spending more time managing the cache than serving objects: in this scenario the cache is actually getting in the way rather than improving performance.

When you size a cache too small and the aforementioned behavior occurs, we say that the cache is thrashing, and in this scenario it is almost better to have no cache than a thrashing cache. Figure 2 attempts to show this graphically.
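The cache metrics described above (hit ratio, ejection count, and the share of cached objects actually being used) can be combined into a simple health check. This is an illustrative sketch of the analysis, not an AppDynamics feature; the function and field names are invented for the example:

```python
def cache_health(hits, misses, ejections, objects_loaded, objects_used):
    """Summarize cache metrics: hit ratio, and whether the cache shows the
    "too much data" pattern (tiny hit ratio, no ejection pressure)."""
    total = hits + misses
    hit_ratio = hits / total if total else 0.0
    usage_ratio = objects_used / objects_loaded if objects_loaded else 0.0
    # A tiny hit ratio with few ejections suggests the cache is oversized
    # and holds a lot of data that is never requested.
    oversized = hit_ratio < 0.2 and ejections == 0
    return hit_ratio, usage_ratio, oversized

hit_ratio, usage_ratio, oversized = cache_health(
    hits=150, misses=850, ejections=0, objects_loaded=10_000, objects_used=900)
print(f"hit ratio {hit_ratio:.0%}, usage {usage_ratio:.0%}, oversized: {oversized}")
```

In this hypothetical sample only 15% of requests are served from the cache and almost nothing is ever ejected, matching the "loading too much data" indicator described above.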
Application topology
The final performance component to measure in this top-5 list is your application
topology. Because of the advent of the cloud, applications can now be elastic in
nature: your application environment can grow and shrink to meet your user demand.
Therefore, it is important to take an inventory of your application topology to determine
whether or not your environment is sized optimally. If you have too many virtual server
instances then your cloud-hosting cost is going to go up, but if you do not have enough
then your business transactions are going to suffer.
Business transactions should be baselined, and you should know at any given time the number of servers needed to satisfy your baseline. If your business transaction load increases unexpectedly, such as to more than two standard deviations above normal load, then you may want to add additional servers to satisfy those users.
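As a rough sketch of this sizing logic, the following function estimates how many servers to add when load rises above the baseline the current fleet was sized for. The names and numbers are illustrative assumptions, not part of any APM product:

```python
import math

def additional_servers_needed(current_load_rps, baseline_load_rps, current_servers):
    """Estimate how many servers to add when business-transaction load rises
    above the baseline load the current fleet was sized to handle."""
    per_server = baseline_load_rps / current_servers      # capacity per server
    required = math.ceil(current_load_rps / per_server)   # servers for new load
    return max(0, required - current_servers)

# A fleet of 4 servers sized for 2,000 requests/sec; load spikes to 3,100.
print(additional_servers_needed(3100, 2000, 4))  # → 3
```

A real elastic environment would also factor in headroom and scale-down hysteresis, but the core arithmetic is this simple.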
The other metric to measure is the performance of your containers. Specifically, you
want to determine if any tiers of servers are under duress and, if they are, you may
want to add additional servers to that tier. It is important to look at the servers across
a tier because an individual server may be under duress due to factors like garbage
collection, but if a large percentage of servers in a tier are under duress then it may
indicate that the tier cannot support the load it is receiving.
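The tier-wide duress check described above can be sketched as follows; the 50% threshold is an illustrative assumption, not a product default:

```python
def tier_under_duress(server_duress_flags, threshold=0.5):
    """A single stressed server may just be busy (e.g. garbage collecting),
    but if a large share of a tier is stressed, the tier is likely undersized."""
    stressed = sum(1 for flag in server_duress_flags if flag)
    return stressed / len(server_duress_flags) >= threshold

web_tier = [False, True, False, False]   # one busy server: likely noise
db_tier = [True, True, True, False]      # most of the tier is struggling
print(tier_under_duress(web_tier))  # → False
print(tier_under_duress(db_tier))   # → True
```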
Chapter 4: AppDynamics approach to APM
This article series has presented an overview of application performance management (APM), identified the challenges in implementing an APM strategy, and proposed a top-5 list of important metrics to measure to assess the health of an enterprise PHP application. This article pulls all of these topics together to describe an approach to implementing an APM strategy, and specifically it describes the approach that AppDynamics chose when designing its APM solution.

Business transaction centric

Because of the dynamic nature of modern applications, AppDynamics chose to design its solution to be business transaction centric. In other words, business transactions are at the forefront of every monitoring rule and decision. Rather than prematurely providing unnecessary metrics, it first answers the imperative question of whether or not users are able to execute their business transactions and whether those business transactions are behaving normally. The solution also captures container information within the context of supporting the analysis of business transactions.

Business transactions begin with entry-points, which are the triggers that instantiate a business transaction. AppDynamics automatically identifies common entry-points, such as a web request, a PHP CLI script, an MVC action, and so forth, but also allows you to configure them manually. The goal is to identify and name all common business transactions out-of-the-box, but to provide the flexibility to meet your business needs. For example, you might define a controller/action that routes from multiple URIs. In this case, AppDynamics will automatically identify the business transaction, while allowing you to define the criteria that splits the business transaction based on your business needs, such as by fields in the payload.

Once a business transaction has been defined and named, that business transaction will be followed across all tiers that your application needs to satisfy it. This includes both synchronous calls (e.g. web service and database calls) as well as asynchronous calls (e.g. jobs sent to a queue). AppDynamics adds custom headers, properties, and other protocol-specific elements so that it can then assemble all tier segments into a single holistic view of the business transaction. It collects the business transaction on its management server for analysis.

AppDynamics believes that defining static SLAs is difficult to manage over time. Thus, while AppDynamics allows you to define static SLAs, it primarily relies on baselines for its analysis. It captures raw business transaction performance data, saves it in its database, and allows you to choose the best baseline against which to analyze incoming business transactions. Baselines can be defined in the following ways:

– Average response time over a period of time
– Hour of day over a period of time
– Hour of day and day of week over a period of time
– Hour of day and day of month over a period of time

Recall from the previous articles in this series that baselines are selected based on the behavior of your application usage. If your application is used consistently over time then you can choose to analyze business transactions against a rolling average over a period of time. With this baseline, you might analyze the response time against the average response time for every execution of that business transaction over the past 30 days.

If your user behavior varies depending on the hour of day, such as with an internal intranet application in which users log in at 8:00am and log out at 5:00pm, then you can opt to analyze the login business transaction based on the hour of day. In this case we want to analyze a login at 8:15am against the average response time for all logins between 8:00am and 9:00am over the past 30 days.

If your user behavior varies depending on the day of the week, such as with an ecommerce application that experiences more load on Fridays and Saturdays, then you can opt to analyze business transaction performance against the hour of day and the day of the week. For example, we might want to analyze an add-to-cart business transaction executed on Friday at 5:15pm against the average add-to-cart response time on Fridays from 5:00pm-6:00pm for the past 90 days.

Finally, if your user behavior varies depending on the day of the month, such as with a banking or ecommerce application that experiences varying load on the 15th and 30th of the month (when people get paid), then you can opt to analyze business transactions based on the hour of day on that day of the month. For example, you might analyze a deposit business transaction executed at 4:15pm on the 15th of the month against the average deposit response time from 4:00pm-5:00pm on the 15th over the past six months.

All of this is to say that once you have identified the behavior of your users, AppDynamics provides you with the flexibility to define how to interpret that data. Furthermore, because it maintains the raw data for those business transactions, you can select your baseline strategy dynamically, without needing to wait a couple of months for it to recalibrate itself.
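The hour-of-day/day-of-week baseline described above can be modeled with a small data structure keyed by (weekday, hour), with new executions evaluated against the bucket's average plus two standard deviations. This is an illustrative sketch of the idea, not AppDynamics' implementation:

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean, stdev

class HourOfWeekBaseline:
    """Baseline keyed by (day of week, hour of day): a Friday 5:15pm
    execution is evaluated against other Friday 5:00pm-6:00pm executions."""

    def __init__(self, num_stdevs=2.0):
        self.samples = defaultdict(list)
        self.num_stdevs = num_stdevs

    def record(self, when, response_ms):
        self.samples[(when.weekday(), when.hour)].append(response_ms)

    def is_abnormal(self, when, response_ms):
        bucket = self.samples[(when.weekday(), when.hour)]
        if len(bucket) < 2:
            return False  # not enough history to judge
        return response_ms > mean(bucket) + self.num_stdevs * stdev(bucket)

baseline = HourOfWeekBaseline()
for day, ms in [(5, 120), (12, 130), (19, 125), (26, 135)]:
    baseline.record(datetime(2015, 6, day, 17, 15), ms)  # June 2015 Fridays
print(baseline.is_abnormal(datetime(2015, 7, 3, 17, 30), 128))  # → False
print(baseline.is_abnormal(datetime(2015, 7, 3, 17, 30), 450))  # → True
```

A rolling-average or day-of-month baseline only changes the bucket key; the evaluation logic stays the same.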
Figure 1 shows an example of how we might evaluate a business transaction against its baseline.

…solutions mitigate this by running in a passive mode (a single Boolean check on each instrumented method tells the instrumentation code whether it should be capturing response times). But even this passive monitoring comes at a slightly elevated overhead (one Boolean check for each method in the business transaction).
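The passive-mode idea, where the fast path pays only a single Boolean check, can be sketched with a decorator. This illustrates the concept only; it is not the AppDynamics agent's actual mechanism:

```python
import time

CAPTURE = False  # toggled on when a snapshot session is triggered

def instrumented(fn):
    """Passive-mode instrumentation: when CAPTURE is off, the only per-call
    cost is one Boolean check; timings are recorded only during a session."""
    timings = []
    def wrapper(*args, **kwargs):
        if not CAPTURE:            # the lone Boolean check on the fast path
            return fn(*args, **kwargs)
        start = time.perf_counter()
        try:
            return fn(*args, **kwargs)
        finally:
            timings.append((time.perf_counter() - start) * 1000.0)
    wrapper.timings = timings
    return wrapper

@instrumented
def handle_request():
    return "ok"

handle_request()                     # not captured: passive mode
CAPTURE = True
handle_request()                     # captured during the snapshot session
print(len(handle_request.timings))   # → 1
```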
…of them? The assertion is that a representative sample illustrating the problem is enough information to allow you to diagnose the root cause of the problem.

Finally, if it is a systemic problem, rather than starting this 5-minute sampling for 10 minutes, AppDynamics is smart enough to give you a 30-minute breather in between samples. This decision was made to reduce overhead on an already struggling application.

And then it adds to that a rules engine that can execute actions under the circumstances that you define. Actions can be general shell scripts that you write to do whatever you need, such as starting 10 new instances and updating your load balancer to include the new instances in load distribution, or they can be specific prebuilt actions defined in the AppDynamics repository, such as starting or stopping an Amazon Web Services AMI.

In short, the AppDynamics automation engine provides you with all of the performance data you need to determine whether or not you need to modify your application topology, as well as the tools to make those changes automatically.

Conclusion

Application Performance Management (APM) is a challenge that balances the richness of data and the ability to diagnose the root cause of performance problems with the overhead required to capture that data. This article presented several of the facets that AppDynamics considered when defining its APM solution.
Chapter 5: PHP APM tips and tricks
This article series has covered a lot of ground: it presented an overview of application performance management (APM), it identified the challenges in implementing an APM strategy, it proposed a top-5 list of important metrics to measure to assess the health of an enterprise PHP application, and it presented AppDynamics' approach to building an APM solution. In this final installment, this article provides some tips and tricks to help you implement an optimal APM strategy. Specifically, this article addresses the following topics:

– Business Transaction Optimization
– Snapshot Tuning
– Threshold Tuning
– Tier Management
– Capturing Contextual Information

Business transaction optimization

Over and over throughout this article series I have been emphasizing the importance of business transactions to your monitoring solution. To get the most out of your business transaction monitoring, however, you need to do a few things:

– Properly name your business transactions to match your business functions
– Properly identify your business transactions
– Reduce noise by excluding business transactions that you do not care about

AppDynamics will automatically identify business transactions for you and try to name them the best that it can, but depending on how your application is written, these names may or may not be reflective of the business transactions themselves. For example, you may have a business transaction identified as "POST /payment" that equates to your checkout flow. In this case, it is going to be easier for your operations staff, as well as when generating reports that you might share with executives, if business transaction names reflect their business function. So consider renaming this business transaction to "Checkout".

Next, if you have multiple business transactions that are identified by a single entry-point, take the time to break those into individual business transactions. There are several examples where this might happen, which include the following:

– URLs that route to the same MVC controller and action
– Business transactions that determine their function based on their payload
– Business transactions that determine their function based on GET parameters
– Complex URI paths

If a single entry-point corresponds to multiple business functions, then configure the business transactions based on the differentiating criteria. For example, if the body of an HTTP POST has an "operation" element that identifies the operation to perform, then break the transaction based on that operation. Or if there is an "execute" action that accepts a "command" URI argument, then break the transaction based on the "command" segment. Finally, URI patterns can vary from application to application, so it is important for you to choose the one that best matches your application. For example, AppDynamics automatically defines business transactions for URIs based on two segments, such as /one/two. For most PHP MVC frameworks, this automatically routes to the application controller and action. If your application uses one segment, or if it uses four segments, then you need to define your business transactions based on your naming convention.

Naming and identifying business transactions is important to ensuring that you're capturing the correct business functionality, but it is equally important to exclude as much noise as possible. Do you have any business transactions that you really do not care about? For example, is there a web game that checks high scores every couple of minutes? Or is there a PHP CLI cron job that runs every night and takes a long time, but because it is offline and does not impact the end user, you do not care about it? If so, then exclude these transactions so that they do not add noise to your analysis.

Snapshot tuning

As mentioned in the previous article, AppDynamics intelligently captures performance snapshots both by profiling PHP code executions at a specified interval instead of leveraging code instrumentation for all snapshot elements, and by limiting the number of snapshots captured in a performance session. Because both of these values can be tuned, it can benefit you to tune them.

Out-of-the-box, AppDynamics captures the entire stack trace while trimming any granularity below the configured threshold. If you are only interested in "big" performance problems then you may not require granularity as fine as 10 milliseconds. You can increase this interval to 50 milliseconds, but you will lose granularity. If you are finely tuning your application then you may want 10-millisecond granularity, but if you have no intention of tuning methods that execute in under 50 milliseconds, then why do you need that level of granularity? The point is that you should analyze your requirements and tune accordingly.

Next, observe your production troubleshooting patterns and determine whether or not the number of snapshots that AppDynamics captures is appropriate for your situation. If you find that capturing up to 5 snapshots every minute for 5 minutes results in 20 or more snapshots, but you only ever review 2 of those snapshots, then do not bother capturing 20. Try configuring AppDynamics to capture up to 1 snapshot every minute for 5 minutes. And if you're only interested in systemic problems then you can turn down the maximum number of attempts to 5. This will significantly reduce the constant overhead, but at the cost of possibly not capturing a representative snapshot.

Threshold tuning

AppDynamics has designed a generic monitoring solution and, as such, it defaults to alerting on business transactions that are slower than two standard deviations from normal. This works well in most circumstances, but you need to identify how volatile your application response times are to determine whether or not this is the best configuration for your business needs.
AppDynamics defines three types of thresholds against which business transactions are evaluated relative to their baselines:

– Standard deviation: compares the response time of a business transaction against a number of standard deviations from its baseline
– Percentage: compares the response time of a business transaction against a percentage of difference from baseline
– Static SLAs: compares the response time of a business transaction against a static value, such as 2 seconds

If your application response times are volatile, then the default threshold of two standard deviations might result in too many false alerts. In this case you might want to increase it to more standard deviations or even switch to another strategy. If your application response times have low volatility, then you might want to decrease your thresholds to alert you to problems sooner. Furthermore, if you have services or APIs that you provide to users with specific SLAs, then you should set up a static SLA value for that business transaction. AppDynamics provides you with the flexibility of defining alerting rules generally or on individual business transactions. You need to analyze your application behavior and configure the alerting engine accordingly.

Tier management

I've described how AppDynamics captures baselines for business transactions, but it also captures baselines for business transactions across tiers. For example, if your business transaction calls a rules engine service tier, then AppDynamics will capture the number of calls and the average response time for that tier as a contributor to the business transaction baseline. Therefore, you want to ensure that all of your tiers are clearly identified.

Out of the box, AppDynamics identifies tiers across common protocols, such as HTTP, PHP CLI, PHP MVC, and so forth. For example, if it sees you make a database call, then it assumes that there is a database and allocates the time spent in the method call to the database. This is important because you don't want to think that you have a very slow "save" method in an ORM class; instead, you want to know how long it takes to persist your object to the database and attribute that time to the database.

AppDynamics does a good job of identifying tiers that follow common protocols, but there are times when your communication with a back-end system does not use a common protocol. For example, I was working at an insurance company that used an AS/400 for quoting. We leveraged a library that used a proprietary socket protocol to make a connection to the server. Obviously AppDynamics would know nothing about that socket connection and how it was being used, so the answer to our problem was to identify the method call that makes the connection to the AS/400 and identify it as a custom back-end resource. When you do this, AppDynamics treats that method call as a tier, counts the number of calls, and captures the average response time of that method execution.

You might be able to use the out-of-the-box functionality, but if you have special requirements then AppDynamics provides a mechanism that allows you to manually define your application tiers by using the PHP API functions to further tailor your application.

Capturing contextual information

When performance problems occur, they are sometimes limited to a specific browser or mobile device, or they may only occur based on input associated with a request. If the problem is not systemic (across all of your servers), then how do you identify the subset of requests that are causing the problem?

The answer is that you need to capture context-specific information in your snapshots so that you can look for commonalities. These might include:

– HTTP headers, such as browser type (user-agent), cookies, or referrer
– HTTP GET parameter values
– Method parameter values
– Application variables and their values

Think about all of the pieces of information that you might need to troubleshoot and isolate a subset of poorly performing PHP transactions. For example, if you capture the User-Agent HTTP header then you can know the browser that the user was using to execute your business transaction. If your HTTP request accepts GET parameters, such as a search string, then you might want to see the value of one or more of those parameters, e.g. what was the user searching for? Additionally, if you have code-level understanding about how your application works, you might want to see the values of specific method parameters.

AppDynamics can be configured to capture contextual information and add it to snapshots, which can include all of the aforementioned types of values. The process can be summarized as follows:

1. AppDynamics observes that a business transaction is running slow
2. It triggers the capture of a session of snapshots
3. On each snapshot, it captures the contextual information that you requested and associates it with the snapshot

The result is that when you find a snapshot illustrating the problem, you can review this contextual information to see if it provides you with more diagnostic information.

The only warning is that this comes at a small price: AppDynamics uses code instrumentation to capture the values of method parameters. In other words, use this functionality where you need to, but use it sparingly.
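Attaching contextual information to a snapshot for later commonality analysis can be sketched as follows; the request shape, field names, and captured headers/parameters are illustrative assumptions, not an agent API:

```python
def build_snapshot(request, capture_headers=("User-Agent", "Referer"),
                   capture_params=("q",)):
    """Attach the requested contextual information (HTTP headers and GET
    parameters) to a performance snapshot so that slow requests can later
    be grouped by commonality (browser, search term, and so on)."""
    return {
        "headers": {h: request["headers"].get(h) for h in capture_headers},
        "params": {p: request["params"].get(p) for p in capture_params},
    }

slow_request = {
    "headers": {"User-Agent": "Mozilla/5.0 (Windows NT 6.1)", "Referer": "/search"},
    "params": {"q": "red shoes"},
}
snapshot = build_snapshot(slow_request)
print(snapshot["params"]["q"])   # what was the user searching for?
```

If many slow snapshots share the same User-Agent or the same search term, that shared attribute is your first diagnostic lead.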
Conclusion
Application Performance Management (APM) is a challenge that balances the richness of data and the ability to diagnose the root cause of PHP performance problems with the overhead required to capture that data. There are configuration options and tuning capabilities that you can employ to provide you with the information you need while minimizing the overhead on your application. This article reviewed a few core tips and tricks that anyone implementing an APM strategy should consider. Specifically, it presented recommendations about business transaction optimization, snapshot tuning, threshold tuning, tier management, and capturing contextual information.
APM is not easy, but tools like AppDynamics make it easy for you to capture the
information you need while reducing the impact to your production applications.
appdynamics.com
© 2015 Copyright AppDynamics