05 Monitoring The Cluster
Log levels changed in this way are reset when the daemon restarts, which is usually
what you want. To make a persistent change to a log level, however, edit the
log4j.properties file in the configuration directory. In this case, the line to add is:
log4j.logger.org.apache.hadoop.mapred.JobTracker=DEBUG
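For a transient change, one option is the hadoop daemonlog command, which sets a logger's level over the daemon's HTTP interface; the host and port below are illustrative:

```
hadoop daemonlog -setlevel jobtracker-host:50030 \
  org.apache.hadoop.mapred.JobTracker DEBUG
```

The same change can be made interactively from the daemon's /logLevel web page.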
The HDFS and MapReduce daemons collect information about events and
measurements that are collectively known as metrics.
Metrics belong to a context, and Hadoop currently uses dfs, mapred, rpc, and
jvm contexts. Hadoop daemons usually collect metrics under several contexts.
You can view raw metrics gathered by a particular Hadoop daemon by connecting to
its /metrics web page. This is handy for debugging. For example, you can view
jobtracker metrics in plain text at http://jobtracker-host:50030/metrics. To retrieve
metrics in JSON format, use http://jobtracker-host:50030/metrics?format=json.
FileContext
FileContext writes metrics to a local file. It exposes two configuration properties:
fileName, which specifies the absolute name of the file to write to, and period, for the
time interval (in seconds) between file updates.
FileContext can be useful on a local system for debugging purposes, but is unsuitable
on a larger cluster since the output files are spread across the cluster, which makes
analyzing them difficult.
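Metrics contexts are configured per context group in hadoop-metrics.properties. A minimal sketch for sending dfs metrics to a file, using the two properties described above (the file path and interval are illustrative values):

```
# hadoop-metrics.properties -- write dfs metrics to a local file
dfs.class=org.apache.hadoop.metrics.file.FileContext
dfs.fileName=/tmp/dfsmetrics.log
dfs.period=10
```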
GangliaContext
Ganglia (http://ganglia.info/) is an open source distributed monitoring system for
very large clusters. It is designed to impose very low resource overheads on each node
in the cluster. Ganglia itself collects metrics, such as CPU and memory usage; by
using GangliaContext, you can inject Hadoop metrics into Ganglia.
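Configuration follows the same pattern as FileContext; the sketch below assumes a gmond daemon listening at ganglia-host:8649 (host, port, and period are illustrative):

```
# hadoop-metrics.properties -- push dfs metrics to Ganglia
dfs.class=org.apache.hadoop.metrics.ganglia.GangliaContext
dfs.servers=ganglia-host:8649
dfs.period=10
```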
CompositeContext
CompositeContext allows you to output the same set of metrics to multiple contexts,
such as a FileContext and a GangliaContext.
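As a sketch, a CompositeContext is configured with an arity property giving the number of sub-contexts, each defined under a numbered sub prefix (the file path and host here are illustrative):

```
# hadoop-metrics.properties -- fan dfs metrics out to a file and Ganglia
dfs.class=org.apache.hadoop.metrics.spi.CompositeContext
dfs.arity=2
dfs.sub1.class=org.apache.hadoop.metrics.file.FileContext
dfs.sub1.fileName=/tmp/dfsmetrics.log
dfs.sub1.period=10
dfs.sub2.class=org.apache.hadoop.metrics.ganglia.GangliaContext
dfs.sub2.servers=ganglia-host:8649
dfs.sub2.period=10
```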
Java Management Extensions (JMX) is a standard Java API for monitoring and
managing applications. Hadoop includes several managed beans (MBeans), which
expose Hadoop metrics to JMX-aware applications. There are MBeans that expose
the metrics in the dfs and rpc contexts.
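To inspect these MBeans remotely with a tool such as JConsole, the daemon's JVM must be started with remote JMX enabled. One way is to add the standard JVM flags in hadoop-env.sh; the port is illustrative, and authentication and SSL are disabled here only for brevity (enable them on a real cluster):

```
# hadoop-env.sh -- enable remote JMX access for the namenode
export HADOOP_NAMENODE_OPTS="-Dcom.sun.management.jmxremote \
  -Dcom.sun.management.jmxremote.port=8004 \
  -Dcom.sun.management.jmxremote.ssl=false \
  -Dcom.sun.management.jmxremote.authenticate=false \
  $HADOOP_NAMENODE_OPTS"
```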
It's common to use Ganglia in conjunction with an alerting system like Nagios for
monitoring a Hadoop cluster. Ganglia is good for efficiently collecting a large number
of metrics and graphing them, whereas Nagios and similar systems are good at
sending alerts when a critical threshold is reached in any of a smaller set of metrics.
Demo