General Notes: Hands-On Exercises: Apache Hadoop For Developers 2012
General Notes
This training course uses a virtual machine running the CentOS 5.6 Linux
distribution. This VM has Cloudera's Distribution including Apache Hadoop
version 3 (CDH3) installed in Pseudo-Distributed mode. Pseudo-Distributed
mode is a method of running Hadoop whereby all five Hadoop daemons (NameNode,
Secondary NameNode, DataNode, JobTracker, and TaskTracker) run on the same
machine. It is, essentially, a cluster consisting of a single machine. It works just
like a larger Hadoop cluster; the only key difference (apart from speed, of course!)
is that the block replication factor is set to 1, since there is only a single
DataNode available.
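One way to confirm the replication factor on the VM is to read it from the HDFS
configuration. The sketch below is illustrative only: it assumes the setting lives
in a file named hdfs-site.xml under $HADOOP_HOME/conf (a common layout, but not
stated in these notes), and the helper name show_replication is hypothetical.

```shell
# Print the dfs.replication value found in an HDFS configuration file.
# Assumption: the <value> element is on the same line as, or the line
# directly after, the <name>dfs.replication</name> element.
show_replication() {
  conf_file="$1"
  grep -A 1 '<name>dfs.replication</name>' "$conf_file" \
    | sed -n 's:.*<value>\(.*\)</value>.*:\1:p'
}

# Assumed location of the config file; adjust if your layout differs.
conf="${HADOOP_HOME:-/usr/lib/hadoop}/conf/hdfs-site.xml"
if [ -f "$conf" ]; then
  show_replication "$conf"
else
  echo "No hdfs-site.xml found at $conf"
fi
```

If the property is not set in that file, the function prints nothing, which
only means the value is being taken from elsewhere.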
shown (on two lines), or you can enter it on a single line. If you do the latter, you
Hadoop
Hadoop is already installed, configured, and running on your virtual machine.
Hadoop is installed in the /usr/lib/hadoop directory. You can refer to this using
the environment variable $HADOOP_HOME, which is automatically set in any
terminal you open on your desktop.
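To check that the variable is set in your terminal, a small sketch like the
following may help. The function name describe_hadoop_home is hypothetical and
is not part of Hadoop.

```shell
# Report whether HADOOP_HOME is set and points at an existing directory.
describe_hadoop_home() {
  if [ -z "${HADOOP_HOME:-}" ]; then
    echo "HADOOP_HOME is not set"
  elif [ -d "$HADOOP_HOME" ]; then
    echo "HADOOP_HOME points to $HADOOP_HOME"
  else
    echo "HADOOP_HOME is set to $HADOOP_HOME, but that directory does not exist"
  fi
}

describe_hadoop_home
```

On the training VM this should report the /usr/lib/hadoop directory mentioned
above.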
Most of your interaction with the system will be through a command-line
wrapper called hadoop. If you start a terminal and run this program with no
arguments, it prints a help message. To try this, run the following command:
$ hadoop
(Note: although your command prompt is more verbose, we use $ to indicate
the command prompt for brevity's sake.)
The hadoop command is subdivided into several subsystems. For example,
there is a subsystem for working with files in HDFS and another for launching
and managing MapReduce processing jobs.
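As a rough illustration of the two subsystems just mentioned, the sketch below
runs one command from each: hadoop fs for HDFS and hadoop job for MapReduce.
The wrapper function show_subsystems is hypothetical, and the guard around it
only exists so the script fails gracefully on a machine without Hadoop.

```shell
# Run one command from each of the two subsystems described above.
show_subsystems() {
  echo "== HDFS: contents of the root directory =="
  hadoop fs -ls /
  echo "== MapReduce: jobs that have not yet completed =="
  hadoop job -list
}

if command -v hadoop >/dev/null 2>&1; then
  show_subsystems
else
  echo "hadoop command not found on PATH"
fi
```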
1. Open a terminal window (if one is not already open) by double-clicking the
Terminal icon on the desktop.
2. In the terminal window, enter:
$ hadoop fs
You see a help message describing all the commands associated with this
subsystem.
3. Enter:
$ hadoop fs -ls /
This shows you the contents of the root directory in HDFS. There will be multiple
entries, one of which is /user. Individual users have a home directory under
this directory, named after their username; your home directory is
/user/training.
4. Try viewing the contents of the /user directory by running:
$ hadoop fs -ls /user
You will see your home directory in the directory listing.
5. Try running:
$ hadoop fs -ls /user/training
There are no files, so the command silently exits. This is different from running
hadoop fs -ls /foo, which refers to a directory that doesn't exist and therefore
displays an error message.
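The difference between an empty directory and a missing one can also be checked
from a script. The following is a minimal sketch, assuming the path (when it
exists) is a directory; it relies on hadoop fs -test -e to check for existence
and on the empty listing described in step 5. The function name hdfs_path_state
is hypothetical.

```shell
# Classify an HDFS path as "missing", "empty", or "non-empty".
# Assumption: if the path exists, it is a directory.
hdfs_path_state() {
  target="$1"
  if ! hadoop fs -test -e "$target" 2>/dev/null; then
    echo "missing"      # the path does not exist in HDFS
  elif [ -z "$(hadoop fs -ls "$target" 2>/dev/null)" ]; then
    echo "empty"        # the listing printed nothing
  else
    echo "non-empty"    # the listing printed at least one entry
  fi
}

if command -v hadoop >/dev/null 2>&1; then
  for p in /user/training /foo; do
    echo "$p: $(hdfs_path_state "$p")"
  done
else
  echo "hadoop command not found on PATH"
fi
```

At this point in the exercise, /user/training would be expected to show up as
empty and /foo as missing.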