big data
Navigate to the binary for the release you’d like to install. In this guide you’ll install Hadoop 3.3.1, but you can
substitute the version numbers with the release of your choice.
On the next page, right-click and copy the link to the release binary.
On the server, you’ll use wget to fetch it:
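For example, if the release binary link you copied points at the Hadoop 3.3.1 tarball, the fetch would look something like this (your mirror URL will differ):
$ wget https://downloads.apache.org/hadoop/common/hadoop-3.3.1/hadoop-3.3.1.tar.gz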
In order to make sure that the file you downloaded hasn’t been altered, you’ll do a quick check using SHA-512,
or the Secure Hash Algorithm 512. Return to the releases page, then right-click and copy the link to the
checksum file for the release binary you downloaded:
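The commands themselves are not shown here; a typical sequence (assuming the tarball is in the current directory, and with a checksum URL that may differ from yours) is:
$ wget https://downloads.apache.org/hadoop/common/hadoop-3.3.1/hadoop-3.3.1.tar.gz.sha512
$ shasum -a 512 hadoop-3.3.1.tar.gz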
Output
2fd0bf74852c797dc864f373ec82ffaa1e98706b309b30d1effa91ac399b477e1accc1ee74d4ccbb1db7da1c5c541b72e4a834f131a99f2814b030fbd043df66 hadoop-3.3.1.tar.gz
Compare this value with the SHA-512 value in the .sha512 file:
~/hadoop-3.3.1.tar.gz.sha512
...
SHA512 (hadoop-3.3.1.tar.gz) = 2fd0bf74852c797dc864f373ec82ffaa1e98706b309b30d1effa91ac399b477e1accc1ee74d4ccbb1db7da1c5c541b72e4a834f131a99f2814b030fbd043df66
...
The output of the command you ran against the file you downloaded from the mirror should match the value in
the file you downloaded from apache.org.
Now that you’ve verified that the file wasn’t corrupted or changed, you can extract it:
Use the tar command with the -x flag to extract, -z to uncompress, -v for verbose output, and -f to specify that
you’re extracting from a file.
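Putting those flags together, the extraction command looks like this (assuming the archive name used above):
$ tar -xzvf hadoop-3.3.1.tar.gz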
Finally, you’ll move the extracted files into /usr/local, the appropriate place for locally installed software:
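A minimal way to do this, assuming the directory extracted above and a target of /usr/local/hadoop, is:
$ sudo mv hadoop-3.3.1 /usr/local/hadoop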
The path to Java, /usr/bin/java, is a symlink to /etc/alternatives/java, which is in turn a symlink to the default
Java binary. You will use readlink with the -f flag to follow every symlink in every part of the path, recursively.
Then, you’ll use sed to trim bin/java from the output to give you the correct value for JAVA_HOME.
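Put together, the command described here is (assuming the usual /usr/bin/java symlink):
$ readlink -f /usr/bin/java | sed "s:bin/java::"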
You can copy this output to set Hadoop’s Java home to this specific version, which ensures that if the default
Java changes, this value will not. Alternatively, you can use the readlink command dynamically in the file so
that Hadoop will automatically use whatever Java version is set as the system default.
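For illustration, either style works in etc/hadoop/hadoop-env.sh; the static path below is only an example and will differ on your system:
export JAVA_HOME=/usr/lib/jvm/java-11-openjdk-amd64
or, dynamically:
export JAVA_HOME=$(readlink -f /usr/bin/java | sed "s:bin/java::")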
Note: With respect to Hadoop, the value of JAVA_HOME in hadoop-env.sh overrides any values that are set in
the environment by /etc/profile or in a user’s profile.
You’ve now successfully configured Hadoop to run in stand-alone mode.
Experiment 4 : Installation of Hadoop framework in pseudo distribution mode
Step 1: Download Binary Package :
Download the latest binary from the following site:
http://hadoop.apache.org/releases.html
For reference, save the downloaded file to the following folder:
C:\BigData
Open Git Bash, change directory (cd) to the folder where you saved the binary package, and then unzip it as
follows.
$ cd C:\BigData
MINGW64: C:\BigData
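The unzip step is just extracting the downloaded tarball; assuming the archive is named hadoop-3.1.2.tar.gz, it would be:
$ tar -xvzf hadoop-3.1.2.tar.gz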
Next, go to this GitHub repo and download the bin folder as a zip, as shown below. Extract the zip and copy all
the files present under the bin folder to C:\BigData\hadoop-3.1.2\bin. Replace the existing files as well.
HADOOP_HOME="C:\BigData\hadoop-3.1.2"
HADOOP_BIN="C:\BigData\hadoop-3.1.2\bin"
Right-click -> Properties -> Advanced System settings -> Environment variables.
Click New to create a new environment variable.
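As an alternative to the dialog, the same variables can be created from a Command Prompt with setx (the paths assume the layout used in this guide; open a new prompt afterwards for them to take effect):
setx HADOOP_HOME "C:\BigData\hadoop-3.1.2"
setx HADOOP_BIN "C:\BigData\hadoop-3.1.2\bin"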
If you don’t have Java 1.8 installed, you’ll have to download and install it first. If the JAVA_HOME environment
variable is already set, check whether the path has any spaces in it (e.g. C:\Program Files\Java\… ). Spaces in
the JAVA_HOME path will lead to issues. There is a trick to get around it: replace ‘Program Files’ with
‘Progra~1’ in the variable value. Ensure that the version of Java is 1.8 and JAVA_HOME is pointing to JDK 1.8.
echo %HADOOP_HOME%
echo %HADOOP_BIN%
echo %PATH%
If the variables are not initialized yet, it is probably because you are testing them in an old session. Make sure
you have opened a new command prompt to test them.
Once environment variables are set up, we need to configure Hadoop by editing the following configuration
files.
hadoop-env.cmd
core-site.xml
hdfs-site.xml
mapred-site.xml
yarn-site.xml
hadoop-env.cmd
set HADOOP_PREFIX=%HADOOP_HOME%
set HADOOP_CONF_DIR=%HADOOP_PREFIX%\etc\hadoop
set YARN_CONF_DIR=%HADOOP_CONF_DIR%
set PATH=%PATH%;%HADOOP_PREFIX%\bin
Next, set the replication factor and the locations where the namenode and datanodes store their data.
Open C:\BigData\hadoop-3.1.2\etc\hadoop\hdfs-site.xml and add the below content within the <configuration>
</configuration> tags.
<configuration>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
<property>
<name>dfs.namenode.name.dir</name>
<value>C:\BigData\hadoop-3.1.2\data\namenode</value>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>C:\BigData\hadoop-3.1.2\data\datanode</value>
</property>
</configuration>
Next, open C:\BigData\hadoop-3.1.2\etc\hadoop\core-site.xml and add the below content, which sets the default
file system, within the <configuration> </configuration> tags.
<configuration>
<property>
<name>fs.default.name</name>
<value>hdfs://0.0.0.0:19000</value>
</property>
</configuration>
Then open C:\BigData\hadoop-3.1.2\etc\hadoop\yarn-site.xml and add the below content within the
<configuration> </configuration> tags.
<configuration>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
<value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>
</configuration>
Finally, let’s configure properties for the MapReduce framework. Open
C:\BigData\hadoop-3.1.2\etc\hadoop\mapred-site.xml and add the below content within the <configuration>
</configuration> tags. If you don’t see mapred-site.xml, open the mapred-site.xml.template file and rename it to
mapred-site.xml.
<configuration>
<property>
<name>mapreduce.job.user.name</name> <value>%USERNAME%</value>
</property>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
<property>
<name>yarn.apps.stagingDir</name> <value>/user/%USERNAME%/staging</value>
</property>
<property>
<name>mapreduce.jobtracker.address</name>
<value>local</value>
</property>
</configuration>
Check if the C:\BigData\hadoop-3.1.2\etc\hadoop\slaves file is present; if it’s not, create one, add localhost to it,
and save it.
To format the NameNode, open a new Windows Command Prompt and run the command below. It might give
you a few warnings; ignore them.
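The format command itself is not shown above; the standard one (assuming the Hadoop bin folder is on the PATH) is:
hdfs namenode -format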
Open another Windows Command Prompt; make sure to run it as an Administrator to avoid permission errors.
Once it is open, execute the start-all.cmd command. Since we have added %HADOOP_HOME%\sbin to the
PATH variable, you can run this command from any folder. If you haven’t done so, go to the
%HADOOP_HOME%\sbin folder and run the command from there.
Four new command prompt windows will open, one for each of the following daemon processes:
namenode
datanode
node manager
resource manager
Don’t close these windows, minimize them. Closing the windows will terminate the daemons. You can run
them in the background if you don’t like to see these windows.
Finally, let’s monitor how the Hadoop daemons are doing. You can also use the Web UI for all kinds of
administrative and monitoring purposes. Open your browser and get started.
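For reference, with the default Hadoop 3.x ports the NameNode UI is served at http://localhost:9870 and the ResourceManager UI at http://localhost:8088.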
1. Download Hadoop from the official website and put it in an appropriate directory (e.g. /usr/local/); later
on we shall refer to this folder as HADOOP_PARENT_DIR. The full path of the Hadoop home
directory shall instead be referred to as HADOOP_HOME.
2. Set the hadoop user as the owner of the hadoop home directory.
3. Configure Hadoop. This is just a matter of editing a few configuration files, which you will find in
the $HADOOP_HOME/etc/hadoop/ directory. The files to be edited are core-site.xml, hdfs-site.xml,
mapred-site.xml, yarn-site.xml, workers, and hadoop-env.sh:
core-site.xml
In this file we can see a property which specifies that the HDFS namenode process is hosted by the node whose
hostname is master, and is running on port 9000.
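The file content is not reproduced here; a minimal core-site.xml matching that description (assuming the fs.defaultFS property name) would be:
<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://master:9000</value>
</property>
</configuration>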
hdfs-site.xml
In this file, we specify the replication factor for HDFS (here it is set to 3) and where to store the HDFS data on
each node: in the /home/hadoop/data/hdfs/namenode folder for the namenode, and in
the /home/hadoop/data/hdfs/datanode folder for the datanodes.
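Reconstructed from that description (paths and replication factor as stated), the file would look roughly like:
<configuration>
<property>
<name>dfs.replication</name>
<value>3</value>
</property>
<property>
<name>dfs.namenode.name.dir</name>
<value>/home/hadoop/data/hdfs/namenode</value>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>/home/hadoop/data/hdfs/datanode</value>
</property>
</configuration>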
mapred-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!--
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. See accompanying LICENSE file.
-->
<!-- Put site-specific property overrides in this file. -->
<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
<property>
<name>yarn.app.mapreduce.am.env</name>
<value>HADOOP_MAPRED_HOME=$HADOOP_HOME</value>
</property>
<property>
<name>mapreduce.map.env</name>
<value>HADOOP_MAPRED_HOME=$HADOOP_HOME</value>
</property>
<property>
<name>mapreduce.reduce.env</name>
<value>HADOOP_MAPRED_HOME=$HADOOP_HOME</value>
</property>
<property>
<name>yarn.app.mapreduce.am.resource.mb</name>
<value>2048</value>
</property>
<property>
<name>mapreduce.map.memory.mb</name>
<value>1024</value>
</property>
<property>
<name>mapreduce.reduce.memory.mb</name>
<value>1024</value>
</property>
</configuration>
In this file, the amount of RAM to allocate to the MapReduce applications on each node is specified. We use the
same configuration on every node; the exact values you should use depend on the specs of the machine and its
workload besides Hadoop. The configuration above is one adopted on an existing cluster whose datanodes have
4 GB of RAM.
yarn-site.xml
<?xml version="1.0"?>
<!--
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. See accompanying LICENSE file.
-->
<configuration>
<!-- Site specific YARN configuration properties -->
<property>
<name>yarn.acl.enable</name>
<value>0</value>
</property>
<property>
<name>yarn.resourcemanager.hostname</name>
<value>master</value>
</property>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.nodemanager.resource.memory-mb</name>
<value>2048</value>
</property>
<property>
<name>yarn.scheduler.maximum-allocation-mb</name>
<value>2048</value>
</property>
<property>
<name>yarn.scheduler.minimum-allocation-mb</name>
<value>1024</value>
</property>
<property>
<name>yarn.nodemanager.vmem-check-enabled</name>
<value>false</value>
</property>
</configuration>
For more information on how to choose the memory configuration values when editing the mapred-site.xml
and yarn-site.xml files, I suggest reading this post.
workers
Probably the simplest configuration file: we just list the hostnames of the nodes hosting the datanodes in our
cluster, one hostname per line, as sketched below.
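For example, a three-worker cluster’s workers file might simply read (hypothetical hostnames):
worker1
worker2
worker3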
hadoop-env.sh
We only have to set the JAVA_HOME environment variable at the end of the file.
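For example, assuming an OpenJDK 8 installation at the usual Debian/Ubuntu location (adjust the path to your system), the line would be:
export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64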
8. The last step is to update the hadoop user’s bash profile, that is, to edit the /home/hadoop/.bashrc file. We
have to add the Hadoop environment variables and PATH entries, for example as sketched below.
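A typical sketch, assuming Hadoop lives in /usr/local/hadoop, is:
export HADOOP_HOME=/usr/local/hadoop
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin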
9. Finished!
To check that all went right, you can launch YARN and HDFS by typing the following command lines on your
master node.
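The command lines themselves are not shown above; the usual ones (assuming $HADOOP_HOME/sbin is on the PATH) are:
$ start-dfs.sh
$ start-yarn.sh
$ jps
jps should then list the NameNode, SecondaryNameNode and ResourceManager on the master, and the DataNode and NodeManager processes on the worker nodes.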
Step 1
Open the homepage of Apache Pig website. Under the section News, click on the link release page as shown in
the following snapshot.
Step 2
On clicking the specified link, you will be redirected to the Apache Pig Releases page. On this page, under
the Download section, you will have two links, namely, Pig 0.8 and later and Pig 0.7 and before. Click on the
link Pig 0.8 and later, then you will be redirected to the page having a set of mirrors.
Step 3
These mirrors will take you to the Pig Releases page. This page contains various versions of Apache Pig. Click
the latest version among them.
Step 5
Within these folders, you will have the source and binary files of Apache Pig in various distributions. Download
the tar files of the source and binary files of Apache Pig 0.15, pig-0.15.0-src.tar.gz and pig-0.15.0.tar.gz.
Step 1
Create a directory with the name Pig in the same directory where the installation directories of Hadoop,
Java, and other software were installed. (In our tutorial, we have created the Pig directory under the user named
Hadoop.)
$ mkdir Pig
Step 2
Extract the downloaded tar file as shown below.
$ cd Downloads/
$ tar zxvf pig-0.15.0-src.tar.gz
Step 3
Move the contents of the extracted pig-0.15.0-src directory to the Pig directory created earlier as shown below.
$ mv pig-0.15.0-src/* /home/Hadoop/Pig/
.bashrc file
In the .bashrc file, set the following variables −
PIG_CLASSPATH environment variable to the etc (configuration) folder of your Hadoop installations
(the directory that contains the core-site.xml, hdfs-site.xml and mapred-site.xml files).
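Only PIG_CLASSPATH is spelled out above; a plausible set of .bashrc entries, assuming Pig was placed in /home/Hadoop/Pig and Hadoop’s configuration lives in $HADOOP_HOME/etc/hadoop, is:
export PIG_HOME=/home/Hadoop/Pig
export PATH=$PATH:$PIG_HOME/bin
export PIG_CLASSPATH=$HADOOP_HOME/etc/hadoop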
You can list Pig’s runtime properties with pig -h properties; some notable entries from that listing are:
If true, prints count of warnings of each type rather than logging each warning.
Note that this memory is shared across all large bags used by the application.
Specifies the fraction of heap available for the reducer to perform the join.
pig.exec.nocombiner = true|false; default is false.
Scripts containing Filter, Foreach, Limit, Stream, and Union can be dumped without MR jobs.
Determines if partial aggregation is done within map phase, before records are sent to combiner.
pig.exec.mapPartAgg.minReduction=<min aggregation factor>. Default is 10.
If the in-map partial aggregation does not reduce the output num records by this factor, it gets disabled.
Miscellaneous: exectype = mapreduce|tez|local; default is mapreduce. This property is the same as -x switch
stop.on.failure = true|false; default is false. Set to true to terminate on the first error.
pig.datetime.default.tz=<UTC time offset>. e.g. +08:00. Default is the default timezone of the host.
$ pig -version
Java must be installed on your system before installing Hive. Let us verify java installation using the following
command:
$ java -version
If Java is already installed on your system, you will see the version in the response.
If Java is not installed on your system, then follow the steps given below for installing Java.
Installing Java
Step I:
Step II:
Generally you will find the downloaded Java file in the Downloads folder. Verify it and extract the
jdk-7u71-linux-x64.gz file using the following commands.
$ cd Downloads/
$ ls
jdk-7u71-linux-x64.gz
$ tar zxf jdk-7u71-linux-x64.gz
$ ls
jdk1.7.0_71 jdk-7u71-linux-x64.gz
Step III:
To make Java available to all users, you have to move it to the location /usr/local/. Switch to the root user and
type the following commands.
$ su
password:
# mv jdk1.7.0_71 /usr/local/
# exit
Step IV:
For setting up PATH and JAVA_HOME variables, add the following commands to ~/.bashrc file.
export JAVA_HOME=/usr/local/jdk1.7.0_71
export PATH=$PATH:$JAVA_HOME/bin
Now apply all the changes into the current running system.
$ source ~/.bashrc
Step V:
Now verify the installation using the command java -version from the terminal as explained above.
$ cd Downloads
$ ls
apache-hive-0.14.0-bin.tar.gz
The following steps are required for installing Hive on your system. Let us assume the Hive archive is
downloaded into the Downloads directory.
The following command is used to verify the download and extract the hive archive:
$ tar zxvf apache-hive-0.14.0-bin.tar.gz
$ ls
apache-hive-0.14.0-bin apache-hive-0.14.0-bin.tar.gz
We need to copy the files as the superuser “su -”. The following commands are used to copy the files from
the extracted directory to the /usr/local/hive directory.
$ su -
passwd:
# cd /home/user/Download
# mv apache-hive-0.14.0-bin /usr/local/hive
# exit
You can set up the Hive environment by appending the following lines to ~/.bashrc file:
export HIVE_HOME=/usr/local/hive
export PATH=$PATH:$HIVE_HOME/bin
export CLASSPATH=$CLASSPATH:/usr/local/Hadoop/lib/*:.
export CLASSPATH=$CLASSPATH:/usr/local/hive/lib/*:.
$ source ~/.bashrc
To configure Hive with Hadoop, you need to edit the hive-env.sh file, which is placed in
the $HIVE_HOME/conf directory. The following commands redirect to Hive config folder and copy the
template file:
$ cd $HIVE_HOME/conf
$ cp hive-env.sh.template hive-env.sh
Edit the hive-env.sh file by appending the following line:
export HADOOP_HOME=/usr/local/hadoop
Hive installation is completed successfully. Now you require an external database server to configure
Metastore. We use Apache Derby database.
Follow the steps given below to download and install Apache Derby:
$ cd ~
$ wget http://archive.apache.org/dist/db/derby/db-derby-10.4.2.0/db-derby-10.4.2.0-bin.tar.gz
$ ls
The following commands are used for extracting and verifying the Derby archive:
$ tar zxvf db-derby-10.4.2.0-bin.tar.gz
$ ls
db-derby-10.4.2.0-bin db-derby-10.4.2.0-bin.tar.gz
Copying files to the /usr/local/derby directory
We need to copy the files as the superuser “su -”. The following commands are used to copy the files from the
extracted directory to the /usr/local/derby directory:
$ su -
passwd:
# cd /home/user
# mv db-derby-10.4.2.0-bin /usr/local/derby
# exit
You can set up the Derby environment by appending the following lines to ~/.bashrc file:
export DERBY_HOME=/usr/local/derby
export PATH=$PATH:$DERBY_HOME/bin
export CLASSPATH=$CLASSPATH:$DERBY_HOME/lib/derby.jar:$DERBY_HOME/lib/derbytools.jar
$ source ~/.bashrc
Create a directory named data in the $DERBY_HOME directory to store Metastore data:
$ mkdir $DERBY_HOME/data
Derby installation and environmental setup is now complete.
Configuring Metastore means specifying to Hive where the database is stored. You can do this by editing the
hive-site.xml file, which is in the $HIVE_HOME/conf directory. First of all, copy the template file using the
following command:
$ cd $HIVE_HOME/conf
$ cp hive-default.xml.template hive-site.xml
Edit hive-site.xml and append the following lines between the <configuration> and </configuration> tags:
<property>
<name>javax.jdo.option.ConnectionURL</name>
<value>jdbc:derby://localhost:1527/metastore_db;create=true </value>
<description>JDBC connect string for a JDBC metastore </description>
</property>
Create a file named jpox.properties and add the following lines into it:
javax.jdo.PersistenceManagerFactoryClass =
org.jpox.PersistenceManagerFactoryImpl
org.jpox.autoCreateSchema = false
org.jpox.validateTables = false
org.jpox.validateColumns = false
org.jpox.validateConstraints = false
org.jpox.storeManagerType = rdbms
org.jpox.autoCreateSchema = true
org.jpox.autoStartMechanismMode = checked
org.jpox.transactionIsolation = read_committed
javax.jdo.option.DetachAllOnCommit = true
javax.jdo.option.NontransactionalRead = true
javax.jdo.option.ConnectionDriverName = org.apache.derby.jdbc.ClientDriver
javax.jdo.option.ConnectionUserName = APP
javax.jdo.option.ConnectionPassword = mine
Before verifying Hive, you need to create the /tmp folder and a separate Hive warehouse folder in HDFS and set
group write permission (chmod g+w) on them. Use the following commands:
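The commands are not reproduced above; a conventional sequence (assuming $HADOOP_HOME is set and HDFS is running) is:
$ $HADOOP_HOME/bin/hadoop fs -mkdir /tmp
$ $HADOOP_HOME/bin/hadoop fs -mkdir -p /user/hive/warehouse
$ $HADOOP_HOME/bin/hadoop fs -chmod g+w /tmp
$ $HADOOP_HOME/bin/hadoop fs -chmod g+w /user/hive/warehouse
With those directories in place, the following commands verify the Hive installation: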
$ cd $HIVE_HOME
$ bin/hive
………………….
hive>
OK
Time taken: 2.798 seconds
hive>