YARN & HDFS Admin

Data Transfer between S3 and HDFS

wget https://s3.amazonaws.com/cloud-age/dataset

nano /usr/local/hadoop/etc/hadoop/core-site.xml

<property>
<name>fs.s3n.awsAccessKeyId</name>
<value>AKIAJOWRK4NLW77YYOEA</value>
</property>

<property>
<name>fs.s3n.awsSecretAccessKey</name>
<value>2WF8+mHf8wYQaB9NghCpRdh2dvADh5dfVasNxua1</value>
</property>

hdfs dfsadmin -refreshNodes

cd /usr/local/hadoop/share/hadoop/tools/lib/

export HADOOP_CLASSPATH=$HADOOP_CLASSPATH:$HADOOP_HOME/share/hadoop/tools/lib/*

echo $HADOOP_CLASSPATH

hdfs dfs -cp s3n://cloud-age/dataset /user/ubuntu/dataset

hdfs dfs -cp dataset s3n://cloud-age/Pipeline/Asia/DATASET.TXT

================================= Distcp =================================

hadoop distcp hdfs://ip-10-0-2-95.ec2.internal:9000/user/ubuntu/dataset hdfs://ip-10-0-3-53.ec2.internal:9000/user/ubuntu/DATASET.txt

hadoop distcp -Dfs.s3n.awsAccessKeyId=AKIAISOKC3T575TZYS4Q -Dfs.s3n.awsSecretAccessKey=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 hdfs://ip-10-0-0-7.ec2.internal:9000/user/ubuntu/dataset hdfs://ip-10-0-0-7.ec2.internal:9000/user/ubuntu/StreamSetDataColloctor

hadoop distcp -Dfs.s3a.access.key=AKIAISOKC3T575TZYS4Q -Dfs.s3a.secret.key=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 s3a://cloud-age/dataset hdfs://nn:9000/user/ubuntu/BigData.csv

--------------------------------------------------------------------------------
• Trash configuration

nano /usr/local/hadoop/etc/hadoop/core-site.xml

<property>
<name>fs.trash.interval</name>
<value>2</value>
<description>Number of minutes after which a trash checkpoint is deleted. If zero,
the trash feature is disabled.</description>
</property>

<property>
<name>fs.trash.checkpoint.interval</name>
<value>1</value>
</property>

hdfs dfs -ls -R hdfs://nn:9000/user/ubuntu/.Trash/Current

hdfs dfs -cp hdfs://nn:9000/user/ubuntu/.Trash/Current/user/ubuntu/datasets /user/ubuntu/DATASET.txt

hdfs dfs -rm -r -skipTrash /user/ubuntu/StreamSetDataColloctor

hdfs dfs -expunge

hdfs dfs -ls -R hdfs://nn:9000/user/ubuntu/.Trash/Current

---------- emptying the trash (-expunge)

This command causes the NameNode to permanently delete files from the trash that are
older than the retention threshold, instead of waiting for the next scheduled emptier
run. It immediately removes expired checkpoints from the file system.

For a production environment, it is recommended that you enable trash to avoid
unexpected removal operations. Enabling trash provides a chance to recover data from
operational or user errors. It is also important to set appropriate values for
fs.trash.interval and fs.trash.checkpoint.interval so that trash behaves the way you
expect. For example, if you frequently upload and delete files in HDFS, you probably
want a smaller fs.trash.interval, otherwise the checkpoints will take up too much space.

Keep in mind that when trash is enabled and you remove files, free HDFS capacity does
not increase, because the files are not truly deleted. HDFS reclaims the space only
after the files are removed from the trash, which happens only once their checkpoints
expire. If you want to bypass the trash when deleting files, run the rm command with
the -skipTrash option.

_________________________ Block Size _________________________

nano /usr/local/hadoop/etc/hadoop/hdfs-site.xml

<property>
<name>dfs.block.size</name>
<value>524288</value>
</property>

hdfs dfs -put dataset .

<property>
<name>dfs.block.size</name>
<value>134217728</value>
</property>

hdfs dfs -rm -r /user/ubuntu/*

hdfs dfs -D dfs.blocksize=268435456 -cp /user/ubuntu/dataset /user/ubuntu/DaTaSeT.TSV

hdfs fsck dataset -storagepolicies
------------------------------------------------------------------

Setting Replication for a dataset

hdfs dfs -setrep 4 dataset          (set the replication factor to 4)
hdfs dfs -setrep -w 4 dataset       (wait until replication completes)
hdfs dfs -setrep -R 4 /user/*       (recursive)

hdfs dfsadmin -metasave metasave-report.txt


------------------------------------------------------
Hadoop Archives (HAR)

hadoop archive -archiveName positive.har -p /user/ubuntu/datasets /user/ubuntu/
hdfs dfs -ls
hdfs dfs -ls -R positive.har
hdfs dfs -cat positive.har/part-0
hdfs dfs -ls -R har:///user/ubuntu/positive.har/
hdfs dfs -cat har:///user/ubuntu/positive.har

hdfs version

hdfs dfs -ls /                  ---------- list the HDFS root
hdfs dfs -du -h hdfs:/          ---------- disk usage
hdfs dfs -count -h -q hdfs:/    ---------- file and directory counts (with quotas)
hdfs dfs -df -h hdfs:/          ---------- free and used capacity


hdfs dfsadmin -report -live
hdfs dfsadmin -report
hdfs dfsadmin -printTopology

------------------------------------------------------

hdfs storagepolicies -listPolicies


hdfs storagepolicies -getStoragePolicy -path /user/ubuntu/datasets
hdfs storagepolicies -setStoragePolicy -path /user/ubuntu/datasets -policy COLD

dfs.storage.policy.enabled

https://hortonworks.com/blog/heterogeneous-storage-policies-hdp-2-2/
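
A minimal hdfs-site.xml sketch for making the COLD policy above effective; the data
directory paths are illustrative assumptions, and dfs.storage.policy.enabled already
defaults to true in recent Hadoop 2 releases. The DataNode needs at least one directory
tagged ARCHIVE for COLD replicas to land on:

<property>
<name>dfs.storage.policy.enabled</name>
<value>true</value>
</property>

<property>
<name>dfs.datanode.data.dir</name>
<value>[DISK]file:///usr/local/hadoop/data/hadoop/datanode,[ARCHIVE]file:///archive/hadoop/datanode</value>
<!-- each directory is prefixed with its storage type; the paths here are hypothetical -->
</property>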
---------------------------------------------------------------------
dfs.datanode.failed.volumes.tolerated
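
A hedged example of how this property is usually set in hdfs-site.xml; the value 1 is
an illustrative assumption (the default is 0, meaning the DataNode shuts down after any
volume failure):

<property>
<name>dfs.datanode.failed.volumes.tolerated</name>
<value>1</value>
<!-- the DataNode keeps serving as long as no more than this many data volumes have failed -->
</property>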

----------------------------------------------------
hdfs balancer -threshold 10

The threshold specifies that each DataNode's disk usage must end up within 10% of the
cluster's overall average utilization. Use the Balancer to:
free up space on nearly full DataNodes;
move data onto newly added DataNodes so the new machines are utilized.
Run the Balancer when the cluster load is low or during a maintenance window, instead
of running it as a background daemon.
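
A typical invocation might look like the sketch below; the 10 MB/s bandwidth cap is an
illustrative assumption, chosen so balancing does not crowd out normal client traffic:

# cap the bandwidth (in bytes per second) each DataNode may spend on balancing
hdfs dfsadmin -setBalancerBandwidth 10485760
# rebalance until every DataNode is within 10% of the cluster's average utilization
hdfs balancer -threshold 10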

hdfs mover

The Mover is a data migration tool for archival storage, similar to the Balancer. It
periodically scans files in HDFS to check whether block placement satisfies the
configured storage policy; for blocks that violate the policy, it moves replicas to a
different storage type to satisfy the policy requirement.
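
For example, after setting the COLD policy on /user/ubuntu/datasets as shown earlier, a
sketch of migrating only that path (-p restricts the scan to the listed files and
directories):

hdfs mover -p /user/ubuntu/datasets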
--------------------------------------------

cd /usr/local/hadoop/data/hadoop/namenode/current
cat seen_txid
hdfs dfsadmin -rollEdits
hdfs dfsadmin -safemode enter
hdfs dfsadmin -saveNamespace
Save current namespace into storage directories and reset edits log
hdfs dfsadmin -fetchImage /home/ubuntu/hdfsbackup.2017
hdfs dfsadmin -restoreFailedStorage check
hdfs dfsadmin -restoreFailedStorage true
------------------------------------------------

hdfs secondarynamenode -checkpoint force

Adding users in Ubuntu


sudo adduser dfs
sudo adduser dfs sudo
hdfs dfs -mkdir /user/dfs
hdfs dfs -chown dfs:supergroup /user/dfs
su dfs
cp /home/ubuntu/.bashrc ~/
bash

Setting permissions

hdfs dfs -chmod 600 /user/dfs


hdfs dfs -ls -R

set Quota

hdfs dfs -mkdir /user/ubuntu/testQuota


hdfs dfsadmin -setQuota 3 /user/ubuntu/testQuota
hdfs dfs -mkdir /user/ubuntu/testQuota/1
hdfs dfs -mkdir /user/ubuntu/testQuota/2
hdfs dfs -mkdir /user/ubuntu/testQuota/3
hdfs dfs -count -q /user/ubuntu/testQuota
hdfs dfsadmin -clrQuota /user/ubuntu/testQuota
hdfs dfsadmin -setSpaceQuota 500M /user/ubuntu/testQuota
hdfs dfs -cp hadoop-2.7.2.tar.gz /user/ubuntu/testQuota
hdfs dfs -count -h -q /user/ubuntu/testQuota
hdfs dfsadmin -clrSpaceQuota /user/ubuntu/testQuota

======================================================================

CREATE SNAPSHOTS

hdfs dfs -mkdir /user/ubuntu/cloudage_data


echo "This is my snapshot data at cloudage" | hdfs dfs -put - /user/ubuntu/cloudage_data/data.txt
hdfs dfs -cat /user/ubuntu/cloudage_data/data.txt
hdfs dfsadmin -allowSnapshot /user/ubuntu/cloudage_data
hdfs dfs -createSnapshot /user/ubuntu/cloudage_data snapshot_folder
hdfs dfs -rm -r -skipTrash /user/ubuntu/cloudage_data
hdfs dfs -rm -r -skipTrash /user/ubuntu/cloudage_data/data.txt
hdfs dfs -cat /user/ubuntu/cloudage_data/data.txt
hdfs dfs -rm -r /user/ubuntu/cloudage_data/data.txt
hdfs dfs -ls -R /user/ubuntu/cloudage_data/.snapshot
hdfs dfs -cat /user/ubuntu/cloudage_data/.snapshot/snapshot_folder/data.txt
hdfs dfs -cp /user/ubuntu/cloudage_data/.snapshot/snapshot_folder/data.txt /user/ubuntu/cloudage_data
hdfs dfs -cat /user/ubuntu/cloudage_data/data.txt
hdfs lsSnapshottableDir
hdfs dfsadmin -disallowSnapshot /user/ubuntu/cloudage_data/

hdfs dfs -deleteSnapshot /user/ubuntu/cloudage_data/ snapshot_folder


hdfs dfs -renameSnapshot <path> <oldName> <newName>
hdfs snapshotDiff /user/ubuntu/snapshot s20160820-000747.522 s20160820-000825.861
hdfs snapshotDiff /user/ubuntu/cloudage_data snapshot_folder1 snapshot_folder2
--------------------------------------------------------------------------------
Configure Backup Node

<property>
<name>dfs.namenode.backup.address</name>
<value>rm:50100</value>
<description>
The backup node server address and port.
If the port is 0 then the server will start on a free port.
</description>
</property>

hdfs dfsadmin -refreshNodes


hdfs getconf -backupNodes
hdfs namenode -backup
hdfs namenode -checkpoint
--------------------------------------------------------------------------------
If no yarn.include file is specified, all NodeManagers are considered to be included in
the cluster (unless excluded in the yarn.exclude file). The
yarn.resourcemanager.nodes.include-path and yarn.resourcemanager.nodes.exclude-path
properties in yarn-site.xml are used to specify the yarn.include and yarn.exclude files.
Likewise, if no dfs.include file is specified, all DataNodes are considered to be
included in the cluster (unless excluded in the dfs.exclude file). The dfs.hosts and
dfs.hosts.exclude properties in hdfs-site.xml are used to specify the dfs.include and
dfs.exclude files.
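
A consolidated sketch of how these four properties are wired up; the file paths are
illustrative, and a single pair of include/exclude files can be shared by HDFS and YARN
(the walkthroughs below reuse the dfs.include and dfs.exclude files this way):

hdfs-site.xml
<property>
<name>dfs.hosts</name>
<value>/usr/local/hadoop/etc/hadoop/dfs.include</value>
</property>
<property>
<name>dfs.hosts.exclude</name>
<value>/usr/local/hadoop/etc/hadoop/dfs.exclude</value>
</property>

yarn-site.xml
<property>
<name>yarn.resourcemanager.nodes.include-path</name>
<value>/usr/local/hadoop/etc/hadoop/dfs.include</value>
</property>
<property>
<name>yarn.resourcemanager.nodes.exclude-path</name>
<value>/usr/local/hadoop/etc/hadoop/dfs.exclude</value>
</property>

After editing the files, apply the change without restarting the daemons:
hdfs dfsadmin -refreshNodes
yarn rmadmin -refreshNodes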
--------------------------------------------------------------------------------
Decommissioning Nodes
nano /usr/local/hadoop/etc/hadoop/dfs.exclude     (add the hostname to decommission, e.g. nn)

hdfs-site.xml
<property>
<name>dfs.hosts.exclude</name>
<value>/usr/local/hadoop/etc/hadoop/dfs.exclude</value>
<final>true</final>
</property>

--------------------------------------------------------------------------------
yarn-site.xml
<property>
<name>yarn.resourcemanager.nodes.exclude-path</name>
<value>/usr/local/hadoop/etc/hadoop/dfs.exclude</value>
<final>true</final>
</property>
hdfs dfsadmin -refreshNodes
yarn rmadmin -refreshNodes
hdfs dfsadmin -report
________________________________________________________________________________
Commissioning Nodes

nano dfs.include

ip-172-31-30-159.eu-west-1.compute.internal
Update the nodes in the slaves file
Remove the nodes from the exclude file
Update /etc/hosts

hdfs-site.xml
<property>
<name>dfs.hosts</name>
<value>/usr/local/hadoop/etc/hadoop/dfs.include</value>
<final>true</final>
</property>

yarn-site.xml
<property>
<name>yarn.resourcemanager.nodes.include-path</name>
<value>/usr/local/hadoop/etc/hadoop/dfs.include</value>
<final>true</final>
</property>

hdfs dfsadmin -refreshNodes
yarn rmadmin -refreshNodes


hdfs balancer
hdfs dfsadmin -report > report_aug
hdfs dfsadmin -report

--------------------------------------------------------------------------------
YARN Administration

Submit a built-in MapReduce example application.

hadoop jar /usr/local/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.2.jar pi 4 10000

yarn rmadmin -checkHealth

yarn application -list

yarn application -list -appStates ALL



To get application ID use yarn application -list

yarn application -status application_1459542433815_0002

yarn logs -applicationId application_1459542433815_0002

yarn application -kill application_1459542433815_0002

http://192.168.0.5:19888/ Job History Server

$HADOOP_HOME/sbin/mr-jobhistory-daemon.sh start historyserver

Full list of YARN configuration properties:

http://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-common/yarn-default.xml

Example

Each machine in our cluster has 48 GB of RAM. Some of this RAM should be reserved for
operating system usage. On each node we will assign 40 GB of RAM for YARN to use and
keep 8 GB for the operating system. The following property sets the maximum memory
YARN can use on the node:

In yarn-site.xml:

<property>
<name>yarn.nodemanager.resource.memory-mb</name>
<value>40960</value>
</property>

<property>
<name>yarn.scheduler.minimum-allocation-mb</name>
<value>2048</value>
</property>

<property>
<name>yarn.nodemanager.vmem-pmem-ratio</name>
<value>2.1</value>
</property>

In mapred-site.xml:

<property>
<name>mapreduce.map.java.opts</name>
<value>-Xmx3072m</value>
</property>

<property>
<name>mapreduce.reduce.java.opts</name>
<value>-Xmx6144m</value>
</property>

<property>
<name>mapreduce.map.memory.mb</name>
<value>4096</value>
</property>

<property>
<name>mapreduce.reduce.memory.mb</name>
<value>8192</value>
</property>

export HADOOP_OPTS="-Dmapreduce.map.memory.mb=2000 -Dmapreduce.map.java.opts=-Xmx1500m"

export HADOOP_CLIENT_OPTS="-Dmapreduce.map.memory.mb=2000 -Dmapreduce.map.java.opts=-Xmx1500m"

export YARN_OPTS="-Dmapreduce.map.memory.mb=2000 -Dmapreduce.map.java.opts=-Xmx1500m"

export YARN_CLIENT_OPTS="-Dmapreduce.map.memory.mb=2000 -Dmapreduce.map.java.opts=-Xmx1500m"

With the above settings on our example cluster, each map task gets the following
memory allocations:

Total physical RAM allocated = 4 GB
JVM heap space upper limit within the map task container = 3 GB
Virtual memory upper limit = 4 * 2.1 = 8.4 GB

The configured map task memory is 4 GB (mapreduce.map.memory.mb = 4096) and the reduce
task physical memory is 8 GB (mapreduce.reduce.memory.mb = 8192). The NodeManager's
physical memory is 40 GB, so a node can run at most 10 map containers (40/4) and 5
reduce containers (40/8) at a time.
http://docs.hortonworks.com/index.html
https://hortonworks.com/blog/how-to-plan-and-configure-yarn-in-hdp-2-0/
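
The same per-job values can also be overridden at submit time. A sketch using the pi
example from earlier (the bundled examples accept generic -D options via ToolRunner;
the 2048 MB container and 1536 MB heap figures are illustrative assumptions):

hadoop jar /usr/local/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.2.jar pi \
  -Dmapreduce.map.memory.mb=2048 \
  -Dmapreduce.map.java.opts=-Xmx1536m \
  4 10000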

Hadoop compression codecs (Pig output compression)

set output.compression.enabled true;


set output.compression.codec org.apache.hdfs.io.compress.BZip2Codec;

inputFiles = LOAD '/input/directory/uncompressed' USING PigStorage();
STORE inputFiles INTO '/output/directory/compressed/' USING PigStorage();

You can use different codecs depending on your needs:



set output.compression.codec com.hadoop.compression.lzo.LzopCodec;
set output.compression.codec org.apache.hadoop.io.compress.GzipCodec;
set output.compression.codec org.apache.hadoop.io.compress.BZip2Codec;

--------------------------------------------------------------------------------
https://streamsets.com/documentation/datacollector/latest/help/#Install_Config/CMInstall-Overview.html#concept_nb5_c3m_25
sudo yum install wget -y
wget https://archives.streamsets.com/datacollector/2.5.1.1/rpm/streamsets-datacollector-2.5.1.1-all-rpms.tgz
tar -xzvf streamsets-datacollector-2.5.1.1-all-rpms.tgz
cd streamsets-datacollector-2.5.1.1-all-rpms
sudo yum localinstall streamsets*.rpm
sudo -i
service sdc start
ulimit -n 32768
http://<system-ip>:18630/

*****************************************
Only if you are installing from the tarball:
./streamsets-datacollector-2.4.0.0/bin/streamsets stagelibs -list
sudo mkdir /etc/init.d/sdc
sudo groupadd sdc
sudo useradd -g sdc sdc
sudo nano /etc/security/limits.conf
ubuntu soft nofile 4096
________________________________________________________________________________

• File system checker (fsck)

hdfs fsck /user/ubuntu/DataSet.txt -files


hdfs fsck / -locations -blocks -files
hdfs fsck DataSet.txt -locations -blocks -files
hdfs fsck -list-corruptfileblocks
hdfs dfs -rm /user/ubuntu/DataSet.txt     (for example, to remove a file that fsck reports as corrupt)
hdfs fsck / -delete
hdfs fsck / > /home/ubuntu/
________________________________________________________________________________
