YARN & HDFS Administration
wget https://s3.amazonaws.com/cloud-age/dataset
nano /usr/local/hadoop/etc/hadoop/core-site.xml
<property>
<name>fs.s3n.awsAccessKeyId</name>
<value>AKIAJOWRK4NLW77YYOEA</value>
</property>
<property>
<name>fs.s3n.awsSecretAccessKey</name>
<value>2WF8+mHf8wYQaB9NghCpRdh2dvADh5dfVasNxua1</value>
</property>
cd /usr/local/hadoop/share/hadoop/tools/lib/
export HADOOP_CLASSPATH=$HADOOP_CLASSPATH:$HADOOP_HOME/share/hadoop/tools/lib/*
echo $HADOOP_CLASSPATH
================================= Distcp =================================
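With the s3n access keys in core-site.xml and the tools JARs on HADOOP_CLASSPATH, data can be copied between S3 and HDFS with distcp. A minimal sketch, assuming the cloud-age bucket from the wget URL above and an illustrative HDFS target path:
hadoop distcp s3n://cloud-age/dataset /user/ubuntu/dataset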
-------------------------------------------------------------------------
• Trash configuration
nano /usr/local/hadoop/etc/hadoop/core-site.xml
<property>
<name>fs.trash.interval</name>
<value>2</value>
<description>Number of minutes after which a trash checkpoint gets deleted. If zero, the trash feature is disabled.</description>
</property>
<property>
<name>fs.trash.checkpoint.interval</name>
<value>1</value>
</property>
The hdfs dfs -expunge command causes the NameNode to permanently delete files from the trash that are older than the threshold, instead of waiting for the next emptier window. It immediately removes expired checkpoints from the file system.
Keep in mind that when trash is enabled and you remove some files, HDFS capacity does not increase because the files are not truly deleted. HDFS does not reclaim the space until the files are removed from the trash, which happens only after their checkpoints expire. Sometimes you may want to bypass the trash when deleting files; in that case, run the rm command with the -skipTrash option.
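For example, with an illustrative path:
hdfs dfs -rm -r -skipTrash /user/ubuntu/tmpdata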
• Block size configuration in hdfs-site.xml (dfs.block.size is the deprecated name of dfs.blocksize); for example 512 KB, or the default 128 MB:
<property>
<name>dfs.block.size</name>
<value>524288</value>
</property>
<property>
<name>dfs.block.size</name>
<value>134217728</value>
</property>
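To confirm the block size actually used for a file (path illustrative):
hdfs fsck /user/ubuntu/dataset -files -blocks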
hdfs version
------------------------------------------------------
dfs.storage.policy.enabled: enables or disables the heterogeneous storage policy feature (default true)
https://hortonworks.com/blog/heterogeneous-storage-policies-hdp-2-2/
---------------------------------------------------------------------
dfs.datanode.failed.volumes.tolerated: number of volumes that are allowed to fail before a DataNode stops offering service (default 0)
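Both are set in hdfs-site.xml; a sketch with illustrative values:
<property>
<name>dfs.storage.policy.enabled</name>
<value>true</value>
</property>
<property>
<name>dfs.datanode.failed.volumes.tolerated</name>
<value>1</value>
</property>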
----------------------------------------------------
hdfs balancer -threshold 10
hdfs mover
A data migration tool for archiving data. It is similar to the Balancer: it periodically scans the files in HDFS to check whether the block placement satisfies the storage policy, and for blocks that violate the policy it moves replicas to a different storage type to fulfill the requirement.
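A typical workflow, with an illustrative path and policy:
hdfs storagepolicies -setStoragePolicy -path /user/ubuntu/archive -policy COLD
hdfs mover -p /user/ubuntu/archive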
--------------------------------------------
cd /usr/local/hadoop/data/hadoop/namenode/current
cat seen_txid
hdfs dfsadmin -rollEdits
hdfs dfsadmin -safemode enter
hdfs dfsadmin -saveNamespace
Save current namespace into storage directories and reset edits log
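Once the namespace has been saved, leave safe mode so the cluster accepts writes again:
hdfs dfsadmin -safemode leave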
hdfs dfsadmin -fetchImage /home/ubuntu/hdfsbackup.2017
hdfs dfsadmin -restoreFailedStorage check
hdfs dfsadmin -restoreFailedStorage true
------------------------------------------------
Setting permissions
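A minimal sketch with an illustrative path, owner, and mode:
hdfs dfs -chown ubuntu:hadoop /user/ubuntu/data
hdfs dfs -chmod 750 /user/ubuntu/data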
Setting quotas
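Name and space quotas are set with dfsadmin (path and values illustrative):
hdfs dfsadmin -setQuota 1000 /user/ubuntu
hdfs dfsadmin -setSpaceQuota 10g /user/ubuntu
hdfs dfs -count -q /user/ubuntu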
======================================================================
CREATE SNAPSHOTS
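A directory must first be made snapshottable; the path and snapshot name here are illustrative:
hdfs dfsadmin -allowSnapshot /user/ubuntu/data
hdfs dfs -createSnapshot /user/ubuntu/data snap1
hdfs lsSnapshottableDir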
<property>
<name>dfs.namenode.backup.address</name>
<value>rm:50100</value>
<description>
The backup node server address and port.
If the port is 0 then the server will start on a free port.
</description>
</property>
-------------------------------------------------------------------------
Decommissioning Nodes
mapred-site.xml (the corresponding HDFS property is dfs.hosts.exclude in hdfs-site.xml, and the YARN equivalent is yarn.resourcemanager.nodes.exclude-path in yarn-site.xml):
<property>
<name>mapred.hosts.exclude</name>
<value>/usr/local/hadoop/conf/excludes</value>
<final>true</final>
</property>
hdfs dfsadmin -refreshNodes
yarn rmadmin -refreshNodes
hdfs dfsadmin -report
_________________________________________________________________________
Commissioning Nodes
nano dfs.include
ip-172-31-30-159.eu-west-1.compute.internal
Update the nodes in the slaves file
Remove the nodes from the exclude file
Update /etc/hosts
hdfs-site.xml
<property>
<name>dfs.hosts</name>
<value>/usr/local/hadoop/etc/hadoop/dfs.include</value>
<final>true</final>
</property>
mapred-site.xml
<property>
<name>mapred.hosts.include</name>
<value>/usr/local/hadoop/etc/hadoop/dfs.include</value>
<final>true</final>
</property>
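As with decommissioning, the updated node lists take effect after refreshing:
hdfs dfsadmin -refreshNodes
yarn rmadmin -refreshNodes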
-------------------------------------------------------------------------
YARN Administration
Example
In yarn-site.xml
<name>yarn.nodemanager.resource.memory-mb</name>
<value>40960</value>
<name>yarn.scheduler.minimum-allocation-mb</name>
<value>2048</value>
<name>yarn.nodemanager.vmem-pmem-ratio</name>
<value>2.1</value>
In mapred-site.xml:
<name>mapreduce.map.java.opts</name>
<value>-Xmx3072m</value>
<name>mapreduce.reduce.java.opts</name>
<value>-Xmx6144m</value>
<name>mapreduce.map.memory.mb</name>
<value>4096</value>
<name>mapreduce.reduce.memory.mb</name>
<value>8192</value>
export HADOOP_OPTS="-Dmapreduce.map.memory.mb=2000 -Dmapreduce.map.java.opts=-Xmx1500m"
export HADOOP_CLIENT_OPTS="-Dmapreduce.map.memory.mb=2000 -Dmapreduce.map.java.opts=-Xmx1500m"
export YARN_OPTS="-Dmapreduce.map.memory.mb=2000 -Dmapreduce.map.java.opts=-Xmx1500m"
export YARN_CLIENT_OPTS="-Dmapreduce.map.memory.mb=2000 -Dmapreduce.map.java.opts=-Xmx1500m"
Thus, with the above settings on our example cluster, each map task gets a 4096 MB container with a 3072 MB JVM heap (-Xmx3072m), and each reduce task gets an 8192 MB container with a 6144 MB heap (-Xmx6144m).
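These properties can also be overridden per job on the command line; a sketch using the stock wordcount example (input and output paths are illustrative):
hadoop jar $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples-*.jar wordcount -Dmapreduce.map.memory.mb=2000 -Dmapreduce.map.java.opts=-Xmx1500m /user/ubuntu/input /user/ubuntu/output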
hdfs native
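This presumably refers to the Hadoop native libraries; their availability can be checked with:
hadoop checknative -a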
-------------------------------------------------------------------------
https://streamsets.com/documentation/datacollector/latest/help/#Install_Config/CMInstall-Overview.html#concept_nb5_c3m_25
sudo yum install wget -y
wget https://archives.streamsets.com/datacollector/2.5.1.1/rpm/streamsets-datacollector-2.5.1.1-all-rpms.tgz
tar -xzvf streamsets-datacollector-2.5.1.1-all-rpms.tgz
cd streamsets-datacollector-2.5.1.1-all-rpms
sudo yum localinstall streamsets*.rpm
sudo -i
service sdc start
ulimit -n 32768
http://<system-ip>:18630/
*****************************************
Only if you are installing it from the tarball:
./streamsets-datacollector-2.4.0.0/bin/streamsets stagelibs -list
sudo mkdir /etc/init.d/sdc
sudo groupadd sdc
sudo useradd -g sdc sdc
sudo nano /etc/security/limits.conf
ubuntu soft nofile 4096
_________________________________________________________________________