Environment
Contestants must build a Hadoop HA (high-availability) cluster inside Docker containers. The server runs CentOS 7 with Docker installed; Docker hosts three CentOS 7 containers named bigdata1, bigdata2 and bigdata3, which can be entered directly with the docker exec command.
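For example, to get an interactive shell in the first container (run on the host; the /bin/bash path is an assumption about the image):
docker exec -it bigdata1 /bin/bash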
Installation steps
This task is the focus, and the hardest part, of the whole cluster-setup section: it requires both NameNode HA and YARN HA, so there are a large number of configuration properties. Memorizing them all is slow and error-prone, so we take a shortcut!
The Hadoop distribution ships a set of static HTML documentation under /opt/module/hadoop-3.1.3/share/doc/hadoop. Copy that documentation out to the host machine (CentOS in the competition), then from the host to the client machine (Ubuntu in the competition), and build the cluster with the documentation open for reference.
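For example, once Hadoop has been extracted inside the container (next step), the documentation can be pulled out and pushed to the client roughly like this (the client user and IP are placeholders):
docker cp bigdata1:/opt/module/hadoop-3.1.3/share/doc/hadoop ./hadoop-doc
scp -r ./hadoop-doc <user>@<ubuntu-client-ip>:~/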
- Copy the required packages from the host into the bigdata1 container, and extract them to the target directory inside the container
docker cp hadoop-3.1.3.tar.gz bigdata1:/opt/software/
docker cp jdk-8u212-linux-x64.tar.gz bigdata1:/opt/software/
docker cp apache-zookeeper-3.5.7-bin.tar.gz bigdata1:/opt/software/
- Enter the bigdata1 container and extract the packages
tar -zxvf /opt/software/hadoop-3.1.3.tar.gz -C /opt/module/
tar -zxvf /opt/software/jdk-8u212-linux-x64.tar.gz -C /opt/module/
tar -zxvf /opt/software/apache-zookeeper-3.5.7-bin.tar.gz -C /opt/module/
- Configure environment variables and set up the ZooKeeper component
Edit /etc/profile to define the user each HDFS/YARN daemon runs as (required when starting the daemons as root)
[root@bigdata1 hadoop]# vim /etc/profile
export HDFS_NAMENODE_USER=root
export HDFS_DATANODE_USER=root
export HDFS_SECONDARYNAMENODE_USER=root
export YARN_RESOURCEMANAGER_USER=root
export YARN_NODEMANAGER_USER=root
export HDFS_JOURNALNODE_USER=root
export HDFS_ZKFC_USER=root
Also add the environment variables for the three components extracted above to /etc/profile; the details are omitted here (see the sketch below).
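As a reference, a minimal sketch of those variables, assuming the default extraction directory names (the JDK directory name in particular may differ):
export JAVA_HOME=/opt/module/jdk1.8.0_212
export HADOOP_HOME=/opt/module/hadoop-3.1.3
export ZOOKEEPER_HOME=/opt/module/apache-zookeeper-3.5.7-bin
export PATH=$PATH:$JAVA_HOME/bin:$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$ZOOKEEPER_HOME/bin
Run source /etc/profile afterwards so the variables take effect in the current shell.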
The ZooKeeper setup steps are likewise omitted; run zkServer.sh status on each node, and a reported state of leader or follower means the ensemble is up.
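For reference, the omitted ZooKeeper steps look roughly like this (the data directory location and the distribution method are assumptions for this environment):
[root@bigdata1 /]# cd /opt/module/apache-zookeeper-3.5.7-bin
[root@bigdata1 apache-zookeeper-3.5.7-bin]# cp conf/zoo_sample.cfg conf/zoo.cfg
[root@bigdata1 apache-zookeeper-3.5.7-bin]# vim conf/zoo.cfg
dataDir=/opt/module/apache-zookeeper-3.5.7-bin/data
server.1=bigdata1:2888:3888
server.2=bigdata2:2888:3888
server.3=bigdata3:2888:3888
[root@bigdata1 apache-zookeeper-3.5.7-bin]# mkdir -p data && echo 1 > data/myid
Distribute the ZooKeeper directory to bigdata2 and bigdata3, write 2 and 3 into their data/myid files, then run zkServer.sh start on all three nodes.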
- Open the offline documentation mentioned above by loading index.html in a browser (e.g. Chrome)
Fill in the configuration files according to that documentation
[root@bigdata1 /]# cd /opt/module/hadoop-3.1.3/etc/hadoop/
[root@bigdata1 hadoop]# vim core-site.xml
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!--
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. See accompanying LICENSE file.
-->
<!-- Put site-specific property overrides in this file. -->
<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://mycluster</value>
<description>The name of the default file system. A URI whose
scheme and authority determine the FileSystem implementation. The
uri's scheme determines the config property (fs.SCHEME.impl) naming
the FileSystem implementation class. The uri's authority is used to
determine the host, port, etc. for a filesystem.</description>
</property>
<property>
<name>hadoop.tmp.dir</name>
<value>/opt/module/hadoop-3.1.3/data/tmp</value>
<description>A base for other temporary directories.</description>
</property>
<property>
<name>ha.zookeeper.quorum</name>
<description>
A list of ZooKeeper server addresses, separated by commas, that are
to be used by the ZKFailoverController in automatic failover.
</description>
<value>bigdata1:2181,bigdata2:2181,bigdata3:2181</value>
</property>
</configuration>
[root@bigdata1 hadoop]# vim hdfs-site.xml
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!--
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. See accompanying LICENSE file.
-->
<!-- Put site-specific property overrides in this file. -->
<configuration>
<property>
<name>dfs.nameservices</name>
<value>mycluster</value>
<description>
Comma-separated list of nameservices.
</description>
</property>
<property>
<name>dfs.namenode.secondary.http-address</name>
<value>bigdata1:9868</value>
<description>
The secondary namenode http server address and port.
</description>
</property>
<property>
<name>dfs.replication</name>
<value>3</value>
<description>Default block replication.
The actual number of replications can be specified when the file is created.
The default is used if replication is not specified in create time.
</description>
</property>
<property>
<name>dfs.ha.fencing.methods</name>
<value>
sshfence
shell(/bin/true)
</value>
</property>
<property>
<name>dfs.ha.namenodes.mycluster</name>
<value>nn1,nn2</value>
<description>
The prefix for a given nameservice, contains a comma-separated
list of namenodes for a given nameservice (eg EXAMPLENAMESERVICE).
Unique identifiers for each NameNode in the nameservice, delimited by
commas. This will be used by DataNodes to determine all the NameNodes
in the cluster. For example, if you used "mycluster" as the nameservice
ID previously, and you wanted to use "nn1" and "nn2" as the individual
IDs of the NameNodes, you would configure a property
dfs.ha.namenodes.mycluster, and its value "nn1,nn2".
</description>
</property>
<property>
<name>dfs.namenode.rpc-address.mycluster.nn1</name>
<value>bigdata1:8020</value>
<description>
RPC address that handles all clients requests. In the case of HA/Federation where multiple namenodes exist,
the name service id is added to the name e.g. dfs.namenode.rpc-address.ns1
dfs.namenode.rpc-address.EXAMPLENAMESERVICE
The value of this property will take the form of nn-host1:rpc-port. The NameNode's default RPC port is 8020.
</description>
</property>
<property>
<name>dfs.namenode.rpc-address.mycluster.nn2</name>
<value>bigdata2:8020</value>
<description>
RPC address that handles all clients requests. In the case of HA/Federation where multiple namenodes exist,
the name service id is added to the name e.g. dfs.namenode.rpc-address.ns1
dfs.namenode.rpc-address.EXAMPLENAMESERVICE
The value of this property will take the form of nn-host1:rpc-port. The NameNode's default RPC port is 8020.
</description>
</property>
<property>
<name>dfs.namenode.http-address.mycluster.nn1</name>
<value>bigdata1:9870</value>
<description>
The address and the base port where the dfs namenode web ui will listen on.
</description>
</property>
<property>
<name>dfs.namenode.http-address.mycluster.nn2</name>
<value>bigdata2:9870</value>
<description>
The address and the base port where the dfs namenode web ui will listen on.
</description>
</property>
<property>
<name>dfs.namenode.shared.edits.dir</name>
<value>qjournal://bigdata1:8485;bigdata2:8485;bigdata3:8485/mycluster</value>
<description>A directory on shared storage between the multiple namenodes
in an HA cluster. This directory will be written by the active and read
by the standby in order to keep the namespaces synchronized. This directory
does not need to be listed in dfs.namenode.edits.dir above. It should be
left empty in a non-HA cluster.
</description>
</property>
<property>
<name>dfs.ha.automatic-failover.enabled</name>
<value>true</value>
<description>
Whether automatic failover is enabled. See the HDFS High
Availability documentation for details on automatic HA
configuration.
</description>
</property>
<property>
<name>dfs.journalnode.edits.dir</name>
<value>/opt/module/hadoop-3.1.3/data/journal/</value>
<description>
The directory where the journal edit files are stored.
</description>
</property>
<property>
<name>dfs.client.failover.proxy.provider.mycluster</name>
<value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
</property>
</configuration>
[root@bigdata1 hadoop]# vim yarn-site.xml
<?xml version="1.0"?>
<!--
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. See accompanying LICENSE file.
-->
<configuration>
<!-- Site specific YARN configuration properties -->
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<description>Enable RM high-availability. When enabled,
(1) The RM starts in the Standby mode by default, and transitions to
the Active mode when prompted to.
(2) The nodes in the RM ensemble are listed in
yarn.resourcemanager.ha.rm-ids
(3) The id of each RM either comes from yarn.resourcemanager.ha.id
if yarn.resourcemanager.ha.id is explicitly specified or can be
figured out by matching yarn.resourcemanager.address.{id} with local address
(4) The actual physical addresses come from the configs of the pattern
- {rpc-config}.{id}</description>
<name>yarn.resourcemanager.ha.enabled</name>
<value>true</value>
</property>
<property>
<description>Name of the cluster. In a HA setting,
this is used to ensure the RM participates in leader
election for this cluster and ensures it does not affect
other clusters</description>
<name>yarn.resourcemanager.cluster-id</name>
<value>yarn-cluster</value>
</property>
<property>
<description>The list of RM nodes in the cluster when HA is
enabled. See description of yarn.resourcemanager.ha
.enabled for full details on how this is used.</description>
<name>yarn.resourcemanager.ha.rm-ids</name>
<value>rm1,rm2</value>
</property>
<property>
<description>The hostname of the RM.</description>
<name>yarn.resourcemanager.hostname.rm1</name>
<value>bigdata1</value>
</property>
<property>
<description>The hostname of the RM.</description>
<name>yarn.resourcemanager.hostname.rm2</name>
<value>bigdata2</value>
</property>
<property>
<name>yarn.resourcemanager.zk-address</name>
<value>bigdata1:2181,bigdata2:2181,bigdata3:2181</value>
</property>
<property>
<name>yarn.resourcemanager.recovery.enabled</name>
<value>true</value>
</property>
</configuration>
[root@bigdata1 hadoop]# vim mapred-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!--
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. See accompanying LICENSE file.
-->
<!-- Put site-specific property overrides in this file. -->
<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
<description>The runtime framework for executing MapReduce jobs.
Can be one of local, classic or yarn.
</description>
</property>
</configuration>
[root@bigdata1 hadoop]# vim workers
bigdata1
bigdata2
bigdata3
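Hadoop's own scripts also need to find the JDK; if JAVA_HOME is not picked up from /etc/profile when the daemons are launched, set it explicitly in hadoop-env.sh (using the JDK path assumed above):
[root@bigdata1 hadoop]# vim hadoop-env.sh
export JAVA_HOME=/opt/module/jdk1.8.0_212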
- Distribute the component to the other nodes
[root@bigdata1 hadoop]# scp -r /opt/module/hadoop-3.1.3 bigdata2:/opt/module/
[root@bigdata1 hadoop]# scp -r /opt/module/hadoop-3.1.3 bigdata3:/opt/module/
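If the JDK, ZooKeeper and /etc/profile were only prepared on bigdata1 during the omitted steps, distribute them the same way, for example:
[root@bigdata1 hadoop]# scp -r /opt/module/jdk1.8.0_212 bigdata2:/opt/module/
[root@bigdata1 hadoop]# scp -r /opt/module/jdk1.8.0_212 bigdata3:/opt/module/
[root@bigdata1 hadoop]# scp /etc/profile bigdata2:/etc/ && scp /etc/profile bigdata3:/etc/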
- Start the cluster
Initialize the cluster
Start the JournalNode service on each of the three nodes
[root@bigdata1 /]# hadoop-daemon.sh start journalnode
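The same command must also be run on bigdata2 and bigdata3, either inside each container or over SSH (which the scp step above already relies on), for example:
[root@bigdata1 /]# ssh bigdata2 "/opt/module/hadoop-3.1.3/sbin/hadoop-daemon.sh start journalnode"
[root@bigdata1 /]# ssh bigdata3 "/opt/module/hadoop-3.1.3/sbin/hadoop-daemon.sh start journalnode"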
Format and start the NameNode on the first configured NameNode node (nn1)
[root@bigdata1 /]# hdfs namenode -format
[root@bigdata1 /]# hadoop-daemon.sh start namenode
On the other NameNode node, run the following command to synchronize its metadata with the active NameNode
[root@bigdata2 /]# hdfs namenode -bootstrapStandby
Next, start all remaining services with start-all.sh, then manually force one NameNode into the active state
[root@bigdata1 /]# start-all.sh
[root@bigdata1 /]# hdfs haadmin -transitionToActive --forcemanual nn1
Once done, check the state of nn1: Active means it is the active NameNode, Standby means it is on standby
[root@bigdata1 /]# hdfs haadmin -getServiceState nn1
Start the ZKFC service, which monitors the NameNode processes and provides automatic failover
Initialize the HA state in ZooKeeper
[root@bigdata1 /]# hdfs zkfc -formatZK
Start the service on every NameNode node
[root@bigdata1 /]# hadoop-daemon.sh start zkfc
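The second NameNode node needs its own ZKFC as well, for example:
[root@bigdata2 /]# hadoop-daemon.sh start zkfc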
Check the cluster status
Check the NameNode states
[root@bigdata1 /]# hdfs haadmin -getAllServiceState
Check the ResourceManager states
[root@bigdata1 /]# yarn rmadmin -getAllServiceState
- Result
If all of the services above are running, the task earns full marks.
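For reference, a typical process list on bigdata1 after a successful start looks roughly like the following (PIDs omitted; bigdata2 is similar, while bigdata3 runs no NameNode, ResourceManager or DFSZKFailoverController):
[root@bigdata1 /]# jps
QuorumPeerMain
JournalNode
NameNode
DataNode
DFSZKFailoverController
ResourceManager
NodeManager
Jps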