Update yarn-site.xml and sync the change to the other nodes (the whole <configuration> block below can be pasted in; diff it against the old file to confirm the changes):
<configuration>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
  <!-- Enable ResourceManager HA -->
  <property>
    <name>yarn.resourcemanager.ha.enabled</name>
    <value>true</value>
  </property>
  <!-- Declare the two ResourceManagers -->
  <property>
    <name>yarn.resourcemanager.cluster-id</name>
    <value>cluster-yarn1</value>
  </property>
  <property>
    <name>yarn.resourcemanager.ha.rm-ids</name>
    <value>rm1,rm2</value>
  </property>
  <property>
    <name>yarn.resourcemanager.hostname.rm1</name>
    <value>bigdata166</value>
  </property>
  <property>
    <name>yarn.resourcemanager.hostname.rm2</name>
    <value>bigdata167</value>
  </property>
  <!-- Address of the ZooKeeper ensemble -->
  <property>
    <name>yarn.resourcemanager.zk-address</name>
    <value>bigdata166:2181,bigdata167:2181,bigdata168:2181</value>
  </property>
  <!-- Enable automatic recovery -->
  <property>
    <name>yarn.resourcemanager.recovery.enabled</name>
    <value>true</value>
  </property>
  <!-- Store ResourceManager state in the ZooKeeper ensemble -->
  <property>
    <name>yarn.resourcemanager.store.class</name>
    <value>org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore</value>
  </property>
</configuration>
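One way to push the edited file to the other nodes is a simple scp loop. This is a minimal sketch: the host list matches this doc, but the HADOOP_HOME default (`/opt/module/hadoop`) is an assumption, so adjust it to your install path. It only prints the sync targets as written; uncomment the scp line to actually copy.

```shell
# Sketch: push the updated yarn-site.xml to the other nodes.
# The HADOOP_HOME default below is an assumption -- set it to your install path.
conf_dir="${HADOOP_HOME:-/opt/module/hadoop}/etc/hadoop"
for host in bigdata167 bigdata168; do
  echo "syncing yarn-site.xml to $host"
  # scp "$conf_dir/yarn-site.xml" "$host:$conf_dir/"   # uncomment to copy for real
done
```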
Start HDFS (if these steps were already done and the services are running, do not repeat them):
On each JournalNode host, start the journalnode service:
sbin/hadoop-daemon.sh start journalnode
On [nn1], format the NameNode and start it:
bin/hdfs namenode -format
sbin/hadoop-daemon.sh start namenode
On [nn2], pull the metadata over from nn1:
bin/hdfs namenode -bootstrapStandby
Start [nn2]:
sbin/hadoop-daemon.sh start namenode
Start all DataNodes:
sbin/hadoop-daemons.sh start datanode
Switch [nn1] to Active:
bin/hdfs haadmin -transitionToActive nn1
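To confirm the failover took effect, the same haadmin CLI can report each NameNode's role (run from the Hadoop install directory; nn1/nn2 are the IDs used above). If the transition succeeded, nn1 reports active and nn2 standby.

```shell
# Check the HA role of each NameNode after the manual transition.
bin/hdfs haadmin -getServiceState nn1   # expected: active
bin/hdfs haadmin -getServiceState nn2   # expected: standby
```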
Start YARN.
On bigdata166, run:
sbin/start-yarn.sh
On bigdata167, run:
sbin/yarn-daemon.sh start resourcemanager
Check the ResourceManager state:
bin/yarn rmadmin -getServiceState rm1
Test the failover:
Kill the ResourceManager on bigdata166.
Then check rm2; its state should have switched to active.
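The failover test above can be sketched as follows. The jps/awk pipeline used to find the ResourceManager pid is an assumption, not from the original notes; any way of locating the pid works.

```shell
# On bigdata166: kill the active ResourceManager.
# (Finding the pid via jps + awk is one option; use any method you prefer.)
kill -9 "$(jps | awk '/ResourceManager/ {print $1}')"

# From either node: rm2 should now report active.
bin/yarn rmadmin -getServiceState rm2
```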
Pitfall: at first I forgot to start ZooKeeper, so ZKFC and the ResourceManager crashed right after starting; the logs showed a "cannot connect to ZooKeeper" error.
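To avoid that pitfall, bring the ZooKeeper ensemble up on all three nodes before starting HDFS or YARN. A minimal sketch, assuming the commands are run from each node's ZooKeeper install directory:

```shell
# Run on bigdata166, bigdata167, and bigdata168 *before* starting HDFS/YARN.
bin/zkServer.sh start

# Sanity check: one node should report "leader", the others "follower".
bin/zkServer.sh status
```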