HMaster shuts down by itself every day, any pointers appreciated 10C

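I've recently run into a frustrating problem: HBase goes down by itself once a day, at around 5:30-5:45. So far I have tried the following:
1. Checked the host configuration.
2. Checked clock synchronization.
3. Set the session timeout to 60s.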
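
A note on step 3: the timeout HBase asks for (the `zookeeper.session.timeout` property, in milliseconds) is only a request. The ZooKeeper server clamps it between its own `minSessionTimeout` and `maxSessionTimeout`, which default to 2 and 20 times `tickTime`. The sketch below illustrates that rule. It assumes `tickTime=2000`, which the post does not state, and the function name is made up for illustration.

```python
def negotiated_session_timeout(requested_ms, tick_time_ms=2000,
                               min_ms=None, max_ms=None):
    """Return the session timeout a ZooKeeper server would grant.

    Hypothetical helper that mirrors the documented clamping rule:
    minSessionTimeout defaults to 2 * tickTime and maxSessionTimeout
    to 20 * tickTime. tick_time_ms=2000 is an assumed value.
    """
    if min_ms is None:
        min_ms = 2 * tick_time_ms
    if max_ms is None:
        max_ms = 20 * tick_time_ms
    # Clamp the client's request into the server's allowed range.
    return max(min_ms, min(requested_ms, max_ms))


if __name__ == "__main__":
    # Requesting 60 s against the assumed default server limits.
    print(negotiated_session_timeout(60_000))  # prints 40000
```

Under those assumed defaults, a 60s request would be granted as 40s, so it may be worth confirming `tickTime` and `maxSessionTimeout` in `zoo.cfg` before concluding that the change had no effect.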

#####The HMaster error log is as follows:#####

 2015-09-21 05:32:20,463 INFO  [main-SendThread(132.37.5.197:29184)] zookeeper.ClientCnxn: Socket connection established to 132.37.5.197/132.37.5.197:29184, initiating session
2015-09-21 05:32:20,465 FATAL [main-EventThread] master.HMaster: Master server abort: loaded coprocessors are: []
2015-09-21 05:32:20,465 INFO  [main-SendThread(132.37.5.197:29184)] zookeeper.ClientCnxn: Unable to reconnect to ZooKeeper service, session 0x24f1f7bb79103a9 has expired, closing socket connection
2015-09-21 05:32:20,465 FATAL [main-EventThread] master.HMaster: master:60900-0x24f1f7bb79103a9, quorum=132.37.5.196:29184,132.37.5.195:29184,132.37.5.197:29184, baseZNode=/hbase master:60900-0x24f1f7bb79103a9 received expired from ZooKeeper, aborting
org.apache.zookeeper.KeeperException$SessionExpiredException: KeeperErrorCode = Session expired
        at org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.connectionEvent(ZooKeeperWatcher.java:417)
        at org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.process(ZooKeeperWatcher.java:328)
        at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:522)
        at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:498)
2015-09-21 05:32:20,466 INFO  [main-EventThread] regionserver.HRegionServer: STOPPED: master:60900-0x24f1f7bb79103a9, quorum=132.37.5.196:29184,132.37.5.195:29184,132.37.5.197:29184, baseZNode=/hbase master:60900-0x24f1f7bb79103a9 received expired from ZooKeeper, aborting
2015-09-21 05:32:20,466 INFO  [main-EventThread] zookeeper.ClientCnxn: EventThread shut down
2015-09-21 05:32:20,466 INFO  [master/pkgtstdb2/132.37.5.194:60900] regionserver.HRegionServer: Stopping infoServer
2015-09-21 05:32:20,468 INFO  [master/pkgtstdb2/132.37.5.194:60900] mortbay.log: Stopped SelectChannelConnector@0.0.0.0:60910
2015-09-21 05:32:20,570 INFO  [master/pkgtstdb2/132.37.5.194:60900] regionserver.HRegionServer: stopping server pkgtstdb2,60900,1442548707194
2015-09-21 05:32:20,570 INFO  [master/pkgtstdb2/132.37.5.194:60900] client.ConnectionManager$HConnectionImplementation: Closing master protocol: MasterService
2015-09-21 05:32:20,570 INFO  [master/pkgtstdb2/132.37.5.194:60900] client.ConnectionManager$HConnectionImplementation: Closing zookeeper sessionid=0x24f1f7bb79103ad
2015-09-21 05:32:20,572 INFO  [master/pkgtstdb2/132.37.5.194:60900] zookeeper.ZooKeeper: Session: 0x24f1f7bb79103ad closed
2015-09-21 05:32:20,573 INFO  [master/pkgtstdb2/132.37.5.194:60900-EventThread] zookeeper.ClientCnxn: EventThread shut down
2015-09-21 05:32:20,573 INFO  [master/pkgtstdb2/132.37.5.194:60900] regionserver.HRegionServer: stopping server pkgtstdb2,60900,1442548707194; all regions closed.
2015-09-21 05:32:20,573 INFO  [CatalogJanitor-pkgtstdb2:60900] master.CatalogJanitor: CatalogJanitor-pkgtstdb2:60900 exiting
2015-09-21 05:32:20,574 WARN  [master/pkgtstdb2/132.37.5.194:60900] zookeeper.RecoverableZooKeeper: Possibly transient ZooKeeper, quorum=132.37.5.196:29184,132.37.5.195:29184,132.37.5.197:29184, exception=org.apache.zookeeper.KeeperException$SessionExpiredException: KeeperErrorCode = Session expired for /hbase/master
2015-09-21 05:32:20,574 INFO  [pkgtstdb2:60900.oldLogCleaner] cleaner.LogCleaner: pkgtstdb2:60900.oldLogCleaner exiting

The HBase configuration file is as follows:

<configuration>
<property>
<name>hbase.rootdir</name>
<value>hdfs://gxuweg3tst2:8920/wa</value>
</property>
<property>
<name>hbase.master.port</name>
<value>60900</value>
<description>The port the HBase Master should bind to.</description>
</property>
<property>
<name>hbase.cluster.distributed</name>
<value>true</value>
<description>The mode the cluster will be in. Possible values are
false for standalone mode and true for distributed mode. If
false, startup will run all HBase and ZooKeeper daemons together
in the one JVM.
</description>
</property>
<property>
<name>hbase.tmp.dir</name>
<!-- <value>/tmp/hbase-${user.name}</value> -->
<value>/uniiof/users/devdpp01/hbase/tmp</value>
<description>Temporary directory on the local filesystem.
Change this setting to point to a location more permanent
than '/tmp' (The '/tmp' directory is often cleared on
machine restart).
</description>
</property>
<property>
<name>hbase.master.info.port</name>
<value>60910</value>
<description>The port for the HBase Master web UI.
Set to -1 if you do not want a UI instance run.
</description>
</property>
<property>
<name>hbase.regionserver.port</name>
<value>60920</value>
<description>The port the HBase RegionServer binds to.
</description>
</property>
<property>
<name>hbase.regionserver.info.port</name>
<value>60930</value>
<description>The port for the HBase RegionServer web UI
Set to -1 if you do not want the RegionServer UI to run.
</description>
</property>
<!--
The following three properties are used together to create the list of
host:peer_port:leader_port quorum servers for ZooKeeper.
-->
<property>
<name>hbase.zookeeper.quorum</name>
<value>132.37.5.195,132.37.5.196,132.37.5.197</value>
<description>Comma separated list of servers in the ZooKeeper Quorum.
For example, "host1.mydomain.com,host2.mydomain.com,host3.mydomain.com".
By default this is set to localhost for local and pseudo-distributed modes
of operation. For a fully-distributed setup, this should be set to a full
list of ZooKeeper quorum servers. If HBASE_MANAGES_ZK is set in hbase-env.sh
this is the list of servers which we will start/stop ZooKeeper on.
</description>
</property>
<property>
<name>hbase.zookeeper.peerport</name>
<value>29888</value>
<description>Port used by ZooKeeper peers to talk to each other.
See http://hadoop.apache.org/zookeeper/docs/r3.1.1/zookeeperStarted.html#sc_RunningReplicatedZooKeeper
for more information.
</description>
</property>
<property>
<name>hbase.zookeeper.leaderport</name>
<value>39888</value>
<description>Port used by ZooKeeper for leader election.
See http://hadoop.apache.org/zookeeper/docs/r3.1.1/zookeeperStarted.html#sc_RunningReplicatedZooKeeper
for more information.
</description>
</property>
<!-- End of properties used to generate ZooKeeper host:port quorum list. -->
<property>
<name>hbase.zookeeper.property.clientPort</name>
<value>29184</value>
<description>Property from ZooKeeper's config zoo.cfg.
The port at which the clients will connect.
</description>
</property>
<!-- End of properties that are directly mapped from ZooKeeper's zoo.cfg -->
<property>
<name>hbase.rest.port</name>
<value>8980</value>
<description>The port for the HBase REST server.</description>
</property>
</configuration>

3 answers

The most critical error is: FATAL [main-EventThread] master.HMaster: Master server abort: loaded coprocessors are: []

How did you end up solving this problem?

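I later ran into this problem as well. The main cause was that HBase paused for a long time during GC, so it failed to respond in time. I suggest that anyone who hits this error check the GC log first and then analyze the problem from there. For reference, see https://blog.csdn.net/liu16659/article/details/82430396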
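
To make the "check the GC log first" advice concrete, here is a minimal sketch that scans a GC log for long stop-the-world pauses. It assumes the JVM was started with `-XX:+PrintGCApplicationStoppedTime` (Java 7/8 style GC logging), so the log contains lines of the form `Total time for which application threads were stopped: N seconds`. The function name, the threshold, and the sample lines are invented for illustration and do not come from the original post.

```python
import re

# Matches the line printed by -XX:+PrintGCApplicationStoppedTime.
STOPPED_RE = re.compile(
    r"Total time for which application threads were stopped: "
    r"(\d+(?:\.\d+)?) seconds"
)


def find_long_pauses(lines, threshold_s=10.0):
    """Return (line_number, pause_seconds) for pauses >= threshold_s."""
    hits = []
    for number, line in enumerate(lines, start=1):
        match = STOPPED_RE.search(line)
        if match:
            seconds = float(match.group(1))
            if seconds >= threshold_s:
                hits.append((number, seconds))
    return hits


if __name__ == "__main__":
    # Invented sample lines, NOT taken from the original post.
    sample = [
        "1.234: Total time for which application threads were stopped: 0.0012345 seconds",
        "78.901: Total time for which application threads were stopped: 65.4321000 seconds",
    ]
    for number, seconds in find_long_pauses(sample):
        print(f"line {number}: paused {seconds:.1f}s")
```

A pause close to or longer than the ZooKeeper session timeout, at around the time of the crash, would be consistent with the `Session expired` abort in the HMaster log. To use it on a real log, pass the open file object as `lines`.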
