hadoop - HBase managed ZooKeeper suddenly trying to connect to localhost instead of ZooKeeper quorum


I am running tests with table mappers and reducers on large-scale problems. At a certain point the reducers started failing once the job was about 80% done. From what I can tell in the syslogs, the problem is that one of the ZooKeeper clients is attempting to connect to localhost as opposed to the other ZooKeepers in the quorum.

Oddly, it seems perfectly fine connecting to the other nodes while the mapping is going on; it is the reducing that it has a problem with. Here are selected portions of the syslog that might be relevant to figuring out what's going on:

2014-06-27 09:44:01,599 INFO [main] org.apache.zookeeper.ZooKeeper: Initiating client connection, connectString=hdev02:5181,hdev01:5181,hdev03:5181 sessionTimeout=10000 watcher=hconnection-0x4aee260b, quorum=hdev02:5181,hdev01:5181,hdev03:5181, baseZNode=/hbase
2014-06-27 09:44:01,612 INFO [main] org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper: Process identifier=hconnection-0x4aee260b connecting to ZooKeeper ensemble=hdev02:5181,hdev01:5181,hdev03:5181
2014-06-27 09:44:01,614 INFO [main-SendThread(hdev02:5181)] org.apache.zookeeper.ClientCnxn: Opening socket connection to server hdev02/172.17.43.36:5181. Will not attempt to authenticate using SASL (unable to locate a login configuration)
2014-06-27 09:44:01,615 INFO [main-SendThread(hdev02:5181)] org.apache.zookeeper.ClientCnxn: Socket connection established to hdev02/172.17.43.36:5181, initiating session
2014-06-27 09:44:01,617 INFO [main-SendThread(hdev02:5181)] org.apache.zookeeper.ClientCnxn: Unable to read additional data from server sessionid 0x0, likely server has closed socket, closing socket connection and attempting reconnect
2014-06-27 09:44:01,723 WARN [main] org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper: Possibly transient ZooKeeper, quorum=hdev02:5181,hdev01:5181,hdev03:5181, exception=org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /hbase/hbaseid
2014-06-27 09:44:01,723 INFO [main] org.apache.hadoop.hbase.util.RetryCounter: Sleeping
***
org.apache.hadoop.mapreduce.task.reduce.MergeManagerImpl: finalMerge called with 1 in-memory map-outputs and 1 on-disk map-outputs
2014-06-27 09:55:12,012 INFO [main] org.apache.hadoop.mapred.Merger: Merging 1 sorted segments
2014-06-27 09:55:12,013 INFO [main] org.apache.hadoop.mapred.Merger: Down to the last merge-pass, with 1 segments left of total size: 33206049 bytes
2014-06-27 09:55:12,208 INFO [main] org.apache.hadoop.mapreduce.task.reduce.MergeManagerImpl: Merged 1 segments, 33206079 bytes to disk to satisfy reduce memory limit
2014-06-27 09:55:12,209 INFO [main] org.apache.hadoop.mapreduce.task.reduce.MergeManagerImpl: Merging 2 files, 265119413 bytes from disk
2014-06-27 09:55:12,209 INFO [main] org.apache.hadoop.mapreduce.task.reduce.MergeManagerImpl: Merging 0 segments, 0 bytes from memory into reduce
2014-06-27 09:55:12,210 INFO [main] org.apache.hadoop.mapred.Merger: Merging 2 sorted segments
2014-06-27 09:55:12,212 INFO [main] org.apache.hadoop.mapred.Merger: Down to the last merge-pass, with 2 segments left of total size: 265119345 bytes
2014-06-27 09:55:12,279 INFO [main] org.apache.zookeeper.ZooKeeper: Initiating client connection, connectString=localhost:2181 sessionTimeout=90000 watcher=hconnection-0x65afdbbb, quorum=localhost:2181, baseZNode=/hbase
2014-06-27 09:55:12,281 INFO [main] org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper: Process identifier=hconnection-0x65afdbbb connecting to ZooKeeper ensemble=localhost:2181
2014-06-27 09:55:12,282 INFO [main-SendThread(localhost.localdomain:2181)] org.apache.zookeeper.ClientCnxn: Opening socket connection to server localhost.localdomain/127.0.0.1:2181. Will not attempt to authenticate using SASL (unable to locate a login configuration)
2014-06-27 09:55:12,283 WARN [main-SendThread(localhost.localdomain:2181)] org.apache.zookeeper.ClientCnxn: Session 0x0 for server null, unexpected error, closing socket connection and attempting reconnect
java.net.ConnectException: Connection refused
    at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
    at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:599)
    at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:350)
    at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1068)
2014-06-27 09:55:12,384 WARN [main] org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper: Possibly transient ZooKeeper, quorum=localhost:2181, exception=org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /hbase/hbaseid
2014-06-27 09:55:12,384 INFO [main] org.apache.hadoop.hbase.util.RetryCounter: Sleeping 1000ms before retry #0...
2014-06-27 09:55:13,385 INFO [main-SendThread(localhost.localdomain:2181)] org.apache.zookeeper.ClientCnxn: Opening socket connection to server localhost.localdomain/127.0.0.1:2181. Will not attempt to authenticate using SASL (unable to locate a login configuration)
2014-06-27 09:55:13,385 WARN [main-SendThread(localhost.localdomain:2181)] org.apache.zookeeper.ClientCnxn: Session 0x0 for server null, unexpected error, closing
***
org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper: Possibly transient ZooKeeper, quorum=localhost:2181, exception=org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /hbase/hbaseid
2014-06-27 09:55:13,486 ERROR [main] org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper: ZooKeeper exists failed after 1 attempts
2014-06-27 09:55:13,486 WARN [main] org.apache.hadoop.hbase.zookeeper.ZKUtil: hconnection-0x65afdbbb, quorum=localhost:2181, baseZNode=/hbase Unable to set watcher on znode (/hbase/hbaseid)
org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /hbase/hbaseid

I'm pretty sure it is configured correctly. Here is the relevant portion of hbase-site.xml:

<property>
  <name>hbase.zookeeper.property.clientPort</name>
  <value>5181</value>
  <description>Property from ZooKeeper's config zoo.cfg.
    The port at which the clients will connect.
  </description>
</property>
<property>
  <name>zookeeper.session.timeout</name>
  <value>10000</value>
  <description></description>
</property>
<property>
  <name>hbase.client.retries.number</name>
  <value>10</value>
  <description></description>
</property>
<property>
  <name>hbase.zookeeper.quorum</name>
  <value>hdev01,hdev02,hdev03</value>
  <description></description>
</property>
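As a client-side sanity check (this is not from the original job; the class name ZkConfCheck is made up for illustration), a small program like the sketch below just loads the HBase client configuration and prints the ZooKeeper settings it resolves. If hbase-site.xml is not on that process's classpath, the values fall back to the shipped defaults of localhost and 2181, which is exactly what the reducer log above shows.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;

public class ZkConfCheck {
    public static void main(String[] args) {
        // Loads hbase-default.xml plus any hbase-site.xml found on the classpath.
        Configuration conf = HBaseConfiguration.create();
        // With no hbase-site.xml visible, these print the defaults (localhost / 2181).
        System.out.println("hbase.zookeeper.quorum = " + conf.get("hbase.zookeeper.quorum"));
        System.out.println("hbase.zookeeper.property.clientPort = " + conf.get("hbase.zookeeper.property.clientPort"));
    }
}

Running this with the same classpath the reduce task gets would show whether the task JVM is actually seeing the hbase-site.xml above.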

So far as I can tell, hdev03 is the only server that has a problem with this. Netstat-ing the relevant ports doesn't show me anything strange.

I've had the same problem when running HBase through Spark on YARN. Everything was fine until it suddenly started trying to connect to localhost instead of the quorum. Setting the port and quorum programmatically before the HBase call fixed the issue:

conf.set("hbase.zookeeper.quorum","my.server") conf.set("hbase.zookeeper.property.clientport","5181") 

I'm using MapR, which has an "unusual" (5181) ZooKeeper port.
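For context, here is roughly what that workaround looks like in full. This is only a sketch, assuming an HBase 1.x-style client API; "my.server" and "my_table" are placeholders rather than values from the original post.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Table;

public class ExplicitQuorumExample {
    public static void main(String[] args) throws Exception {
        // Start from whatever hbase-site.xml (if any) is on the classpath...
        Configuration conf = HBaseConfiguration.create();
        // ...then force the quorum and the non-default MapR port so the client
        // cannot fall back to the localhost:2181 defaults.
        conf.set("hbase.zookeeper.quorum", "my.server");             // placeholder host
        conf.set("hbase.zookeeper.property.clientPort", "5181");
        try (Connection connection = ConnectionFactory.createConnection(conf);
             Table table = connection.getTable(TableName.valueOf("my_table"))) {  // placeholder table
            // Any Gets/Puts/Scans issued here use the explicit quorum set above.
        }
    }
}

The key point is simply that the overrides are applied to the Configuration before the connection is created, so the executor never relies on finding a correct hbase-site.xml on its own classpath.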

