Details
-
Bug
-
Status: Closed
-
Major
-
Resolution: Cannot Reproduce
-
0.90.0
-
None
-
None
-
None
Description
while testing I was having problems with my master aborting early on, which causes trouble with the regionservers... they are SUPPOSED to wait forever for the master to come up, but they eventually 'give up' without saying anything helpful. For example this was in the log:
2011-01-27 17:27:25,912 DEBUG org.apache.hadoop.hbase.regionserver.HRegionServer: No master found, will retry
2011-01-27 17:27:28,912 DEBUG org.apache.hadoop.hbase.regionserver.HRegionServer: No master found, will retry
2011-01-27 17:27:31,912 DEBUG org.apache.hadoop.hbase.regionserver.HRegionServer: No master found, will retry
2011-01-27 17:27:34,912 DEBUG org.apache.hadoop.hbase.regionserver.HRegionServer: No master found, will retry
2011-01-27 17:27:37,913 DEBUG org.apache.hadoop.hbase.regionserver.HRegionServer: No master found, will retry
2011-01-27 17:28:37,593 DEBUG org.apache.hadoop.hbase.io.hfile.LruBlockCache: LRU Stats: total=3.26 MB, free=393.42 MB, max=396.68 MB, blocks=1, accesses=69, hits=64, hitRatio=92.75%%, cachingAccesses=65, cachingHits=64, cachingHitsRatio=98.46%%, evictions=0, evicted=0, evictedPerRun=NaN
then nothing else. It had been well over 3 minutes at this point. jstacking the process shows lots of threads running, but the process is effectively dead and only kill -9 will get rid of it.