Details
Description
I don't reproduce this all the time, but I had it on a fairly clean env.
It occurs every 5 minutes (i.e. the balancer period). Impact is severe: the balancer does not run. When it starts to occurs, it occurs all the time. I haven't tried to restart the master, but I think it should be enough.
Now, looking at the code, the NPE is strange.
2013-04-18 08:09:52,079 ERROR [box,60000,1366281581983-BalancerChore] org.apache.hadoop.hbase.master.balancer.BalancerChore: Caught exception java.lang.NullPointerException at org.apache.hadoop.hbase.master.balancer.BaseLoadBalancer$Cluster.<init>(BaseLoadBalancer.java:145) at org.apache.hadoop.hbase.master.balancer.StochasticLoadBalancer.balanceCluster(StochasticLoadBalancer.java:194) at org.apache.hadoop.hbase.master.HMaster.balance(HMaster.java:1295) at org.apache.hadoop.hbase.master.balancer.BalancerChore.chore(BalancerChore.java:48) at org.apache.hadoop.hbase.Chore.run(Chore.java:81) at java.lang.Thread.run(Thread.java:662) 2013-04-18 08:09:52,103 DEBUG [box,60000,1366281581983-CatalogJanitor] org.apache.hadoop.hbase.client.ClientScanner: Creating scanner over .META. starting at key ''
if (regionFinder != null) { //region location List<ServerName> loc = regionFinder.getTopBlockLocations(region); regionLocations[regionIndex] = new int[loc.size()]; for (int i=0; i < loc.size(); i++) { regionLocations[regionIndex][i] = serversToIndex.get(loc.get(i)); // <========= NPE here } }
pinging enis, just in case.