Description
This occurred on a cluster with 46 slaves, running a couple MR jobs. One doing heavy writes copying everything from one table to a new table with a different schema. After one regionserver went down, about 40 of them died within an hour before it was caught and the jobs stopped. Let me know if any other piece of context would be particularly helpful.
This exception appears in the .out file:
Exception in thread "regionserver/192.168.41.2:60020" java.lang.NullPointerException
at org.apache.hadoop.hbase.regionserver.ReadWriteConsistencyControl.getThreadReadPoint(ReadWriteConsistencyControl.java:40)
at org.apache.hadoop.hbase.regionserver.MemStore$MemStoreScanner.getNext(MemStore.java:532)
at org.apache.hadoop.hbase.regionserver.MemStore$MemStoreScanner.seek(MemStore.java:558)
at org.apache.hadoop.hbase.regionserver.StoreScanner.reseek(StoreScanner.java:320)
at org.apache.hadoop.hbase.regionserver.StoreScanner.checkReseek(StoreScanner.java:306)
at org.apache.hadoop.hbase.regionserver.StoreScanner.peek(StoreScanner.java:143)
at org.apache.hadoop.hbase.regionserver.KeyValueHeap$KVScannerComparator.compare(KeyValueHeap.java:127)
at org.apache.hadoop.hbase.regionserver.KeyValueHeap$KVScannerComparator.compare(KeyValueHeap.java:117)
at java.util.PriorityQueue.siftDownUsingComparator(PriorityQueue.java:644)
at java.util.PriorityQueue.siftDown(PriorityQueue.java:612)
at java.util.PriorityQueue.poll(PriorityQueue.java:523)
at org.apache.hadoop.hbase.regionserver.KeyValueHeap.close(KeyValueHeap.java:151)
at org.apache.hadoop.hbase.regionserver.HRegion$RegionScanner.close(HRegion.java:1971)
at org.apache.hadoop.hbase.regionserver.HRegionServer.closeAllRegions(HRegionServer.java:1610)
at org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:621)
at java.lang.Thread.run(Thread.java:619)