Details
Bug
Status: Resolved
Major
Resolution: Fixed
1.1.11
None
None
Reviewed
Description
Today, one of our region servers aborted with the following log:
2017-06-06 05:38:47,142 ERROR [regionserver/xxxx.logRoller] regionserver.LogRoller: Log rolling failed
java.util.NoSuchElementException
    at java.util.concurrent.ConcurrentSkipListMap$Iter.advance(ConcurrentSkipListMap.java:2224)
    at java.util.concurrent.ConcurrentSkipListMap$ValueIterator.next(ConcurrentSkipListMap.java:2253)
    at java.util.Collections.min(Collections.java:628)
    at org.apache.hadoop.hbase.regionserver.wal.FSHLog.findEligibleMemstoresToFlush(FSHLog.java:861)
    at org.apache.hadoop.hbase.regionserver.wal.FSHLog.findRegionsToForceFlush(FSHLog.java:886)
    at org.apache.hadoop.hbase.regionserver.wal.FSHLog.rollWriter(FSHLog.java:728)
    at org.apache.hadoop.hbase.regionserver.LogRoller.run(LogRoller.java:137)
    at java.lang.Thread.run(Thread.java:756)
2017-06-06 05:38:47,142 FATAL [regionserver/xxxx.logRoller] regionserver.HRegionServer: ABORTING region server xxxx: Log rolling failed
java.util.NoSuchElementException
......
The failing code in FSHLog is:
private byte[][] findEligibleMemstoresToFlush(Map<byte[], Long> regionsSequenceNums) {
  List<byte[]> regionsToFlush = null;
  // Keeping the old behavior of iterating unflushedSeqNums under oldestSeqNumsLock.
  synchronized (regionSequenceIdLock) {
    for (Map.Entry<byte[], Long> e: regionsSequenceNums.entrySet()) {
      ConcurrentMap<byte[], Long> m =
          this.oldestUnflushedStoreSequenceIds.get(e.getKey());
      if (m == null) {
        continue;
      }
      long unFlushedVal = Collections.min(m.values()); // The exception is thrown here
      ......
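Collections.min() obtains an iterator and calls next() without first checking hasNext(), so it throws NoSuchElementException when the collection is empty. That matches the top frames of the stack trace (ConcurrentSkipListMap$ValueIterator.next called from Collections.min). The standalone check below illustrates this; EmptyMinDemo and oldestUnflushed are hypothetical names, and String keys stand in for the byte[] keys and Bytes.BYTES_COMPARATOR used in HBase.

```java
import java.util.Collections;
import java.util.NoSuchElementException;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.ConcurrentSkipListMap;

public class EmptyMinDemo {
    // Mirrors the failing call: Collections.min over the inner map's values
    // (hypothetical stand-in for the per-region map 'm').
    static long oldestUnflushed(ConcurrentMap<String, Long> m) {
        return Collections.min(m.values());
    }

    public static void main(String[] args) {
        ConcurrentMap<String, Long> m = new ConcurrentSkipListMap<>();
        m.put("cf1", 42L);
        m.put("cf2", 17L);
        System.out.println(oldestUnflushed(m)); // prints 17

        // An empty inner map: Collections.min calls next() on an exhausted iterator.
        m.clear();
        try {
            oldestUnflushed(m);
        } catch (NoSuchElementException e) {
            System.out.println("empty map -> NoSuchElementException");
        }
    }
}
```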
An empty map 'm' is the only explanation I can think of for the NoSuchElementException. I then reviewed all code that updates 'oldestUnflushedStoreSequenceIds': every update is guarded by synchronization on 'regionSequenceIdLock' except this one:
private ConcurrentMap<byte[], Long> getOrCreateOldestUnflushedStoreSequenceIdsOfRegion(
    byte[] encodedRegionName) {
  ......
  oldestUnflushedStoreSequenceIdsOfRegion =
      new ConcurrentSkipListMap<byte[], Long>(Bytes.BYTES_COMPARATOR);
  ConcurrentMap<byte[], Long> alreadyPut =
      oldestUnflushedStoreSequenceIds.putIfAbsent(encodedRegionName,
        oldestUnflushedStoreSequenceIdsOfRegion);
  // Here, an empty map may be put into 'oldestUnflushedStoreSequenceIds' without synchronization
  return alreadyPut == null ? oldestUnflushedStoreSequenceIdsOfRegion : alreadyPut;
}
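To make the window concrete, the single-threaded sketch below replays the suspected interleaving step by step. It is a hypothetical, simplified model, not FSHLog itself: the class and method names (UnflushedIdsSketch, getOrCreate, minUnflushed, minUnflushedGuarded) are made up, String keys replace byte[] keys, and it assumes the first store entry is added to the inner map in a later step, after the empty map has already been published. The guarded variant is one possible mitigation, not necessarily the fix that was committed.

```java
import java.util.Collections;
import java.util.NoSuchElementException;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.ConcurrentSkipListMap;

// Hypothetical, simplified model of the two code paths in this report (not FSHLog itself).
// String keys stand in for byte[] region and family names.
public class UnflushedIdsSketch {
    final Object regionSequenceIdLock = new Object();
    final ConcurrentMap<String, ConcurrentMap<String, Long>> oldestUnflushedStoreSequenceIds =
        new ConcurrentHashMap<>();

    // Writer side: publishes an EMPTY inner map without holding regionSequenceIdLock.
    ConcurrentMap<String, Long> getOrCreate(String region) {
        ConcurrentMap<String, Long> created = new ConcurrentSkipListMap<>();
        ConcurrentMap<String, Long> alreadyPut =
            oldestUnflushedStoreSequenceIds.putIfAbsent(region, created);
        return alreadyPut == null ? created : alreadyPut;
    }

    // Reader side, as in the report: only a null map is skipped, so an empty
    // map reaches Collections.min() and NoSuchElementException escapes.
    Optional<Long> minUnflushed(String region) {
        synchronized (regionSequenceIdLock) {
            ConcurrentMap<String, Long> m = oldestUnflushedStoreSequenceIds.get(region);
            if (m == null) {
                return Optional.empty();
            }
            return Optional.of(Collections.min(m.values()));
        }
    }

    // Guarded variant (one possible mitigation): Stream.min() yields an empty
    // Optional for an empty collection instead of throwing.
    Optional<Long> minUnflushedGuarded(String region) {
        synchronized (regionSequenceIdLock) {
            ConcurrentMap<String, Long> m = oldestUnflushedStoreSequenceIds.get(region);
            if (m == null) {
                return Optional.empty();
            }
            return m.values().stream().min(Long::compare);
        }
    }

    public static void main(String[] args) {
        UnflushedIdsSketch wal = new UnflushedIdsSketch();
        // Step 1 (writer): the empty inner map becomes visible in the outer map.
        ConcurrentMap<String, Long> inner = wal.getOrCreate("region-a");
        // Step 2 (log roller, inside the window): the reader sees the empty map.
        try {
            wal.minUnflushed("region-a");
        } catch (NoSuchElementException e) {
            System.out.println("unguarded read: NoSuchElementException");
        }
        System.out.println("guarded read: " + wal.minUnflushedGuarded("region-a"));
        // Step 3 (writer, assumed to happen after step 1): the first store entry is added.
        inner.putIfAbsent("cf1", 100L);
        System.out.println("after insert: " + wal.minUnflushed("region-a"));
    }
}
```

Because Stream.min() checks for the presence of an element rather than calling next() unconditionally, the guarded read does not rely on the inner map being non-empty at the moment the log roller reads it.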
This bug should be very rare, but it can cause the region server to abort. It exists only in branch-1.1.