HBASE-18168

NoSuchElementException when rolling the log

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 1.1.11
    • Fix Version/s: 1.1.11
    • Component/s: None
    • Labels: None
    • Hadoop Flags: Reviewed

    Description

      Today, one of our region servers aborted with the following error in its log:

      2017-06-06 05:38:47,142 ERROR [regionserver/xxxx.logRoller] regionserver.LogRoller: Log rolling failed
      java.util.NoSuchElementException
              at java.util.concurrent.ConcurrentSkipListMap$Iter.advance(ConcurrentSkipListMap.java:2224)
              at java.util.concurrent.ConcurrentSkipListMap$ValueIterator.next(ConcurrentSkipListMap.java:2253)
              at java.util.Collections.min(Collections.java:628)
              at org.apache.hadoop.hbase.regionserver.wal.FSHLog.findEligibleMemstoresToFlush(FSHLog.java:861)
              at org.apache.hadoop.hbase.regionserver.wal.FSHLog.findRegionsToForceFlush(FSHLog.java:886)
              at org.apache.hadoop.hbase.regionserver.wal.FSHLog.rollWriter(FSHLog.java:728)
              at org.apache.hadoop.hbase.regionserver.LogRoller.run(LogRoller.java:137)
              at java.lang.Thread.run(Thread.java:756)
      2017-06-06 05:38:47,142 FATAL [regionserver/xxxx.logRoller] regionserver.HRegionServer: ABORTING region server xxxx: Log rolling failed
      java.util.NoSuchElementException
      ......
      

      The code is here:

      private byte[][] findEligibleMemstoresToFlush(Map<byte[], Long> regionsSequenceNums) {
          List<byte[]> regionsToFlush = null;
          // Keeping the old behavior of iterating unflushedSeqNums under oldestSeqNumsLock.
          synchronized (regionSequenceIdLock) {
            for (Map.Entry<byte[], Long> e: regionsSequenceNums.entrySet()) {
              ConcurrentMap<byte[], Long> m =
                  this.oldestUnflushedStoreSequenceIds.get(e.getKey());
              if (m == null) {
                continue;
              }
              long unFlushedVal = Collections.min(m.values()); //The exception is thrown here
              ......
      

      The only reason I can think of for the NoSuchElementException is that the map 'm' is empty: Collections.min() calls next() on the values iterator without first checking that an element exists. I then went through all the code that updates 'oldestUnflushedStoreSequenceIds'. Every update is made while holding 'regionSequenceIdLock', except here:

      private ConcurrentMap<byte[], Long> getOrCreateOldestUnflushedStoreSequenceIdsOfRegion(
            byte[] encodedRegionName) {
          ......
          oldestUnflushedStoreSequenceIdsOfRegion =
              new ConcurrentSkipListMap<byte[], Long>(Bytes.BYTES_COMPARATOR);
          ConcurrentMap<byte[], Long> alreadyPut =
              oldestUnflushedStoreSequenceIds.putIfAbsent(encodedRegionName,
                oldestUnflushedStoreSequenceIdsOfRegion); // Here, an empty map may be put into 'oldestUnflushedStoreSequenceIds' without synchronization
          return alreadyPut == null ? oldestUnflushedStoreSequenceIdsOfRegion : alreadyPut;
        }
      
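      The failure mode described above can be reproduced in isolation. The following is a minimal single-threaded sketch, not HBase code: the class name, the region name, and the local BYTES_COMPARATOR (a stand-in for Bytes.BYTES_COMPARATOR) are assumptions made for illustration. It registers an empty per-region map with putIfAbsent, as the method above does, and then calls Collections.min() on its values, as findEligibleMemstoresToFlush does.

```java
import java.nio.charset.StandardCharsets;
import java.util.Collections;
import java.util.Comparator;
import java.util.NoSuchElementException;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.ConcurrentSkipListMap;

final class EmptyRegionMapDemo {
  // Hypothetical stand-in for HBase's Bytes.BYTES_COMPARATOR:
  // lexicographic comparison of unsigned bytes.
  static final Comparator<byte[]> BYTES_COMPARATOR = (a, b) -> {
    int n = Math.min(a.length, b.length);
    for (int i = 0; i < n; i++) {
      int cmp = Integer.compare(a[i] & 0xff, b[i] & 0xff);
      if (cmp != 0) {
        return cmp;
      }
    }
    return Integer.compare(a.length, b.length);
  };

  /** Returns true if Collections.min() fails on a registered but still-empty region map. */
  static boolean minFailsOnFreshRegionMap() {
    ConcurrentMap<byte[], ConcurrentMap<byte[], Long>> oldestUnflushedStoreSequenceIds =
        new ConcurrentSkipListMap<>(BYTES_COMPARATOR);
    byte[] encodedRegionName = "region-a".getBytes(StandardCharsets.UTF_8);

    // Writer side: the empty map becomes visible before any store sequence id is added.
    oldestUnflushedStoreSequenceIds.putIfAbsent(
        encodedRegionName, new ConcurrentSkipListMap<byte[], Long>(BYTES_COMPARATOR));

    // Log roller side: the map is found (m != null), but it has no values yet.
    ConcurrentMap<byte[], Long> m = oldestUnflushedStoreSequenceIds.get(encodedRegionName);
    try {
      Collections.min(m.values());
      return false;
    } catch (NoSuchElementException e) {
      return true;
    }
  }

  public static void main(String[] args) {
    System.out.println("min() failed on empty map: " + minFailsOnFreshRegionMap());
  }
}
```

      In the real server the two sides run on different threads, so the window is only the short interval between putIfAbsent and the first insertion into the new map.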

      This should be a very rare bug, but it can cause the region server to abort. It exists only in branch-1.1.
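      One way a reader could tolerate the empty map is sketched below. This is an illustration under assumptions, not the content of the attached patches: the class SafeMin and the method minOrNull are hypothetical names. The helper makes a single pass over the values and returns null when there are none, so a caller could skip the region in the same way it already does for m == null. The other obvious option is to perform the putIfAbsent while holding 'regionSequenceIdLock', like every other update.

```java
import java.util.Collection;

final class SafeMin {
  /**
   * Returns the smallest value in the collection, or null if it is empty.
   * A single iteration is used instead of isEmpty() followed by Collections.min(),
   * so a concurrently modified map cannot become empty between the two calls.
   */
  static Long minOrNull(Collection<Long> values) {
    Long min = null;
    for (Long v : values) {
      if (min == null || v < min) {
        min = v;
      }
    }
    return min;
  }
}
```

      A caller would treat a null result as "nothing unflushed for this region yet" and continue with the next entry.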

      Attachments

        1. HBASE-18168-branch-1.1.v3.patch
          0.8 kB
          Allan Yang
        2. HBASE-18168-branch-1.1.v2.patch
          0.8 kB
          Allan Yang
        3. HBASE-18168-branch-1.1.patch
          0.7 kB
          Allan Yang

          People

            allan163 Allan Yang
            allan163 Allan Yang