HBASE-26155: JVM crash when scanning


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 3.0.0-alpha-1
    • Fix Version/s: 3.0.0-alpha-1, 2.5.0, 2.4.6, 2.3.7
    • Component/s: Scanners
    • Labels: None

    Description

      We have hit regionserver JVM coredumps on our production clusters, triggered by scanner close.

      Stack: [0x00007fca4b0cc000,0x00007fca4b1cd000],  sp=0x00007fca4b1cb0d8,  free space=1020k
      Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code)
      V  [libjvm.so+0x7fd314]
      J 2810  sun.misc.Unsafe.copyMemory(Ljava/lang/Object;JLjava/lang/Object;JJ)V (0 bytes) @ 0x00007fdae55a9e61 [0x00007fdae55a9d80+0xe1]
      j  org.apache.hadoop.hbase.util.UnsafeAccess.unsafeCopy(Ljava/lang/Object;JLjava/lang/Object;JJ)V+36
      j  org.apache.hadoop.hbase.util.UnsafeAccess.copy(Ljava/nio/ByteBuffer;I[BII)V+69
      j  org.apache.hadoop.hbase.util.ByteBufferUtils.copyFromBufferToArray([BLjava/nio/ByteBuffer;III)V+39
      j  org.apache.hadoop.hbase.CellUtil.copyQualifierTo(Lorg/apache/hadoop/hbase/Cell;[BI)I+31
      j  org.apache.hadoop.hbase.KeyValueUtil.appendKeyTo(Lorg/apache/hadoop/hbase/Cell;[BI)I+43
      J 14724 C2 org.apache.hadoop.hbase.regionserver.StoreScanner.shipped()V (51 bytes) @ 0x00007fdae6a298d0 [0x00007fdae6a29780+0x150]
      J 21387 C2 org.apache.hadoop.hbase.regionserver.RSRpcServices$RegionScannerShippedCallBack.run()V (53 bytes) @ 0x00007fdae622bab8 [0x00007fdae622acc0+0xdf8]
      J 26353 C2 org.apache.hadoop.hbase.ipc.ServerCall.setResponse(Lorg/apache/hbase/thirdparty/com/google/protobuf/Message;Lorg/apache/hadoop/hbase/CellScanner;Ljava/lang/Throwable;Ljava/lang/String;)V (384 bytes) @ 0x00007fdae7f139d8 [0x00007fdae7f12980+0x1058]
      J 26226 C2 org.apache.hadoop.hbase.ipc.CallRunner.run()V (1554 bytes) @ 0x00007fdae959f68c [0x00007fdae959e400+0x128c]
      J 19598% C2 org.apache.hadoop.hbase.ipc.RpcExecutor.consumerLoop(Ljava/util/concurrent/BlockingQueue;Ljava/util/concurrent/atomic/AtomicInteger;)V (338 bytes) @ 0x00007fdae81c54d4 [0x00007fdae81c53e0+0xf4]
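
      The top frames point at Unsafe.copyMemory copying cell bytes out of block memory that has likely already been freed. Below is a minimal, self-contained sketch of that use-after-free pattern; the class name and the allocateMemory/freeMemory calls are hypothetical stand-ins for the BucketCache's bucket memory and its eviction path, not HBase code. Running it can SIGSEGV just like the dump above.

      import java.lang.reflect.Field;
      import sun.misc.Unsafe;

      // Hypothetical sketch of the crash mechanism, not HBase code.
      public class UseAfterFreeSketch {
        public static void main(String[] args) throws Exception {
          Field f = Unsafe.class.getDeclaredField("theUnsafe");
          f.setAccessible(true);
          Unsafe unsafe = (Unsafe) f.get(null);

          long addr = unsafe.allocateMemory(64); // stands in for a cached block's bucket
          unsafe.freeMemory(addr);               // stands in for the bucket being freed early
          byte[] dst = new byte[64];
          // Stands in for StoreScanner.shipped() -> ByteBufferUtils.copyFromBufferToArray():
          // copying from the freed address may crash the JVM or read garbage.
          unsafe.copyMemory(null, addr, dst, Unsafe.ARRAY_BYTE_BASE_OFFSET, 64);
        }
      }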
      

      There are also scan RPC errors in the handler when the coredump happens; see the attached scan-error.png.

      I found a clue in the logs: a cached block may be replaced when its nextBlockOnDiskSize is smaller than that of the newly read block, in BlockCacheUtil.shouldReplaceExistingCacheBlock:

      public static boolean shouldReplaceExistingCacheBlock(BlockCache blockCache,
          BlockCacheKey cacheKey, Cacheable newBlock) {
        if (cacheKey.toString().indexOf(".") != -1) { // reference file
          LOG.warn("replace existing cached block, cache key is : " + cacheKey);
          return true;
        }
        Cacheable existingBlock = blockCache.getBlock(cacheKey, false, false, false);
        if (existingBlock == null) {
          return true;
        }
        try {
          int comparison = BlockCacheUtil.validateBlockAddition(existingBlock, newBlock, cacheKey);
          if (comparison < 0) {
            LOG.warn("Cached block contents differ by nextBlockOnDiskSize, the new block has "
                + "nextBlockOnDiskSize set. Caching new block.");
            return true;
      ......
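
      For context, nextBlockOnDiskSize is read-time metadata: the same on-disk block can be cached once without knowing the following block's on-disk size and read again with it known (unknown is commonly represented as -1). A hedged sketch of the decision the log message above implies; the method and parameter names are hypothetical:

      // Hypothetical sketch, mirroring only the log message above: a negative
      // comparison means identical contents where only the new block carries
      // nextBlockOnDiskSize, so the new block replaces the cached one.
      static int compareForReplacement(long existingNextSize, long newNextSize) {
        if (existingNextSize == newNextSize) {
          return 0;  // same metadata: keep the existing block
        }
        return (existingNextSize < 0 && newNextSize >= 0) ? -1 : 1;
      }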

       

      So the block will be replaced when it is no longer in the RAMCache but is still in the BucketCache. The replacement installs the new entry through putIntoBackingMap:

       

      private void putIntoBackingMap(BlockCacheKey key, BucketEntry bucketEntry) {
        BucketEntry previousEntry = backingMap.put(key, bucketEntry);
        if (previousEntry != null && previousEntry != bucketEntry) {
          ReentrantReadWriteLock lock = offsetLock.getLock(previousEntry.offset());
          lock.writeLock().lock();
          try {
            blockEvicted(key, previousEntry, false);
          } finally {
            lock.writeLock().unlock();
          }
        }
      }
      

      To avoid leaking the previous bucket entry, it is force-released in blockEvicted, regardless of any RPC references still reading from it:

       

      void blockEvicted(BlockCacheKey cacheKey, BucketEntry bucketEntry, boolean decrementBlockNumber) {
        // Frees the bucket memory immediately, even if an RPC handler is still reading the block.
        bucketAllocator.freeBlock(bucketEntry.offset());
        realCacheSize.add(-1 * bucketEntry.getLength());
        blocksByHFile.remove(cacheKey);
        if (decrementBlockNumber) {
          this.blockNumber.decrement();
        }
      }
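
      A safer pattern defers the actual free until the last reader releases its reference. The following is only a self-contained sketch of that idea; RefCountedBlock and its members are hypothetical, not HBase's BucketEntry, but it shows the reference-counting discipline the fix below leans on via isRpcRef():

      import java.util.concurrent.atomic.AtomicInteger;

      // Hypothetical sketch: free the memory only when the last reference drops,
      // instead of freeing it unconditionally at eviction time.
      class RefCountedBlock {
        private final AtomicInteger refCnt = new AtomicInteger(1); // the cache's own reference
        private final Runnable free; // e.g. () -> bucketAllocator.freeBlock(offset)

        RefCountedBlock(Runnable free) {
          this.free = free;
        }

        void retain() {            // an RPC handler takes a reference before reading
          refCnt.incrementAndGet();
        }

        void release() {           // eviction and RPC completion both release
          if (refCnt.decrementAndGet() == 0) {
            free.run();            // freed exactly once, by the last releaser
          }
        }

        boolean hasRpcRef() {
          return refCnt.get() > 1; // more than the cache's own reference
        }
      }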
      

      I added a check for RPC references before replacing the bucket entry, and it works; there have been no coredumps since.

       

      That is:

      public void cacheBlockWithWait(BlockCacheKey cacheKey, Cacheable cachedItem, boolean inMemory,
          boolean wait) {
        if (cacheEnabled) {
          if (backingMap.containsKey(cacheKey) || ramCache.containsKey(cacheKey)) {
            if (BlockCacheUtil.shouldReplaceExistingCacheBlock(this, cacheKey, cachedItem)) {
              BucketEntry bucketEntry = backingMap.get(cacheKey);
              if (bucketEntry != null && bucketEntry.isRpcRef()) {
                // avoid replace when there are RPC refs for the bucket entry in bucket cache
                return;
              }
              cacheBlockWithWaitInternal(cacheKey, cachedItem, inMemory, wait);
            }
          } else {
            cacheBlockWithWaitInternal(cacheKey, cachedItem, inMemory, wait);
          }
        }
      }
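
      With this guard, a new block that would replace an RPC-referenced bucket entry is simply not cached: in-flight scans keep reading the old entry safely, and the new block can presumably be cached by a later read once the references are released.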
      

       

      Attachments

        1. scan-error.png (958 kB, uploaded by Xiaolin Ha)


            People

              Assignee: Xiaolin Ha
              Reporter: Xiaolin Ha
