  1. Hadoop HDFS
  2. HDFS-12701

More fine-grained locks in ShortCircuitCache

Details

    • Type: Improvement
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 3.0.0-beta1, 2.8.1

    Description

      When the cluster is heavily loaded and short-circuit read is enabled, we found that HBase regionserver handlers are often blocked in ShortCircuitCache. A jstack dump showed many threads waiting to obtain the cache lock. Using more fine-grained locks should reduce this contention and improve performance.
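One common way to make a single cache lock more fine-grained is lock striping: keys are hashed to one of several independent locks, so threads working on different keys rarely contend. The following is a minimal sketch of that idea only, not the ShortCircuitCache implementation and not a proposed patch. The class name `StripedRefCounter`, its methods, and the stripe count are hypothetical, and the real cache has additional shared state (for example eviction bookkeeping) that striping alone would not cover.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.locks.ReentrantLock;

/**
 * Hypothetical sketch of lock striping; NOT the actual ShortCircuitCache code.
 * Each key is hashed to one of N stripes. Every stripe has its own lock and
 * its own map, so threads touching different stripes do not block each other.
 */
class StripedRefCounter<K> {
  private final ReentrantLock[] locks;
  private final Map<K, Integer>[] counts;

  @SuppressWarnings("unchecked")
  StripedRefCounter(int stripes) {
    locks = new ReentrantLock[stripes];
    counts = new Map[stripes];
    for (int i = 0; i < stripes; i++) {
      locks[i] = new ReentrantLock();
      counts[i] = new HashMap<>();
    }
  }

  private int stripeOf(K key) {
    // floorMod keeps the index non-negative even for negative hash codes.
    return Math.floorMod(key.hashCode(), locks.length);
  }

  /** Increments the reference count for key and returns the new count. */
  int ref(K key) {
    int s = stripeOf(key);
    locks[s].lock();
    try {
      return counts[s].merge(key, 1, Integer::sum);
    } finally {
      locks[s].unlock();
    }
  }

  /** Decrements the count, removing the entry at zero. Returns the new count. */
  int unref(K key) {
    int s = stripeOf(key);
    locks[s].lock();
    try {
      Integer current = counts[s].get(key);
      if (current == null) {
        throw new IllegalStateException("unref without ref: " + key);
      }
      if (current == 1) {
        counts[s].remove(key);
        return 0;
      }
      counts[s].put(key, current - 1);
      return current - 1;
    } finally {
      locks[s].unlock();
    }
  }
}
```

With this layout, a `ref` on one key and an `unref` on another only serialize when both keys land on the same stripe. Any operation that must see all entries at once would still need to take every stripe lock, which is the main cost of this design.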

          Activity

            Weiwei Yang (cheersyang) added a comment (edited)

            In a jstack dump, there are more than 50 threads waiting on ShortCircuitCache#unref()

            "RW.default.readRpcServer.handler=318,queue=21,port=16020" #369 daemon prio=5 os_prio=0 tid=0x00007fcf0814f000 nid=0x1bbf5 waiting on condition [0x00007fce93cf9000]
               java.lang.Thread.State: WAITING (parking)
                    at sun.misc.Unsafe.park(Native Method)
                    - parking to wait for  <0x00007fd18d7c1300> (a java.util.concurrent.locks.ReentrantLock$NonfairSync)
                    at java.util.concurrent.locks.LockSupport.park(LockSupport.java:189)
                    at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
                    at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:870)
                    at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1199)
                    at java.util.concurrent.locks.ReentrantLock$NonfairSync.lock(ReentrantLock.java:209)
                    at java.util.concurrent.locks.ReentrantLock.lock(ReentrantLock.java:285)
                    at org.apache.hadoop.hdfs.shortcircuit.ShortCircuitCache.unref(ShortCircuitCache.java:415)
                    at org.apache.hadoop.hdfs.shortcircuit.ShortCircuitReplica.unref(ShortCircuitReplica.java:143)
                    at org.apache.hadoop.hdfs.client.impl.BlockReaderLocal.close(BlockReaderLocal.java:627)
                    - locked <0x00007fcf89ac37b8> (a org.apache.hadoop.hdfs.client.impl.BlockReaderLocal)
                    at org.apache.hadoop.hdfs.DFSInputStream.actualGetFromOneDataNode(DFSInputStream.java:1328)
                    at org.apache.hadoop.hdfs.DFSInputStream.actualGetFromOneDataNode(DFSInputStream.java:1245)
                    at org.apache.hadoop.hdfs.DFSInputStream.fetchBlockByteRange(DFSInputStream.java:1206)
                    at org.apache.hadoop.hdfs.DFSInputStream.pread(DFSInputStream.java:1579)
                    at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:1543)
                    at org.apache.hadoop.fs.FSDataInputStream.read(FSDataInputStream.java:92)
                    at org.apache.hadoop.hbase.io.hfile.HFileBlock$FSReaderImpl.readAtOffset(HFileBlock.java:1455)
                    at org.apache.hadoop.hbase.io.hfile.HFileBlock$FSReaderImpl.readBlockDataInternal(HFileBlock.java:1616)
                    at org.apache.hadoop.hbase.io.hfile.HFileBlock$FSReaderImpl.readBlockData(HFileBlock.java:1495)
                    at org.apache.hadoop.hbase.io.hfile.HFileReaderImpl.readBlock(HFileReaderImpl.java:1453)
                    at org.apache.hadoop.hbase.io.hfile.HFileBlockIndex$CellBasedKeyBlockIndexReader.loadDataBlockWithScanInfo(HFileBlockIndex.java:336)
                    at org.apache.hadoop.hbase.io.hfile.HFileReaderImpl$HFileScannerImpl.seekTo(HFileReaderImpl.java:816)
                    at org.apache.hadoop.hbase.io.hfile.HFileReaderImpl$HFileScannerImpl.reseekTo(HFileReaderImpl.java:797)
                    at org.apache.hadoop.hbase.regionserver.StoreFileScanner.reseekAtOrAfter(StoreFileScanner.java:263)
                    at org.apache.hadoop.hbase.regionserver.StoreFileScanner.reseek(StoreFileScanner.java:180)
                    at org.apache.hadoop.hbase.regionserver.StoreFileScanner.enforceSeek(StoreFileScanner.java:381)
                    at org.apache.hadoop.hbase.regionserver.KeyValueHeap.pollRealKV(KeyValueHeap.java:377)
                    at org.apache.hadoop.hbase.regionserver.KeyValueHeap.next(KeyValueHeap.java:140)
                    at org.apache.hadoop.hbase.regionserver.StoreScanner.next(StoreScanner.java:613)
            

            and more than 60 threads blocked waiting on ShortCircuitCache#fetchOrCreate()

            "RW.default.readRpcServer.handler=316,queue=19,port=16020" #367 daemon prio=5 os_prio=0 tid=0x00007fcf0814d000 nid=0x1bbf3 waiting on condition [0x00007fce93d7b000]
               java.lang.Thread.State: WAITING (parking)
                    at sun.misc.Unsafe.park(Native Method)
                    - parking to wait for  <0x00007fd18d7c1300> (a java.util.concurrent.locks.ReentrantLock$NonfairSync)
                    at java.util.concurrent.locks.LockSupport.park(LockSupport.java:189)
                    at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
                    at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:870)
                    at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1199)
                    at java.util.concurrent.locks.ReentrantLock$NonfairSync.lock(ReentrantLock.java:209)
                    at java.util.concurrent.locks.ReentrantLock.lock(ReentrantLock.java:285)
                    at org.apache.hadoop.hdfs.shortcircuit.ShortCircuitCache.fetchOrCreate(ShortCircuitCache.java:677)
                    at org.apache.hadoop.hdfs.client.impl.BlockReaderFactory.getBlockReaderLocal(BlockReaderFactory.java:486)
                    at org.apache.hadoop.hdfs.client.impl.BlockReaderFactory.build(BlockReaderFactory.java:367)
                    at org.apache.hadoop.hdfs.DFSInputStream.getBlockReader(DFSInputStream.java:720)
                    at org.apache.hadoop.hdfs.DFSInputStream.actualGetFromOneDataNode(DFSInputStream.java:1282)
                    at org.apache.hadoop.hdfs.DFSInputStream.actualGetFromOneDataNode(DFSInputStream.java:1245)
                    at org.apache.hadoop.hdfs.DFSInputStream.fetchBlockByteRange(DFSInputStream.java:1206)
                    at org.apache.hadoop.hdfs.DFSInputStream.pread(DFSInputStream.java:1579)
                    at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:1543)
                    at org.apache.hadoop.fs.FSDataInputStream.read(FSDataInputStream.java:92)
                    at org.apache.hadoop.hbase.io.hfile.HFileBlock$FSReaderImpl.readAtOffset(HFileBlock.java:1455)
                    at org.apache.hadoop.hbase.io.hfile.HFileBlock$FSReaderImpl.readBlockDataInternal(HFileBlock.java:1616)
                    at org.apache.hadoop.hbase.io.hfile.HFileBlock$FSReaderImpl.readBlockData(HFileBlock.java:1495)
                    at org.apache.hadoop.hbase.io.hfile.HFileReaderImpl.readBlock(HFileReaderImpl.java:1453)
                    at org.apache.hadoop.hbase.io.hfile.HFileBlockIndex$CellBasedKeyBlockIndexReader.loadDataBlockWithScanInfo(HFileBlockIndex.java:336)
                    at org.apache.hadoop.hbase.io.hfile.HFileReaderImpl$HFileScannerImpl.seekTo(HFileReaderImpl.java:816)
                    at org.apache.hadoop.hbase.io.hfile.HFileReaderImpl$HFileScannerImpl.reseekTo(HFileReaderImpl.java:797)
                    at org.apache.hadoop.hbase.regionserver.StoreFileScanner.reseekAtOrAfter(StoreFileScanner.java:263)
                    at org.apache.hadoop.hbase.regionserver.StoreFileScanner.reseek(StoreFileScanner.java:180)
                    at org.apache.hadoop.hbase.regionserver.StoreFileScanner.enforceSeek(StoreFileScanner.java:381)
                    at org.apache.hadoop.hbase.regionserver.KeyValueHeap.pollRealKV(KeyValueHeap.java:377)
                    at org.apache.hadoop.hbase.regionserver.KeyValueHeap.next(KeyValueHeap.java:140)
                    at org.apache.hadoop.hbase.regionserver.StoreScanner.next(StoreScanner.java:613)
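
Both traces above park on the same lock address, `<0x00007fd18d7c1300>`. Per-method counts like the ones quoted can be tallied from a saved dump with a small helper. This is a minimal sketch, assuming the dump is plain jstack text with thread entries separated by blank lines; the class name `ParkedThreadCounter` and its grouping rule (first frame outside `java.` and `sun.` packages) are hypothetical choices for illustration.

```java
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper: counts parked threads per (lock address, caller frame). */
class ParkedThreadCounter {
  private static final Pattern LOCK =
      Pattern.compile("- parking to wait for\\s+<(0x[0-9a-f]+)>");
  private static final Pattern FRAME =
      Pattern.compile("^\\s*at ([\\w.$]+)\\(");

  /** Returns a map from "lockAddress frame" to the number of threads parked there. */
  static Map<String, Integer> count(String dump) {
    Map<String, Integer> result = new TreeMap<>();
    String lock = null; // lock address of the thread entry being scanned
    for (String line : dump.split("\\R")) {
      if (line.trim().isEmpty()) {
        lock = null; // a blank line ends the current thread entry
        continue;
      }
      Matcher m = LOCK.matcher(line);
      if (m.find()) {
        lock = m.group(1);
        continue;
      }
      if (lock == null) {
        continue;
      }
      Matcher f = FRAME.matcher(line);
      if (f.find()) {
        String frame = f.group(1);
        // Skip JDK frames so the count is attributed to the application caller.
        if (!frame.startsWith("java.") && !frame.startsWith("sun.")) {
          result.merge(lock + " " + frame, 1, Integer::sum);
          lock = null; // count each thread once
        }
      }
    }
    return result;
  }
}
```

Calling `ParkedThreadCounter.count(dumpText)` on the full dump text would group the waiting handlers by the `ShortCircuitCache` method that requested the lock.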
            

            People

              Assignee: Unassigned
              Reporter: Weiwei Yang (cheersyang)
              Votes: 1
              Watchers: 13
