  1. Hadoop HDFS
  2. HDFS-16198

Short circuit read leaks Slot objects when InvalidToken exception is thrown


    Details

      Description

      In secure mode, 'dfs.block.access.token.enable' should be set to 'true'. With this configuration, a SecretManager.InvalidToken exception may be thrown during a short-circuit read if the block access token has expired. The read itself is not affected, because the failed read is retried, but each such failure leaks a ShortCircuitShm.Slot object.

       

      We found this problem in our secure HBase clusters. The number of open file descriptors held by the RegionServers kept increasing while short-circuit reads were in use.

       

      The increase was caused by leaked shared memory segments used for short-circuit reads:

      [root ~]# lsof -p $(ps -ef | grep proc_regionserver | grep -v grep | awk '{print $2}') | grep /dev/shm | wc -l
      3925
      [root ~]# lsof -p $(ps -ef | grep proc_regionserver | grep -v grep | awk '{print $2}') | grep /dev/shm | head -5
      java 86309 hbase DEL REG 0,19 2308279984 /dev/shm/HadoopShortCircuitShm_DFSClient_NONMAPREDUCE_-1107866286_1_743473959
      java 86309 hbase DEL REG 0,19 2306359893 /dev/shm/HadoopShortCircuitShm_DFSClient_NONMAPREDUCE_-1107866286_1_1594162967
      java 86309 hbase DEL REG 0,19 2305496758 /dev/shm/HadoopShortCircuitShm_DFSClient_NONMAPREDUCE_-1107866286_1_2043027439
      java 86309 hbase DEL REG 0,19 2304784261 /dev/shm/HadoopShortCircuitShm_DFSClient_NONMAPREDUCE_-1107866286_1_689571088
      java 86309 hbase DEL REG 0,19 2302621988 /dev/shm/HadoopShortCircuitShm_DFSClient_NONMAPREDUCE_-1107866286_1_347008590 

       

      We finally found that the root cause is the leakage of ShortCircuitShm.Slot objects.

       

      The fix is trivial: free the slot when an InvalidToken exception is thrown.
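
      The leak and the fix can be illustrated with a minimal, self-contained Java sketch. It is not the attached patch and does not use the HDFS client classes: SlotPool, FdRequest, InvalidTokenException, requestLeaky and requestFixed are hypothetical names standing in for the slot allocator, the file descriptor request to the DataNode, and SecretManager.InvalidToken. It only shows the pattern, assuming a slot is allocated before the request is sent and must be released if the request fails.

```java
import java.util.BitSet;

class SlotLeakSketch {
    /** Hypothetical stand-in for SecretManager.InvalidToken. */
    static class InvalidTokenException extends Exception {
        InvalidTokenException(String message) { super(message); }
    }

    /** Hypothetical stand-in for the slot allocator of a shared memory segment. */
    static final class SlotPool {
        private final BitSet used = new BitSet();

        int allocSlot() {
            int idx = used.nextClearBit(0);
            used.set(idx);
            return idx;
        }

        void freeSlot(int idx) { used.clear(idx); }

        int allocatedCount() { return used.cardinality(); }
    }

    /** Hypothetical stand-in for the short-circuit file descriptor request. */
    interface FdRequest {
        void send(int slot) throws InvalidTokenException;
    }

    /** Leaky variant: if send() throws, the slot is never freed. */
    static int requestLeaky(SlotPool pool, FdRequest request) throws InvalidTokenException {
        int slot = pool.allocSlot();
        request.send(slot);
        return slot; // on success the caller keeps the slot and frees it later
    }

    /** Fixed variant: free the slot before propagating the exception. */
    static int requestFixed(SlotPool pool, FdRequest request) throws InvalidTokenException {
        int slot = pool.allocSlot();
        try {
            request.send(slot);
            return slot;
        } catch (InvalidTokenException e) {
            pool.freeSlot(slot);
            throw e;
        }
    }

    /** Runs `attempts` requests that all fail with an expired token; returns slots still allocated. */
    static int leakedAfter(boolean fixed, int attempts) {
        SlotPool pool = new SlotPool();
        FdRequest expired = slot -> { throw new InvalidTokenException("token expired"); };
        for (int i = 0; i < attempts; i++) {
            try {
                if (fixed) {
                    requestFixed(pool, expired);
                } else {
                    requestLeaky(pool, expired);
                }
            } catch (InvalidTokenException e) {
                // The read would be retried here; the retry itself does not release the slot.
            }
        }
        return pool.allocatedCount();
    }

    public static void main(String[] args) {
        System.out.println("leaky: " + leakedAfter(false, 5)); // prints "leaky: 5"
        System.out.println("fixed: " + leakedAfter(true, 5));  // prints "fixed: 0"
    }
}
```

      In the sketch the retry succeeds or fails independently of the cleanup, which matches the observation above that reads kept working while the number of allocated slots grew.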

        Attachments

        1. screenshot-2.png
          81 kB
          Eungsop Yoo
        2. HDFS-16198.patch
          10 kB
          Eungsop Yoo

              People

              • Assignee: Eungsop Yoo
              • Reporter: Eungsop Yoo
              • Votes: 0
              • Watchers: 5

                  Time Tracking

                  Estimated: Not Specified
                  Remaining: 0h
                  Logged: 2.5h