Hadoop HDFS / HDFS-16198

Short circuit read leaks Slot objects when InvalidToken exception is thrown



    Description

      In secure mode, 'dfs.block.access.token.enable' should be set to 'true'. With this setting, a SecretManager.InvalidToken exception may be thrown during a short-circuit read if the block access token has expired. The failed read itself is harmless because it is retried, but the exception path leaks ShortCircuitShm.Slot objects.

       

      We found this problem in our secure HBase clusters: with short-circuit reads enabled, the number of open file descriptors held by RegionServers kept increasing.

       

      The growth was caused by leaked shared memory segments used for short-circuit reads, as lsof on a RegionServer shows:

      [root ~]# lsof -p $(ps -ef | grep proc_regionserver | grep -v grep | awk '{print $2}') | grep /dev/shm | wc -l
      3925
      [root ~]# lsof -p $(ps -ef | grep proc_regionserver | grep -v grep | awk '{print $2}') | grep /dev/shm | head -5
      java 86309 hbase DEL REG 0,19 2308279984 /dev/shm/HadoopShortCircuitShm_DFSClient_NONMAPREDUCE_-1107866286_1_743473959
      java 86309 hbase DEL REG 0,19 2306359893 /dev/shm/HadoopShortCircuitShm_DFSClient_NONMAPREDUCE_-1107866286_1_1594162967
      java 86309 hbase DEL REG 0,19 2305496758 /dev/shm/HadoopShortCircuitShm_DFSClient_NONMAPREDUCE_-1107866286_1_2043027439
      java 86309 hbase DEL REG 0,19 2304784261 /dev/shm/HadoopShortCircuitShm_DFSClient_NONMAPREDUCE_-1107866286_1_689571088
      java 86309 hbase DEL REG 0,19 2302621988 /dev/shm/HadoopShortCircuitShm_DFSClient_NONMAPREDUCE_-1107866286_1_347008590 

       

      We eventually traced the root cause to leaked ShortCircuitShm.Slot objects, which keep their shared memory segments from being released.

       

      The fix is simple: free the slot when an InvalidToken exception is thrown.
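      The leak and the fix can be illustrated with a minimal, hypothetical Java model. SlotPool, Slot, InvalidTokenException and requestDescriptors below are illustrative stand-ins, not the real ShortCircuitShm, SecretManager.InvalidToken or BlockReaderFactory code, and the actual change in HDFS-16198.patch may differ in detail. The model only shows the pattern: a slot is allocated before the token check, and if the exception propagates without freeing it, the slot stays allocated even though the caller's retry hides the failure.

```java
import java.util.HashSet;
import java.util.Set;

/** Hypothetical, simplified model of the slot leak; not the HDFS implementation. */
public class SlotLeakDemo {

    /** Stand-in for SecretManager.InvalidToken (hypothetical). */
    static class InvalidTokenException extends Exception {
        InvalidTokenException(String msg) { super(msg); }
    }

    /** Stand-in for a ShortCircuitShm.Slot (hypothetical). Identity-based equality. */
    static final class Slot {
        final int index;
        Slot(int index) { this.index = index; }
    }

    /** Tracks allocated slots so that leaks become observable (hypothetical). */
    static final class SlotPool {
        private final Set<Slot> allocated = new HashSet<>();
        private int nextIndex = 0;

        Slot allocate() {
            Slot s = new Slot(nextIndex++);
            allocated.add(s);
            return s;
        }

        void free(Slot s) { allocated.remove(s); }

        int allocatedCount() { return allocated.size(); }
    }

    /** Simulates the DataNode rejecting an expired block access token. */
    static void requestDescriptors(boolean tokenExpired) throws InvalidTokenException {
        if (tokenExpired) {
            throw new InvalidTokenException("block access token expired");
        }
    }

    /**
     * Allocates a slot, then requests descriptors. With freeOnInvalidToken == false
     * this mirrors the buggy path: the exception propagates and the slot is never freed.
     */
    static void shortCircuitRead(SlotPool pool, boolean tokenExpired, boolean freeOnInvalidToken)
            throws InvalidTokenException {
        Slot slot = pool.allocate();
        try {
            requestDescriptors(tokenExpired);
        } catch (InvalidTokenException e) {
            if (freeOnInvalidToken) {
                pool.free(slot); // the fix: release the slot before rethrowing
            }
            throw e;
        }
        // A real reader would hold the slot until it is closed; the model frees it at once.
        pool.free(slot);
    }

    /** Runs n reads that all fail with an expired token and returns the number of leaked slots. */
    public static int leakedSlotsAfterFailedReads(int n, boolean freeOnInvalidToken) {
        SlotPool pool = new SlotPool();
        for (int i = 0; i < n; i++) {
            try {
                shortCircuitRead(pool, true, freeOnInvalidToken);
            } catch (InvalidTokenException e) {
                // The read is retried elsewhere, so the failure is invisible;
                // only the growing slot count reveals the leak.
            }
        }
        return pool.allocatedCount();
    }

    public static void main(String[] args) {
        System.out.println("without fix: " + leakedSlotsAfterFailedReads(1000, false));
        System.out.println("with fix:    " + leakedSlotsAfterFailedReads(1000, true));
    }
}
```

      Running main prints 1000 leaked slots for the buggy path and 0 with the fix. Freeing in the catch block, rather than in an unconditional finally, keeps the success path unchanged: there the slot is meant to outlive the call.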

      Attachments

        1. HDFS-16198.patch (10 kB, Eungsop Yoo)
        2. screenshot-2.png (81 kB, Eungsop Yoo)


          People

            Eungsop Yoo
            Eungsop Yoo
            Votes: 0
            Watchers: 5


              Time Tracking

                Estimated: Not Specified
                Remaining: 0h
                Logged: 2.5h
