  1. Hadoop HDFS
  2. HDFS-14171

Performance improvement in Tailing EditLog


Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 2.9.0, 3.0.0-alpha1
    • Fix Version/s: 2.10.0, 3.0.4, 3.1.2, 3.3.0, 3.2.1, 2.9.3
    • Component/s: namenode
    • Labels: None

    Description

      Stack:

      Thread 456 (Edit log tailer):
      State: RUNNABLE
      Blocked count: 1139
      Waited count: 12
      Stack:
      org.apache.hadoop.hdfs.server.blockmanagement.DatanodeManager.getNumLiveDataNodes(DatanodeManager.java:1259)
      org.apache.hadoop.hdfs.server.blockmanagement.BlockManagerSafeMode.areThresholdsMet(BlockManagerSafeMode.java:570)
      org.apache.hadoop.hdfs.server.blockmanagement.BlockManagerSafeMode.checkSafeMode(BlockManagerSafeMode.java:213)
      org.apache.hadoop.hdfs.server.blockmanagement.BlockManagerSafeMode.adjustBlockTotals(BlockManagerSafeMode.java:265)
      org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.completeBlock(BlockManager.java:1087)
      org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.forceCompleteBlock(BlockManager.java:1118)
      org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.updateBlocks(FSEditLogLoader.java:1126)
      org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.applyEditLogOp(FSEditLogLoader.java:468)
      org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadEditRecords(FSEditLogLoader.java:258)
      org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadFSEdits(FSEditLogLoader.java:161)
      org.apache.hadoop.hdfs.server.namenode.FSImage.loadEdits(FSImage.java:892)
      org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer.doTailEdits(EditLogTailer.java:321)
      org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.doWork(EditLogTailer.java:460)
      org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.access$400(EditLogTailer.java:410)
      org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread$1.run(EditLogTailer.java:427)
      org.apache.hadoop.security.SecurityUtil.doAsLoginUserOrFatal(SecurityUtil.java:414)
      org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.run(EditLogTailer.java:423)
      Thread 455 (pool-16-thread-1):
      
      
      

      code:

      private boolean areThresholdsMet() {
        assert namesystem.hasWriteLock();
        int datanodeNum = blockManager.getDatanodeManager().getNumLiveDataNodes();
        synchronized (this) {
          return blockSafe >= blockThreshold && datanodeNum >= datanodeThreshold;
        }
      }
      

      According to the code, the value of datanodeNum is calculated every time areThresholdsMet() is called. However, when datanodeThreshold is equal to 0 (the default value of the configuration), the expression datanodeNum >= datanodeThreshold is always true, so the calculated value is never needed.

      Calling getNumLiveDataNodes() is time-consuming on clusters at the scale of 10,000 datanodes. We therefore add a condition so that datanodeNum is calculated only when datanodeThreshold is greater than 0, which greatly improves performance.
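The guard described above can be sketched in isolation as follows. This is a minimal illustration under stated assumptions, not the attached patch: the class `SafeModeThresholdSketch`, its constructor, `setBlockSafe`, and the `IntSupplier` standing in for `getNumLiveDataNodes()` are all hypothetical names introduced here so the example compiles without Hadoop on the classpath. Only the field names `blockSafe`, `blockThreshold`, `datanodeThreshold` and the method name `areThresholdsMet` come from the description.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.IntSupplier;

// Hypothetical stand-in for the safe-mode threshold check; not the Hadoop class.
final class SafeModeThresholdSketch {
    private final long blockThreshold;
    private final int datanodeThreshold;
    // Assumption: stands in for the expensive getNumLiveDataNodes() call.
    private final IntSupplier liveDatanodeCounter;
    private long blockSafe;

    SafeModeThresholdSketch(long blockThreshold, int datanodeThreshold,
                            IntSupplier liveDatanodeCounter) {
        this.blockThreshold = blockThreshold;
        this.datanodeThreshold = datanodeThreshold;
        this.liveDatanodeCounter = liveDatanodeCounter;
    }

    synchronized void setBlockSafe(long blockSafe) {
        this.blockSafe = blockSafe;
    }

    boolean areThresholdsMet() {
        // With a threshold of 0, datanodeNum >= datanodeThreshold holds for any
        // non-negative count, so the expensive count is skipped entirely.
        int datanodeNum = 0;
        if (datanodeThreshold > 0) {
            datanodeNum = liveDatanodeCounter.getAsInt();
        }
        synchronized (this) {
            return blockSafe >= blockThreshold && datanodeNum >= datanodeThreshold;
        }
    }

    public static void main(String[] args) {
        AtomicInteger calls = new AtomicInteger();
        // Counts how often the "expensive" lookup runs; always reports 3 live nodes.
        IntSupplier counter = () -> { calls.incrementAndGet(); return 3; };

        SafeModeThresholdSketch defaults = new SafeModeThresholdSketch(100, 0, counter);
        defaults.setBlockSafe(100);
        System.out.println(defaults.areThresholdsMet() + ", counter calls: " + calls.get());

        SafeModeThresholdSketch strict = new SafeModeThresholdSketch(100, 5, counter);
        strict.setBlockSafe(100);
        System.out.println(strict.areThresholdsMet() + ", counter calls: " + calls.get());
    }
}
```

With a threshold of 0 the counter is never invoked, while a positive threshold still triggers exactly one count per check, so the result of the check is unchanged in both cases.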

      The Call Tree graph is shown in the attached file.

       

      Attachments

        1. HDFS-14171.003.patch
          2 kB
          Kenneth Yang
        2. HDFS-14171.002.patch
          1 kB
          Kenneth Yang
        3. HDFS-14171.001.patch
          1 kB
          Kenneth Yang
        4. HDFS-14171.000.patch
          1 kB
          Kenneth Yang
        5. HDFS-14171_Call-Tree.png
          372 kB
          Kenneth Yang


          People

            Assignee: Kenneth Yang (kennethlnnn)
            Reporter: Kenneth Yang (kennethlnnn)