Hadoop HDFS / HDFS-14383

Compute datanode load based on StoragePolicy


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 2.7.3, 3.1.2
    • Fix Version/s: 3.3.1, 3.4.0
    • Component/s: hdfs, namenode
    • Labels: None
    • Hadoop Flags: Reviewed

    Description

      The datanode load check logic needs to change because the existing computation does not take StoragePolicy into account.

      DatanodeManager#getInServiceXceiverAverage

      
      public double getInServiceXceiverAverage() {
        double avgLoad = 0;
        final int nodes = getNumDatanodesInService();
        if (nodes != 0) {
          final int xceivers = heartbeatManager
              .getInServiceXceiverCount();
          avgLoad = (double) xceivers / nodes;
        }
        return avgLoad;
      }

      For example, take 10 HOT nodes averaging 50 xceivers each and 90 COLD nodes averaging 10 xceivers each. The threshold calculated by the NN is 28 (((500 + 900)/100)*2), so all 10 of those nodes (the whole HOT tier) become unavailable even though the COLD tier nodes are barely in use. Turning this check off mitigates the issue. However, dfs.namenode.replication.considerLoad helps to "balance" the load across the DNs, and turning it off can lead to situations where specific DNs are "overloaded".
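      The arithmetic in this example can be checked with a small standalone sketch. It is not the code from the attached patches. The class LoadThresholdSketch and its helpers (average, threshold, isOverloaded) are hypothetical names, and the factor of 2 is taken from the "*2" in the description. The second calculation, which averages over the HOT tier only, is an assumed illustration of what a StoragePolicy-aware check could look like.

```java
public class LoadThresholdSketch {

    // Same arithmetic as getInServiceXceiverAverage(): total xceivers / in-service nodes.
    static double average(int xceivers, int nodes) {
        return nodes == 0 ? 0.0 : (double) xceivers / nodes;
    }

    // Hypothetical helper: the highest load a node may carry and still be chosen.
    static double threshold(int xceivers, int nodes, double factor) {
        return factor * average(xceivers, nodes);
    }

    // Hypothetical helper: true when a node's xceiver count exceeds the threshold.
    static boolean isOverloaded(int nodeXceivers, double threshold) {
        return nodeXceivers > threshold;
    }

    public static void main(String[] args) {
        final double factor = 2.0;               // assumed from the "*2" in the description
        final int hotNodes = 10, hotLoad = 50;   // HOT tier: 10 nodes, 50 xceivers each
        final int coldNodes = 90, coldLoad = 10; // COLD tier: 90 nodes, 10 xceivers each

        int hotTotal = hotNodes * hotLoad;       // 500
        int coldTotal = coldNodes * coldLoad;    // 900

        // Existing behaviour: one average over every in-service node.
        double clusterWide = threshold(hotTotal + coldTotal, hotNodes + coldNodes, factor);

        // Assumed alternative: average only over nodes of the same tier.
        double hotOnly = threshold(hotTotal, hotNodes, factor);

        System.out.println("cluster-wide threshold: " + clusterWide);
        System.out.println("HOT node rejected (cluster-wide): " + isOverloaded(hotLoad, clusterWide));
        System.out.println("HOT-only threshold: " + hotOnly);
        System.out.println("HOT node rejected (HOT-only): " + isOverloaded(hotLoad, hotOnly));
    }
}
```

      With the cluster-wide average, every HOT node (50 xceivers) exceeds the threshold of 28 and is skipped. With the average restricted to the HOT tier, the threshold is 100 and the same nodes remain eligible.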

      Attachments

        1. HDFS-14383-01.patch
          16 kB
          Ayush Saxena
        2. HDFS-14383-02.patch
          17 kB
          Ayush Saxena


          People

            Assignee: Ayush Saxena (ayushtkn)
            Reporter: Karthik Palanisamy (kpalanisamy)
            Votes: 2
            Watchers: 11
