Hadoop HDFS
HDFS-811

Add metrics, failure reporting and additional tests for HDFS-457

    Details

    • Type: New Feature
    • Status: Resolved
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: 0.22.0
    • Fix Version/s: 0.22.0
    • Component/s: test
    • Labels:
      None
    • Hadoop Flags:
      Reviewed

      Description

      HDFS-457 introduced an improvement that allows a datanode to continue running if a volume used for replica storage fails. Previously, a datanode shut down if any volume failed.

      Description of HDFS-457

      Current implementation shuts DataNode down completely when one of the configured volumes of the storage fails.
      This is rather wasteful behavior because it decreases utilization (good storage becomes unavailable) and imposes extra load on the system (replication of the blocks from the good volumes). These problems will become even more prominent when we move to mixed (heterogeneous) clusters with many more volumes per Data Node.

      I suggest the following additional tests for this improvement.

      #1 Test successive volume failures (minimum of 4 volumes).
      #2 Test that each volume failure reduces the reported available DFS space and remaining space.
      #3 Test that failure of all volumes on a datanode leads to failure of the datanode.
      #4 Test that repairing a failed storage disk is picked up and increases the available DFS space.
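
      The four scenarios above can be illustrated with a small toy model. This is only a sketch under stated assumptions: ToyDataNode and its methods are hypothetical names invented for illustration, not Hadoop classes, and the rule that the node stays up while at least one volume is healthy is taken from scenario #3 rather than from the actual DataNode code or the attached patches.

```java
import java.util.Arrays;

/** Toy model of a datanode with several storage volumes (hypothetical, not Hadoop code). */
class ToyDataNode {
    private final long[] volumeCapacity; // capacity of each volume, in bytes
    private final boolean[] failed;      // true if the volume is marked failed

    ToyDataNode(int volumes, long bytesPerVolume) {
        volumeCapacity = new long[volumes];
        Arrays.fill(volumeCapacity, bytesPerVolume);
        failed = new boolean[volumes];
    }

    void failVolume(int i)    { failed[i] = true; }
    void restoreVolume(int i) { failed[i] = false; }

    /** Capacity contributed by healthy volumes only. */
    long capacity() {
        long total = 0;
        for (int i = 0; i < volumeCapacity.length; i++) {
            if (!failed[i]) {
                total += volumeCapacity[i];
            }
        }
        return total;
    }

    /** Assumption from scenario #3: the node is up while any volume is healthy. */
    boolean isAlive() {
        for (boolean f : failed) {
            if (!f) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        ToyDataNode dn = new ToyDataNode(4, 100L);
        // Scenarios #1-#3: fail the volumes one after another.
        for (int i = 0; i < 4; i++) {
            dn.failVolume(i);
            System.out.println("failed=" + (i + 1)
                + " capacity=" + dn.capacity() + " alive=" + dn.isAlive());
        }
        // Scenario #4: repairing a volume restores its capacity.
        dn.restoreVolume(0);
        System.out.println("restored capacity=" + dn.capacity()
            + " alive=" + dn.isAlive());
    }
}
```

      A real test would exercise an actual DataNode, for example by making storage directories unreadable and checking the capacity the NameNode reports; the model only pins down the expected relationship between failed volumes, reported capacity, and node liveness.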

      1. hdfs-811-6.patch
        37 kB
        Eli Collins
      2. hdfs-811-5.patch
        37 kB
        Eli Collins
      3. hdfs-811-4.patch
        34 kB
        Eli Collins
      4. hdfs-811-3.patch
        34 kB
        Eli Collins
      5. hdfs-811-2.patch
        26 kB
        Eli Collins
      6. hdfs-811-1.patch
        23 kB
        Eli Collins

        Issue Links

          Activity

          Gavin made changes -
          Link This issue is depended upon by HDFS-556 [ HDFS-556 ]
          Gavin made changes -
          Link This issue blocks HDFS-556 [ HDFS-556 ]
          Eli Collins made changes -
          Link This issue is related to HDFS-1276 [ HDFS-1276 ]
          Eli Collins made changes -
          Status Patch Available [ 10002 ] Resolved [ 5 ]
          Resolution Fixed [ 1 ]
          Eli Collins made changes -
          Hadoop Flags [Reviewed]
          Issue Type Test [ 6 ] New Feature [ 2 ]
          Affects Version/s 0.21.0 [ 12314046 ]
          Eli Collins made changes -
          Attachment hdfs-811-6.patch [ 12459540 ]
          Eli Collins made changes -
          Status Open [ 1 ] Patch Available [ 10002 ]
          Eli Collins made changes -
          Attachment hdfs-811-5.patch [ 12459518 ]
          Jakob Homan made changes -
          Status Patch Available [ 10002 ] Open [ 1 ]
          Tom White made changes -
          Fix Version/s 0.21.0 [ 12314046 ]
          Eli Collins made changes -
          Status Open [ 1 ] Patch Available [ 10002 ]
          Eli Collins made changes -
          Attachment hdfs-811-4.patch [ 12445732 ]
          Eli Collins made changes -
          Status Patch Available [ 10002 ] Open [ 1 ]
          Eli Collins made changes -
          Summary Additional tests(Unit tests & Functional tests) for HDFS-457. Add metrics, failure reporting and additional tests for HDFS-457
          Eli Collins made changes -
          Attachment hdfs-811-3.patch [ 12445717 ]
          Eli Collins made changes -
          Link This issue blocks HDFS-1161 [ HDFS-1161 ]
          Eli Collins made changes -
          Link This issue is blocked by HDFS-1161 [ HDFS-1161 ]
          Eli Collins made changes -
          Attachment hdfs-811-2.patch [ 12445606 ]
          Eli Collins made changes -
          Fix Version/s 0.21.0 [ 12314046 ]
          Affects Version/s 0.21.0 [ 12314046 ]
          Description
           HDFS-457 introduced an improvement that allows a datanode to continue running if a volume used for replica storage fails. Previously, a datanode shut down if any volume failed.

          Description of HDFS-457
          {quote}
          Current implementation shuts DataNode down completely when one of the configured volumes of the storage fails.
          This is rather wasteful behavior because it decreases utilization (good storage becomes unavailable) and imposes extra load on the system (replication of the blocks from the good volumes). These problems will become even more prominent when we move to mixed (heterogeneous) clusters with many more volumes per Data Node.
          {quote}

          I suggest the following additional tests for this improvement.

          #1 Test successive volume failures (minimum of 4 volumes).
          #2 Test that each volume failure reduces the reported available DFS space and remaining space.
          #3 Test that failure of all volumes on a datanode leads to failure of the datanode.
          #4 Test that repairing a failed storage disk is picked up and increases the available DFS space.
          Eli Collins made changes -
          Link This issue blocks HDFS-1161 [ HDFS-1161 ]
          Eli Collins made changes -
          Link This issue blocks HDFS-556 [ HDFS-556 ]
          Eli Collins made changes -
          Status Open [ 1 ] Patch Available [ 10002 ]
          Eli Collins made changes -
          Attachment hdfs-811-1.patch [ 12444182 ]
          Eli Collins made changes -
          Assignee Eli Collins [ eli ]
          Ravi Phulari made changes -
          Field Original Value New Value
          Link This issue is related to HDFS-457 [ HDFS-457 ]
          Ravi Phulari created issue -

            People

            • Assignee:
              Eli Collins
              Reporter:
              Ravi Phulari
            • Votes:
              0
              Watchers:
              7

              Dates

              • Created:
                Updated:
                Resolved:
