Hadoop HDFS
HDFS-1602

NameNode storage failed replica restoration is broken

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.21.0, 0.23.0
    • Fix Version/s: 0.22.0, 0.23.0
    • Component/s: namenode
    • Labels:
      None
    • Hadoop Flags:
      Reviewed

      Description

      NameNode storage restore functionality doesn't work (as HDFS-903 demonstrated). It needs to be disabled, removed, or fixed. The feature also causes the failure tracked in HDFS-1496.

      1. HDFS-1602.patch
        1 kB
        Boris Shkolnik
      2. HDFS-1602-1.patch
        3 kB
        Boris Shkolnik
      3. HDFS-1602v22.patch
        1 kB
        Boris Shkolnik

          Activity

          Konstantin Boudnik added a comment -

          I have posted some analysis of HADOOP-4885. Unless that analysis is totally incorrect, it is likely to invalidate this JIRA.

          Boris Shkolnik added a comment -

          I've looked at the testStorageRestore failure. It seems the problem is that when we try to restore a storage dir, we format it, which always saves the current in-memory state into a new fsimage. Instead, we should restore the storage dir without creating a new fsimage. The image will be copied from the checkpoint.
          Here is a patch for trunk that does this. Please review.
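
The difference described above can be illustrated with a minimal Java sketch. It is not the HDFS code or the attached patch: the class `StorageRestoreSketch`, its methods, and the flat `current/fsimage` layout are hypothetical names chosen for illustration, and the checkpoint is modelled as a plain file copy.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

/** Illustrative model only; none of these names are part of the HDFS API. */
final class StorageRestoreSketch {

    /** Problematic behaviour: formatting the restored dir also writes an image from memory. */
    static void formatAndSaveImage(Path storageDir, byte[] inMemoryState) throws IOException {
        Path current = recreateCurrentDir(storageDir);
        Files.write(current.resolve("fsimage"), inMemoryState);
    }

    /** Proposed behaviour: recreate the directory layout only, leaving the image absent. */
    static void restoreWithoutImage(Path storageDir) throws IOException {
        recreateCurrentDir(storageDir);
    }

    /** Stand-in for the checkpoint step that later copies the image into the restored dir. */
    static void copyCheckpointImage(Path checkpointImage, Path storageDir) throws IOException {
        Path target = storageDir.resolve("current").resolve("fsimage");
        Files.copy(checkpointImage, target, StandardCopyOption.REPLACE_EXISTING);
    }

    /** Empties and recreates "current"; assumes it holds only regular files. */
    private static Path recreateCurrentDir(Path storageDir) throws IOException {
        Path current = storageDir.resolve("current");
        if (Files.isDirectory(current)) {
            List<Path> stale;
            try (Stream<Path> entries = Files.list(current)) {
                stale = entries.collect(Collectors.toList());
            }
            for (Path p : stale) {
                Files.delete(p);
            }
        }
        Files.createDirectories(current);
        return current;
    }

    public static void main(String[] args) throws IOException {
        Path restored = Files.createTempDirectory("restored-dir");
        Path checkpoint = Files.createTempFile("checkpoint", ".img");
        Files.write(checkpoint, "checkpointed namespace".getBytes(StandardCharsets.UTF_8));

        restoreWithoutImage(restored);
        Path image = restored.resolve("current").resolve("fsimage");
        System.out.println("image present after restore: " + Files.exists(image));

        copyCheckpointImage(checkpoint, restored);
        System.out.println("image present after checkpoint: " + Files.exists(image));
    }
}
```

In this model the restored directory holds no image until the checkpoint step runs, so a restore can never overwrite it with un-checkpointed in-memory state.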

          Konstantin Boudnik added a comment -

          +1. The patch seems legit, and it sure fixes HDFS-1496.

          Todd Lipcon added a comment -

          attemptRestoreRemovedStorage is now only ever called with its boolean argument true. Let's remove that argument.

          Aside from that, the patch seems to make sense.

          Boris Shkolnik added a comment -

          Removed the parameter from attemptRestoreRemovedStorage, as suggested by Todd.

          Konstantin Boudnik added a comment -

          +1, pending the usual validation or a local test run.

          Todd Lipcon added a comment -

          yep, looks good to me also. Thanks a bunch, Boris. With any luck we'll be back to green next week!

          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12470307/HDFS-1602-1.patch
          against trunk revision 1067288.

          +1 @author. The patch does not contain any @author tags.

          -1 tests included. The patch doesn't appear to include any new or modified tests.
          Please justify why no new tests are needed for this patch.
          Also please list what manual steps were performed to verify this patch.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

          -1 release audit. The applied patch generated 2 release audit warnings (more than the trunk's current 0 warnings).

          -1 core tests. The patch failed these core unit tests:
          org.apache.hadoop.hdfs.server.datanode.TestBlockReport
          org.apache.hadoop.hdfs.TestFileConcurrentReader

          -1 contrib tests. The patch failed contrib unit tests.

          +1 system test framework. The patch passed system test framework compile.

          Test results: https://hudson.apache.org/hudson/job/PreCommit-HDFS-Build/150//testReport/
          Release audit warnings: https://hudson.apache.org/hudson/job/PreCommit-HDFS-Build/150//artifact/trunk/patchprocess/patchReleaseAuditProblems.txt
          Findbugs warnings: https://hudson.apache.org/hudson/job/PreCommit-HDFS-Build/150//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
          Console output: https://hudson.apache.org/hudson/job/PreCommit-HDFS-Build/150//console

          This message is automatically generated.

          Boris Shkolnik added a comment -

          The included test is testStorageRestore.

          I manually ran the 'failed' core tests; they all passed.

          The release audit warnings are not relevant to this patch (editsStored.xml).

          Todd Lipcon added a comment -

          +1

          Boris Shkolnik added a comment -

          committed to trunk.

          Konstantin Shvachko added a comment -

          This should go at least to 0.22 as the original code was introduced in 0.21.

          Nigel Daley added a comment -

          FWIW, TestBlockRecovery.testErrorReplicas failed (timed out). This is in the same class as the fixed test I think. Search console for failure: https://hudson.apache.org/hudson/job/Hadoop-Hdfs-trunk-Commit/537/console

          Re-running build again.

          Konstantin Boudnik added a comment -

          "FWIW, TestBlockRecovery.testErrorReplicas failed (timed out)"

          This JIRA is about TestStorageRestore.

          Boris, would you like to backport it to 0.22 at least? The ticket needs to be closed.

          Hudson added a comment -

          Integrated in Hadoop-Hdfs-trunk-Commit #539 (See https://hudson.apache.org/hudson/job/Hadoop-Hdfs-trunk-Commit/539/)

          Boris Shkolnik added a comment -

          Patch for 0.22

          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12470731/HDFS-1602v22.patch
          against trunk revision 1068968.

          +1 @author. The patch does not contain any @author tags.

          -1 tests included. The patch doesn't appear to include any new or modified tests.
          Please justify why no new tests are needed for this patch.
          Also please list what manual steps were performed to verify this patch.

          -1 patch. The patch command could not apply the patch.

          Console output: https://hudson.apache.org/hudson/job/PreCommit-HDFS-Build/158//console

          This message is automatically generated.

          Todd Lipcon added a comment -

          +1 on patch for 22

          Hudson added a comment -

          Integrated in Hadoop-Hdfs-trunk-Commit #540 (See https://hudson.apache.org/hudson/job/Hadoop-Hdfs-trunk-Commit/540/)
          Reordering. Patch for HDFS-1602 is committed into release 0.22

          Hudson added a comment -

          Integrated in Hadoop-Hdfs-trunk #643 (See https://builds.apache.org/hudson/job/Hadoop-Hdfs-trunk/643/)

          Hudson added a comment -

          Integrated in Hadoop-Hdfs-22-branch #35 (See https://builds.apache.org/hudson/job/Hadoop-Hdfs-22-branch/35/)


            People

            • Assignee:
              Boris Shkolnik
            • Reporter:
              Konstantin Boudnik
            • Votes:
              0
            • Watchers:
              8
