Hadoop HDFS / HDFS-9236

Missing sanity check for block size during block recovery

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 2.7.1
    • Fix Version/s: 2.8.0, 3.0.0-alpha1
    • Component/s: datanode
    • Labels:
      None
    • Target Version/s:
    • Hadoop Flags:
      Reviewed

      Description

      Ran into an issue while running a test against faulty datanode code.

      Currently in DataNode.java:

        /** Block synchronization */
        void syncBlock(RecoveringBlock rBlock,
                               List<BlockRecord> syncList) throws IOException {
      …
      
          // Calculate the best available replica state.
          ReplicaState bestState = ReplicaState.RWR;
      …
      
          // Calculate list of nodes that will participate in the recovery
          // and the new block size
          List<BlockRecord> participatingList = new ArrayList<BlockRecord>();
          final ExtendedBlock newBlock = new ExtendedBlock(bpid, blockId,
              -1, recoveryId);
          switch(bestState) {
      …
          case RBW:
          case RWR:
            long minLength = Long.MAX_VALUE;
            for(BlockRecord r : syncList) {
              ReplicaState rState = r.rInfo.getOriginalReplicaState();
              if(rState == bestState) {
                minLength = Math.min(minLength, r.rInfo.getNumBytes());
                participatingList.add(r);
              }
            }
            newBlock.setNumBytes(minLength);
            break;
      …
          }
      …
          nn.commitBlockSynchronization(block,
              newBlock.getGenerationStamp(), newBlock.getNumBytes(), true, false,
              datanodes, storages);
        }
      

      This code is called by the DN coordinating the block recovery. In the above case, it is possible that none of the rStates (reported by DNs with copies of the replica being recovered) match the bestState. This can be caused either by faulty DN code or by stale/modified/corrupted replica files on the DNs. When this happens, the DN ends up reporting a minLength of Long.MAX_VALUE.
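
      The failure mode can be reproduced in isolation. The sketch below is a hypothetical, simplified model of the syncBlock() length calculation (the ReplicaState and Replica types here are stand-ins, not the real HDFS classes); it shows how minLength survives as Long.MAX_VALUE when no replica's state matches bestState:

```java
import java.util.ArrayList;
import java.util.List;

// Simplified stand-in for the syncBlock() length computation; not the
// actual HDFS classes. Only replicas whose state matches bestState
// contribute to the new block length.
public class MinLengthSentinelDemo {
  enum ReplicaState { FINALIZED, RBW, RWR }

  static class Replica {
    final ReplicaState state;
    final long numBytes;
    Replica(ReplicaState state, long numBytes) {
      this.state = state;
      this.numBytes = numBytes;
    }
  }

  /** Mirrors the RBW/RWR branch of syncBlock(). */
  static long computeNewLength(List<Replica> syncList, ReplicaState bestState) {
    long minLength = Long.MAX_VALUE;
    for (Replica r : syncList) {
      if (r.state == bestState) {
        minLength = Math.min(minLength, r.numBytes);
      }
    }
    // If nothing matched, the sentinel survives -- this is the bug.
    return minLength;
  }

  public static void main(String[] args) {
    List<Replica> syncList = new ArrayList<>();
    // Faulty/stale DNs report FINALIZED replicas while bestState is RWR.
    syncList.add(new Replica(ReplicaState.FINALIZED, 1024));
    syncList.add(new Replica(ReplicaState.FINALIZED, 2048));
    long newLength = computeNewLength(syncList, ReplicaState.RWR);
    System.out.println(newLength == Long.MAX_VALUE); // -> true
  }
}
```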

      Unfortunately there is no check on the NN for replica length. See FSNamesystem.java:

        void commitBlockSynchronization(ExtendedBlock oldBlock,
            long newgenerationstamp, long newlength,
            boolean closeFile, boolean deleteblock, DatanodeID[] newtargets,
            String[] newtargetstorages) throws IOException {
      …
      
            if (deleteblock) {
              Block blockToDel = ExtendedBlock.getLocalBlock(oldBlock);
              boolean remove = iFile.removeLastBlock(blockToDel) != null;
              if (remove) {
                blockManager.removeBlock(storedBlock);
              }
            } else {
              // update last block
              if(!copyTruncate) {
                storedBlock.setGenerationStamp(newgenerationstamp);
                
                //>>>> XXX block length is updated without any check <<<<//
                storedBlock.setNumBytes(newlength);
              }
      …
          if (closeFile) {
            LOG.info("commitBlockSynchronization(oldBlock=" + oldBlock
                + ", file=" + src
                + (copyTruncate ? ", newBlock=" + truncatedBlock
                    : ", newgenerationstamp=" + newgenerationstamp)
                + ", newlength=" + newlength
                + ", newtargets=" + Arrays.asList(newtargets) + ") successful");
          } else {
            LOG.info("commitBlockSynchronization(" + oldBlock + ") successful");
          }
        }
      

      After this point the block length becomes Long.MAX_VALUE. Any subsequent block report (even one with the correct length) will cause the block to be marked as corrupt. Since this block could be the last block of the file, if the client goes away the NN won't be able to recover the lease and close the file, because the last block is under-replicated.

      I believe we need a sanity check for block size on both the DN and the NN to prevent such cases from happening.
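
      As an illustration, here is a minimal sketch of the kind of DN-side check being proposed (hypothetical code, not the actual patch; the class and method names are made up):

```java
import java.io.IOException;

// Hypothetical sketch of a DN-side sanity check: reject the Long.MAX_VALUE
// sentinel before it is reported to the NN as the new block length.
public class SyncBlockLengthCheck {
  static long checkedLength(long minLength, String block) throws IOException {
    if (minLength == Long.MAX_VALUE) {
      // No replica was in the best expected state, so no valid length exists.
      throw new IOException("No replica is in the best expected state for "
          + block + "; refusing to report an invalid recovery length");
    }
    return minLength;
  }

  public static void main(String[] args) {
    try {
      checkedLength(Long.MAX_VALUE, "blk_1073741825");
      System.out.println("accepted");
    } catch (IOException e) {
      System.out.println("rejected"); // -> rejected
    }
  }
}
```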

      1. HDFS-9236.001.patch
        8 kB
        Tony Wu
      2. HDFS-9236.002.patch
        5 kB
        Tony Wu
      3. HDFS-9236.003.patch
        5 kB
        Tony Wu
      4. HDFS-9236.004.patch
        5 kB
        Tony Wu
      5. HDFS-9236.005.patch
        5 kB
        Tony Wu
      6. HDFS-9236.006.patch
        7 kB
        Tony Wu
      7. HDFS-9236.007.patch
        8 kB
        Tony Wu

        Activity

        twu Tony Wu added a comment -

        First patch.

        twu Tony Wu added a comment -

        The patch does:

        • Add replica length check in syncBlock() so DN reports error instead of sending Long.MAX_VALUE to NN.
        • Add replica length check on NN so it won't blindly update the replica length to a value larger than configured block size.
        • Add extra debug logs to help trace the block recovery process.
        • Add unit tests to verify the new exceptions.
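
        The NN-side idea in the second bullet could look roughly like the sketch below (hypothetical code, not the actual patch; note that this NN-side check was later dropped in review because HDFS block size is configured per file):

```java
import java.io.IOException;

// Hypothetical sketch of an NN-side check: refuse to commit a synchronized
// block length that is negative or exceeds the configured block size,
// instead of blindly calling storedBlock.setNumBytes(newlength).
public class CommitLengthCheck {
  static void validateNewLength(long newLength, long configuredBlockSize)
      throws IOException {
    if (newLength < 0 || newLength > configuredBlockSize) {
      throw new IOException("Invalid synchronized block length " + newLength
          + ": must be in [0, " + configuredBlockSize + "]");
    }
  }

  public static void main(String[] args) throws IOException {
    long blockSize = 128L * 1024 * 1024;          // 128 MB, an assumed config
    validateNewLength(64L * 1024 * 1024, blockSize); // plausible length: passes
    try {
      validateNewLength(Long.MAX_VALUE, blockSize);  // the sentinel: rejected
      System.out.println("accepted");
    } catch (IOException e) {
      System.out.println("rejected"); // -> rejected
    }
  }
}
```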

        I tested the patch with:

        • org.apache.hadoop.hdfs.server.datanode.TestBlockRecovery
        • org.apache.hadoop.hdfs.server.namenode.TestCommitBlockSynchronization
        • org.apache.hadoop.hdfs.TestLeaseRecovery
        • org.apache.hadoop.hdfs.TestLeaseRecovery2
        • org.apache.hadoop.hdfs.server.namenode.ha.TestPipelinesFailover: This suite, especially the test case testPipelineRecoveryStress, is a good system test that stresses all parts of the lease/block recovery code path.
        hadoopqa Hadoop QA added a comment -



        -1 overall



        Vote Subsystem Runtime Comment
        -1 pre-patch 20m 36s Pre-patch trunk has 1 extant Findbugs (version 3.0.0) warnings.
        +1 @author 0m 0s The patch does not contain any @author tags.
        +1 tests included 0m 0s The patch appears to include 2 new or modified test files.
        +1 javac 9m 13s There were no new javac warning messages.
        +1 javadoc 11m 44s There were no new javadoc warning messages.
        +1 release audit 0m 25s The applied patch does not increase the total number of release audit warnings.
        -1 checkstyle 1m 35s The applied patch generated 3 new checkstyle issues (total was 391, now 390).
        +1 whitespace 0m 0s The patch has no lines that end in whitespace.
        +1 install 1m 36s mvn install still works.
        +1 eclipse:eclipse 0m 41s The patch built with eclipse:eclipse.
        +1 findbugs 2m 48s The patch does not introduce any new Findbugs (version 3.0.0) warnings.
        +1 native 3m 38s Pre-build of native portion
        -1 hdfs tests 65m 19s Tests failed in hadoop-hdfs.
            117m 40s  



        Reason Tests
        Failed unit tests hadoop.hdfs.server.blockmanagement.TestUnderReplicatedBlocks
          hadoop.hdfs.web.TestWebHDFSForHA
          hadoop.hdfs.web.TestFSMainOperationsWebHdfs
          hadoop.hdfs.web.TestWebHDFS



        Subsystem Report/Notes
        Patch URL http://issues.apache.org/jira/secure/attachment/12766356/HDFS-9236.001.patch
        Optional Tests javadoc javac unit findbugs checkstyle
        git revision trunk / da16c9b
        Pre-patch Findbugs warnings https://builds.apache.org/job/PreCommit-HDFS-Build/12957/artifact/patchprocess/trunkFindbugsWarningshadoop-hdfs.html
        checkstyle https://builds.apache.org/job/PreCommit-HDFS-Build/12957/artifact/patchprocess/diffcheckstylehadoop-hdfs.txt
        hadoop-hdfs test log https://builds.apache.org/job/PreCommit-HDFS-Build/12957/artifact/patchprocess/testrun_hadoop-hdfs.txt
        Test Results https://builds.apache.org/job/PreCommit-HDFS-Build/12957/testReport/
        Java 1.7.0_55
        uname Linux asf909.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
        Console output https://builds.apache.org/job/PreCommit-HDFS-Build/12957/console

        This message was automatically generated.

        twu Tony Wu added a comment -

        All tests pass when manually run on OSX and Linux (CentOS 6.4) with latest trunk. It looks like the failures are not caused by this patch.

        checkstyle seems to be complaining about the file and functions being too long. These warnings are also not caused by this patch.

        twu Tony Wu added a comment -

        Thanks to Yongjun Zhang for offline review and valuable comments! In summary:

        • It is difficult to come up with a block size limit to enforce on the NN, especially since HDFS allows different files to specify their own block size.
          • I will remove the NN-side change in the next patch. I would still like to investigate whether we can enforce a per-file block size check.
        • The sanity check on the DN is useful, although the chance of hitting the error in a production cluster is small.
        twu Tony Wu added a comment -

        Removed block size check on NN.

        yzhangal Yongjun Zhang added a comment -

        Hi Tony Wu,

        Thanks for reporting the issue, finding out its cause, and providing the patch.

        Besides what we discussed, some additional comments, all cosmetic:

        1.
        About

              LOG.info("syncBlock for block " + block + ", all data-nodes don't have "
                  + "block or their replicas have 0 length. The block cam be deleted.");
        

        Change "data-node" to datanode, add "the" and fix a typo "cam":

              LOG.info("syncBlock for block " + block + ", all datanodes don't have the"
                  + " block or their replicas have 0 length. The block can be deleted.");
        

        BTW, should this be a debug message or an info message? It seems helpful as info, but I will leave it to you.

        2. Change the "data-node" to "datanode" and add header "syncBlock replicaInfo: " to
        the new debug messages in syncBlock method.

        3. Add block info to the exception
        throw new IOException("No replica is in the best expected state " + ...

        4. Change "DN triggering" to "Datanode triggering", to be consistent.

        Thanks.

        twu Tony Wu added a comment -

        Hi Yongjun Zhang,

        Thanks a lot for looking at the patch. Regarding your comments:
        1: This has already been addressed in patch 2.
        2-4: I will address these in the next patch.

        Regards,
        Tony

        hadoopqa Hadoop QA added a comment -



        -1 overall



        Vote Subsystem Runtime Comment
        -1 pre-patch 20m 34s Pre-patch trunk has 1 extant Findbugs (version 3.0.0) warnings.
        +1 @author 0m 0s The patch does not contain any @author tags.
        +1 tests included 0m 0s The patch appears to include 1 new or modified test files.
        +1 javac 8m 40s There were no new javac warning messages.
        +1 javadoc 11m 3s There were no new javadoc warning messages.
        -1 release audit 0m 19s The applied patch generated 1 release audit warnings.
        -1 checkstyle 1m 32s The applied patch generated 1 new checkstyle issues (total was 142, now 141).
        +1 whitespace 0m 0s The patch has no lines that end in whitespace.
        +1 install 1m 36s mvn install still works.
        +1 eclipse:eclipse 0m 35s The patch built with eclipse:eclipse.
        +1 findbugs 2m 43s The patch does not introduce any new Findbugs (version 3.0.0) warnings.
        +1 native 3m 37s Pre-build of native portion
        -1 hdfs tests 65m 37s Tests failed in hadoop-hdfs.
            116m 20s  



        Reason Tests
        Failed unit tests hadoop.hdfs.server.namenode.TestDiskspaceQuotaUpdate
          hadoop.hdfs.server.namenode.TestStartup
          hadoop.hdfs.server.datanode.TestReadOnlySharedStorage
          hadoop.hdfs.server.namenode.ha.TestDNFencing
          hadoop.hdfs.server.namenode.TestSaveNamespace
          hadoop.hdfs.server.datanode.TestDnRespectsBlockReportSplitThreshold
          hadoop.hdfs.server.namenode.TestListCorruptFileBlocks
          hadoop.hdfs.server.datanode.TestDataNodeRollingUpgrade
          hadoop.hdfs.util.TestByteArrayManager
          hadoop.hdfs.server.namenode.TestHDFSConcat
          hadoop.hdfs.qjournal.client.TestQuorumJournalManager
          hadoop.hdfs.server.blockmanagement.TestUnderReplicatedBlocks
          hadoop.hdfs.server.namenode.TestFsckWithMultipleNameNodes



        Subsystem Report/Notes
        Patch URL http://issues.apache.org/jira/secure/attachment/12766810/HDFS-9236.002.patch
        Optional Tests javadoc javac unit findbugs checkstyle
        git revision trunk / dc45a7a
        Pre-patch Findbugs warnings https://builds.apache.org/job/PreCommit-HDFS-Build/13005/artifact/patchprocess/trunkFindbugsWarningshadoop-hdfs.html
        Release Audit https://builds.apache.org/job/PreCommit-HDFS-Build/13005/artifact/patchprocess/patchReleaseAuditProblems.txt
        checkstyle https://builds.apache.org/job/PreCommit-HDFS-Build/13005/artifact/patchprocess/diffcheckstylehadoop-hdfs.txt
        hadoop-hdfs test log https://builds.apache.org/job/PreCommit-HDFS-Build/13005/artifact/patchprocess/testrun_hadoop-hdfs.txt
        Test Results https://builds.apache.org/job/PreCommit-HDFS-Build/13005/testReport/
        Java 1.7.0_55
        uname Linux asf909.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
        Console output https://builds.apache.org/job/PreCommit-HDFS-Build/13005/console

        This message was automatically generated.

        twu Tony Wu added a comment -

        Addressed Yongjun Zhang's review comments.

        yzhangal Yongjun Zhang added a comment -

        Hi Tony Wu,

        Thanks for the updated rev 3 which looks reasonable to me.

        Hi Kihwal Lee, would you please help take a look? Really appreciate it.

        Thanks.

        hadoopqa Hadoop QA added a comment -



        -1 overall



        Vote Subsystem Runtime Comment
        -1 pre-patch 18m 43s Pre-patch trunk has 1 extant Findbugs (version 3.0.0) warnings.
        +1 @author 0m 0s The patch does not contain any @author tags.
        +1 tests included 0m 0s The patch appears to include 1 new or modified test files.
        +1 javac 8m 12s There were no new javac warning messages.
        +1 javadoc 10m 35s There were no new javadoc warning messages.
        +1 release audit 0m 24s The applied patch does not increase the total number of release audit warnings.
        -1 checkstyle 1m 28s The applied patch generated 2 new checkstyle issues (total was 142, now 142).
        +1 whitespace 0m 0s The patch has no lines that end in whitespace.
        +1 install 1m 31s mvn install still works.
        +1 eclipse:eclipse 0m 34s The patch built with eclipse:eclipse.
        +1 findbugs 2m 31s The patch does not introduce any new Findbugs (version 3.0.0) warnings.
        +1 native 3m 15s Pre-build of native portion
        +1 hdfs tests 49m 37s Tests passed in hadoop-hdfs.
            96m 54s  



        Subsystem Report/Notes
        Patch URL http://issues.apache.org/jira/secure/attachment/12766845/HDFS-9236.003.patch
        Optional Tests javadoc javac unit findbugs checkstyle
        git revision trunk / 8d2d3eb
        Pre-patch Findbugs warnings https://builds.apache.org/job/PreCommit-HDFS-Build/13008/artifact/patchprocess/trunkFindbugsWarningshadoop-hdfs.html
        checkstyle https://builds.apache.org/job/PreCommit-HDFS-Build/13008/artifact/patchprocess/diffcheckstylehadoop-hdfs.txt
        hadoop-hdfs test log https://builds.apache.org/job/PreCommit-HDFS-Build/13008/artifact/patchprocess/testrun_hadoop-hdfs.txt
        Test Results https://builds.apache.org/job/PreCommit-HDFS-Build/13008/testReport/
        Java 1.7.0_55
        uname Linux asf900.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
        Console output https://builds.apache.org/job/PreCommit-HDFS-Build/13008/console

        This message was automatically generated.

        twu Tony Wu added a comment -

        The checkstyle and pre-patch errors are not related to this patch.

        twu Tony Wu added a comment -

        Hi Yongjun Zhang,

        Could you take another look at the updated patch?

        Thanks,
        Tony

        yzhangal Yongjun Zhang added a comment -

        Sorry for the delay Tony Wu. I did a review and I'm +1 on rev 003, will commit soon.

        twu Tony Wu added a comment -

        Yongjun Zhang Thanks a lot for looking at the patch.

        yzhangal Yongjun Zhang added a comment -

        Sorry Tony Wu, the patch no longer applies because of other commits. Would you please update the patch? Thanks.

        twu Tony Wu added a comment -

        Hi Yongjun Zhang, I believe HDFS-9255 has moved block recovery related code to a different location. I will rebase my patch and upload a new one shortly.

        twu Tony Wu added a comment -

        In v4 patch:

        • Rebased to latest trunk (moved the changes to new file BlockRecoveryWorker.java).
        yzhangal Yongjun Zhang added a comment -

        Thanks Tony Wu, +1 on rev4 pending jenkins.

        liuml07 Mingliang Liu added a comment -

        The latest patch looks good to me overall. One minor comment: is it possible to assert that the expected exception is thrown (e.g. by checking the error message)?

        twu Tony Wu added a comment -

        Hi Mingliang Liu,

        Thanks a lot for your comment. I debated about having an assert as well and think it has a few disadvantages (please correct me if I'm wrong):

        1. Asserts can be disabled at runtime.
        2. An assert message is only visible on the DN, whereas an exception can propagate back to the NN (and is also visible on the DN).
        3. An assert would have stopped the DN process, which seems to be overkill.

        Given these reasons I think throwing an exception is the better choice.

        Tony

        liuml07 Mingliang Liu added a comment -

        Sorry for the confusion.

        By "assert expected exception thrown (e.g. by error message)", I mean assertTrue(ioe.getMessage().contains("ooxx")); in the test, not in the DN code. I'm with you: throwing an exception is correct and an assert would be wrong in this case.
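
        That suggestion can be sketched as follows (hypothetical method under test and message; plain Java in place of JUnit's assertTrue so the snippet is self-contained):

```java
import java.io.IOException;

// Hypothetical sketch of the suggested test pattern: catch the expected
// IOException and verify its message, instead of only expecting the type.
public class ExpectedExceptionMessageDemo {
  // Stand-in for the DN code under test; the message is made up.
  static void syncBlockUnderTest() throws IOException {
    throw new IOException("No replica is in the best expected state");
  }

  public static void main(String[] args) {
    try {
      syncBlockUnderTest();
      throw new AssertionError("expected IOException was not thrown");
    } catch (IOException ioe) {
      // JUnit equivalent:
      // assertTrue(ioe.getMessage().contains("best expected state"));
      if (!ioe.getMessage().contains("best expected state")) {
        throw new AssertionError("unexpected message: " + ioe.getMessage());
      }
      System.out.println("message verified"); // -> message verified
    }
  }
}
```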

        twu Tony Wu added a comment -

        Thanks for clarifying. I'll post an updated patch shortly.

        twu Tony Wu added a comment -

        In v5 patch:

        • Addressed Mingliang Liu's comment by updating the test case.
        liuml07 Mingliang Liu added a comment -

        +1 (non-binding) pending on Jenkins.

        hadoopqa Hadoop QA added a comment -
        -1 overall



        Vote Subsystem Runtime Comment
        0 reexec 0m 5s docker + precommit patch detected.
        +1 @author 0m 0s The patch does not contain any @author tags.
        +1 test4tests 0m 0s The patch appears to include 1 new or modified test files.
        +1 mvninstall 2m 58s trunk passed
        +1 compile 0m 32s trunk passed with JDK v1.8.0_60
        +1 compile 0m 31s trunk passed with JDK v1.7.0_79
        +1 checkstyle 0m 14s trunk passed
        +1 mvneclipse 0m 14s trunk passed
        -1 findbugs 1m 48s hadoop-hdfs-project/hadoop-hdfs in trunk cannot run convertXmlToText from findbugs
        +1 javadoc 1m 5s trunk passed with JDK v1.8.0_60
        +1 javadoc 1m 51s trunk passed with JDK v1.7.0_79
        +1 mvninstall 0m 39s the patch passed
        +1 compile 0m 31s the patch passed with JDK v1.8.0_60
        +1 javac 0m 31s the patch passed
        +1 compile 0m 30s the patch passed with JDK v1.7.0_79
        +1 javac 0m 30s the patch passed
        +1 checkstyle 0m 15s the patch passed
        +1 mvneclipse 0m 14s the patch passed
        +1 whitespace 0m 0s Patch has no whitespace issues.
        +1 findbugs 2m 0s the patch passed
        +1 javadoc 1m 5s the patch passed with JDK v1.8.0_60
        +1 javadoc 1m 48s the patch passed with JDK v1.7.0_79
        -1 unit 52m 10s hadoop-hdfs in the patch failed with JDK v1.8.0_60.
        -1 unit 49m 14s hadoop-hdfs in the patch failed with JDK v1.7.0_79.
        -1 asflicense 0m 19s Patch generated 58 ASF License warnings.
        120m 33s



        Reason Tests
        JDK v1.7.0_79 Failed junit tests hadoop.hdfs.server.balancer.TestBalancerWithMultipleNameNodes
          hadoop.hdfs.TestDFSStripedOutputStreamWithFailure



        Subsystem Report/Notes
        Docker Client=1.7.1 Server=1.7.1 Image:test-patch-base-hadoop-date2015-10-29
        JIRA Patch URL https://issues.apache.org/jira/secure/attachment/12769622/HDFS-9236.004.patch
        JIRA Issue HDFS-9236
        Optional Tests asflicense javac javadoc mvninstall unit findbugs checkstyle compile
        uname Linux 7bcaa73db0fe 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
        Build tool maven
        Personality /home/jenkins/jenkins-slave/workspace/PreCommit-HDFS-Build/patchprocess/apache-yetus-c3a2069/precommit/personality/hadoop.sh
        git revision trunk / c293c58
        Default Java 1.7.0_79
        Multi-JDK versions /usr/lib/jvm/java-8-oracle:1.8.0_60 /usr/lib/jvm/java-7-openjdk-amd64:1.7.0_79
        findbugs v3.0.0
        findbugs https://builds.apache.org/job/PreCommit-HDFS-Build/13284/artifact/patchprocess/branch-findbugs-hadoop-hdfs-project_hadoop-hdfs.html
        unit https://builds.apache.org/job/PreCommit-HDFS-Build/13284/artifact/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs-jdk1.8.0_60.txt
        unit https://builds.apache.org/job/PreCommit-HDFS-Build/13284/artifact/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs-jdk1.7.0_79.txt
        unit test logs https://builds.apache.org/job/PreCommit-HDFS-Build/13284/artifact/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs-jdk1.8.0_60.txt https://builds.apache.org/job/PreCommit-HDFS-Build/13284/artifact/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs-jdk1.7.0_79.txt
        JDK v1.7.0_79 Test Results https://builds.apache.org/job/PreCommit-HDFS-Build/13284/testReport/
        asflicense https://builds.apache.org/job/PreCommit-HDFS-Build/13284/artifact/patchprocess/patch-asflicense-problems.txt
        modules C: hadoop-hdfs-project/hadoop-hdfs U: hadoop-hdfs-project/hadoop-hdfs
        Max memory used 226MB
        Powered by Apache Yetus http://yetus.apache.org
        Console output https://builds.apache.org/job/PreCommit-HDFS-Build/13284/console

        This message was automatically generated.

        walter.k.su Walter Su added a comment -

        Please hold the patch. I doubt it can happen in a real case.
        If a DN has a RUR, it will return RecoveryInProgressException. Then the primary DN aborts the recovery.

        zhz Zhe Zhang added a comment -

        If a DN has a RUR, it will return RecoveryInProgressException.

        Walter Su, a quick comment: a DN could run an older version of HDFS than the NN. And unknown DN bugs could violate the above assumption as well, similar to what we saw on HDFS-9289.

        walter.k.su Walter Su added a comment -

        If a buggy DN does return RUR without throwing RecoveryInProgressException, please put the check in recover() after callInitReplicaRecovery().

        twu Tony Wu added a comment -

        Thanks a lot for Walter Su and Zhe Zhang's comments!

        Walter Su, the DN throws RecoveryInProgressException only when the received recovery ID is no larger than the existing RUR recovery ID:

          static ReplicaRecoveryInfo initReplicaRecovery(String bpid, ReplicaMap map,
              Block block, long recoveryId, long xceiverStopTimeout) throws IOException {
            ...
            final ReplicaUnderRecovery rur;
            if (replica.getState() == ReplicaState.RUR) {
              rur = (ReplicaUnderRecovery)replica;
              if (rur.getRecoveryID() >= recoveryId) {
                throw new RecoveryInProgressException(
                    "rur.getRecoveryID() >= recoveryId = " + recoveryId
                    + ", block=" + block + ", rur=" + rur);
              }
              final long oldRecoveryID = rur.getRecoveryID();
              rur.setRecoveryID(recoveryId);
              LOG.info("initReplicaRecovery: update recovery id for " + block
                  + " from " + oldRecoveryID + " to " + recoveryId);
            }
        }
        

        So if the DN has a block that is already in RUR, and a new block recovery starts (with a larger recovery ID), the DN does not throw RecoveryInProgressException.

        The patch is focused on what happens after this point, where a buggy DN (or an unknown corner case) might report RUR as the replica's original state.

        I think your suggestion of moving the check out of syncBlock() and into initReplicaRecovery() makes sense. I implemented a check to simply exclude the replicas whose original state is >= RUR (they won't be used for recovery anyway). But the issue with this is that we might end up with an empty syncList and incorrectly tell the NN to drop this block. I think the current place for the check in the patch is probably the safest. Please let me know what you think.

        Again thanks a lot for taking the time to look at my patch.

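        The failure mode under discussion can be made concrete with a minimal, self-contained sketch (hypothetical class and method names, not the actual DataNode code): if no replica matches the best state, the participating list stays empty, minLength is never updated, and the coordinating DN would otherwise report Long.MAX_VALUE to the NN.

```java
import java.io.IOException;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical sketch of the length computation in syncBlock() plus the
// sanity check discussed in this thread; not the actual DataNode code.
public class SyncLengthCheck {

  // Computes the recovered block length as the minimum over the replicas
  // that matched the best state; throws instead of reporting MAX_VALUE.
  static long computeNewLength(List<Long> matchingReplicaLengths)
      throws IOException {
    long minLength = Long.MAX_VALUE;
    for (long len : matchingReplicaLengths) {
      minLength = Math.min(minLength, len);
    }
    // Sanity check: an empty participating list leaves minLength untouched,
    // so committing it would tell the NN the block is Long.MAX_VALUE long.
    if (minLength == Long.MAX_VALUE) {
      throw new IOException(
          "No replica matched the best state; refusing to report "
          + "Long.MAX_VALUE as the recovered block length");
    }
    return minLength;
  }

  public static void main(String[] args) throws IOException {
    // Two matching replicas of 4096 and 1024 bytes: recovery truncates to 1024.
    System.out.println(computeNewLength(Arrays.asList(4096L, 1024L))); // prints 1024
    // An empty participating list triggers the sanity check.
    try {
      computeNewLength(Collections.<Long>emptyList());
    } catch (IOException e) {
      System.out.println("caught: " + e.getMessage());
    }
  }
}
```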
        walter.k.su Walter Su added a comment -

        I agree with Zhe Zhang that a buggy DN could cause this issue.

        And I agree with what you said. Thanks for digging into the code. My mistake. I revise what I said, from:
        If a DN has a RUR, it will return RecoveryInProgressException.
        to:
        If a replica is RUR, a DN returns RecoveryInProgressException or its original state.

        The thing is, it never returns RUR. It's weird that syncBlock() assumes there is a RUR in syncList.
        It's just that I prefer to keep the role of syncBlock() simple, as the javadoc says: "Block synchronization".
        recover() does only one thing, checking, so it's a better place. If you insist, let's check the arguments at the top of syncBlock().

        walter.k.su Walter Su added a comment -

        syncBlock() already assumes there is no RUR in syncList: bestState starts at RWR, so bestState can't be RUR.

              // Calculate the best available replica state.
              ReplicaState bestState = ReplicaState.RWR;
              for (BlockRecord r : syncList) {
                ReplicaState rState = r.rInfo.getOriginalReplicaState();
                if (rState.getValue() < bestState.getValue()) {
                  bestState = rState;
                }
              }

        It's weird we add one more assumption that there is a RUR in syncList.

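        Walter Su's point above can be illustrated with a small self-contained sketch. The numeric values in the toy enum below mirror the assumed ReplicaState ordering (smaller value = better state) and are for illustration only: because bestState starts at RWR and is only replaced by strictly smaller values, RUR can never be selected.

```java
import java.util.Arrays;
import java.util.List;

// Toy illustration only; the enum values assume the HDFS ReplicaState
// ordering where a smaller value means a "better" replica state.
public class BestStateDemo {

  enum State {
    FINALIZED(0), RBW(1), RWR(2), RUR(3), TEMPORARY(4);
    final int value;
    State(int v) { this.value = v; }
  }

  // Mirrors the loop in syncBlock(): start at RWR, only improve downwards.
  static State bestOf(List<State> reported) {
    State best = State.RWR;
    for (State s : reported) {
      if (s.value < best.value) { // RUR (3) is never < RWR (2), so never chosen
        best = s;
      }
    }
    return best;
  }

  public static void main(String[] args) {
    // Even if every reported replica is RUR, the result stays RWR.
    System.out.println(bestOf(Arrays.asList(State.RUR, State.RUR))); // prints RWR
  }
}
```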
        yzhangal Yongjun Zhang added a comment -

        Hi Walter Su,

        Thanks for the comments.

        Agree that bestState won't be RUR, but it seems to me that doesn't mean there can't be a RUR in the syncList. In particular, a RUR may exist because of a bug, as discussed, and the fix is to issue a good message when there is a bug. So we actually cannot assume there is no RUR in the syncList. Right? Thanks.

        walter.k.su Walter Su added a comment -

        I mean RUR shouldn't be put in syncList.

        yzhangal Yongjun Zhang added a comment -

        Thanks Walter Su, that makes sense.

        twu Tony Wu added a comment -

        Thanks Walter Su and Yongjun Zhang for your comments. I'll post a new patch which will exclude RURs from syncList.

        twu Tony Wu added a comment -

        In v6 patch:

        • Addressed Walter Su's comment by excluding RUR replicas from syncList. syncBlock() now works on a clean syncList containing only replicas that will be used for recovery.
        • Converted the check in previous patches for Long.MAX_VALUE to an assert.
        • Reworked the test case.
        • Added some comments.
        vinodkv Vinod Kumar Vavilapalli added a comment -

        Moving out all non-critical / non-blocker issues that didn't make it out of 2.7.2 into 2.7.3.

        yzhangal Yongjun Zhang added a comment -

        Hi Tony Wu,

        Thanks for the new rev. Some nits.

        a. Suggest to change

                  if (info != null &&
                      info.getGenerationStamp() >= block.getGenerationStamp() &&
                      info.getNumBytes() > 0) {
                    // Count the number of valid replicas received.
                    ++validReplicaCnt;
                    if (info.getOriginalReplicaState().getValue() <=
                        ReplicaState.RWR.getValue()) {
                      syncList.add(new BlockRecord(id, proxyDN, info));
                    }
                  }
        

        to:

          if (info == null) {
            continue;
          }
          // Count the number of candidate replicas received.
          ++candidateReplicaCnt;
          if (info.getGenerationStamp() >= block.getGenerationStamp() &&
              info.getNumBytes() > 0 &&
              info.getOriginalReplicaState().getValue() <=
                  ReplicaState.RWR.getValue()) {
            syncList.add(new BlockRecord(id, proxyDN, info));
          } else {
            // Debug message about this replica, to indicate the reason it was not chosen.
            LOG.debug(...);
          }
        }
        

        That is:
        1. change validReplicaCnt to candidateReplicaCnt
        2. consolidate the condition checking
        3. add an "else" branch in the code, and log a debug message in the "else" branch.

        b. Then modify the following change accordingly.

              if (validReplicaCnt > 0 && syncList.isEmpty()) {
                throw new IOException("No replica for block " + block +
                    " is in " + ReplicaState.RWR.name() + " or better state");
              }
        

        to

      if (syncList.isEmpty()) {
        throw new IOException("Found " + candidateReplicaCnt + " replicas for "
            + block + " from (" + Arrays.asList(locs)
            + "). No replica met the requirements:"
            + " 1. valid generation timestamp;"
            + " 2. non-zero length;"
            + " 3. original state is RWR or better");
      }
        

        Thanks.

        hadoopqa Hadoop QA added a comment -
        -1 overall



        Vote Subsystem Runtime Comment
        0 reexec 0m 7s docker + precommit patch detected.
        +1 @author 0m 0s The patch does not contain any @author tags.
        +1 test4tests 0m 0s The patch appears to include 1 new or modified test files.
        +1 mvninstall 3m 10s trunk passed
        +1 compile 0m 34s trunk passed with JDK v1.8.0_66
        +1 compile 0m 33s trunk passed with JDK v1.7.0_79
        +1 checkstyle 0m 17s trunk passed
        +1 mvneclipse 0m 14s trunk passed
        -1 findbugs 1m 59s hadoop-hdfs-project/hadoop-hdfs in trunk cannot run convertXmlToText from findbugs
        +1 javadoc 1m 13s trunk passed with JDK v1.8.0_66
        +1 javadoc 2m 1s trunk passed with JDK v1.7.0_79
        +1 mvninstall 0m 41s the patch passed
        +1 compile 0m 33s the patch passed with JDK v1.8.0_66
        +1 javac 0m 33s the patch passed
        +1 compile 0m 34s the patch passed with JDK v1.7.0_79
        +1 javac 0m 34s the patch passed
        +1 checkstyle 0m 16s the patch passed
        +1 mvneclipse 0m 14s the patch passed
        +1 whitespace 0m 0s Patch has no whitespace issues.
        +1 findbugs 2m 13s the patch passed
        +1 javadoc 1m 12s the patch passed with JDK v1.8.0_66
        +1 javadoc 1m 51s the patch passed with JDK v1.7.0_79
        -1 unit 65m 46s hadoop-hdfs in the patch failed with JDK v1.8.0_66.
        -1 unit 75m 43s hadoop-hdfs in the patch failed with JDK v1.7.0_79.
        -1 asflicense 0m 25s Patch generated 56 ASF License warnings.
        162m 19s



        Reason Tests
        JDK v1.8.0_66 Failed junit tests hadoop.hdfs.server.datanode.TestBlockScanner
          hadoop.hdfs.server.namenode.ha.TestDNFencing
          hadoop.hdfs.server.namenode.snapshot.TestRenameWithSnapshots
          hadoop.hdfs.server.balancer.TestBalancerWithMultipleNameNodes
        JDK v1.7.0_79 Failed junit tests hadoop.hdfs.server.namenode.ha.TestDNFencing
          hadoop.hdfs.TestRecoverStripedFile
          hadoop.hdfs.TestReadStripedFileWithDecoding
          hadoop.hdfs.server.blockmanagement.TestBlockTokenWithDFSStriped
          hadoop.hdfs.server.namenode.ha.TestFailureToReadEdits
          hadoop.hdfs.server.balancer.TestBalancerWithMultipleNameNodes



        Subsystem Report/Notes
        Docker Client=1.7.0 Server=1.7.0 Image:test-patch-base-hadoop-date2015-11-04
        JIRA Patch URL https://issues.apache.org/jira/secure/attachment/12770451/HDFS-9236.006.patch
        JIRA Issue HDFS-9236
        Optional Tests asflicense javac javadoc mvninstall unit findbugs checkstyle compile
        uname Linux 93973b5919f8 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
        Build tool maven
        Personality /home/jenkins/jenkins-slave/workspace/PreCommit-HDFS-Build/patchprocess/apache-yetus-d0f6847/precommit/personality/hadoop.sh
        git revision trunk / 194251c
        Default Java 1.7.0_79
        Multi-JDK versions /usr/lib/jvm/java-8-oracle:1.8.0_66 /usr/lib/jvm/java-7-openjdk-amd64:1.7.0_79
        findbugs v3.0.0
        findbugs https://builds.apache.org/job/PreCommit-HDFS-Build/13370/artifact/patchprocess/branch-findbugs-hadoop-hdfs-project_hadoop-hdfs.html
        unit https://builds.apache.org/job/PreCommit-HDFS-Build/13370/artifact/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs-jdk1.8.0_66.txt
        unit https://builds.apache.org/job/PreCommit-HDFS-Build/13370/artifact/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs-jdk1.7.0_79.txt
        unit test logs https://builds.apache.org/job/PreCommit-HDFS-Build/13370/artifact/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs-jdk1.8.0_66.txt https://builds.apache.org/job/PreCommit-HDFS-Build/13370/artifact/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs-jdk1.7.0_79.txt
        JDK v1.7.0_79 Test Results https://builds.apache.org/job/PreCommit-HDFS-Build/13370/testReport/
        asflicense https://builds.apache.org/job/PreCommit-HDFS-Build/13370/artifact/patchprocess/patch-asflicense-problems.txt
        modules C: hadoop-hdfs-project/hadoop-hdfs U: hadoop-hdfs-project/hadoop-hdfs
        Max memory used 226MB
        Powered by Apache Yetus http://yetus.apache.org
        Console output https://builds.apache.org/job/PreCommit-HDFS-Build/13370/console

        This message was automatically generated.

        yzhangal Yongjun Zhang added a comment -

        Thanks Tony Wu for the offline discussion. Consolidating the condition checking seems not quite right.

        We can largely do what your last rev does, with some changes (along the lines of my last review):

        1. instead of validReplicaCnt, use candidateReplicaCnt
        2. add debug log about the replicas filtered out

          if (info == null) {
            continue;
          }
          if (info.getGenerationStamp() < block.getGenerationStamp() ||
              info.getNumBytes() <= 0) {
            if (LOG.isDebugEnabled()) {
              LOG.debug(...);
            }
            continue;
          }
          // Count the number of candidate replicas found.
          ++candidateReplicaCnt;
          if (info.getOriginalReplicaState().getValue() <=
                ReplicaState.RWR.getValue()) {
            syncList.add(new BlockRecord(id, proxyDN, info));
          } else {
            if (LOG.isDebugEnabled()) {
              LOG.debug(...);
            }
          }
        

        and

              // None of the replicas reported by DataNodes has the required original
              // state, report the error.
              if (candidateReplicaCnt > 0 && syncList.isEmpty()) {
                throw new IOException("Found " + candidateReplicaCnt +
            " replica(s) for block " + block + " but none is in " +
                    ReplicaState.RWR.name() + " or better state." +
                    " datanodeids=" + Arrays.asList(locs));
              }
        
        twu Tony Wu added a comment -

        Thanks a lot Yongjun Zhang for your comments. I incorporated them into the new patch.
        I added the debug logs but kept the positive logic from the existing code/patch for determining which replica info to add to syncList. IMO the positive logic is easier to read and understand.

        twu Tony Wu added a comment -

        In v7 patch:

        • Addressed Yongjun Zhang's review comments.
        • Updated the test case.
        • Added a toString() method to pretty print ReplicaRecoveryInfo.
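
For illustration, a pretty-print toString() along these lines might look as follows. This is a hypothetical sketch: the field names, constructor, and output format are invented here and may differ from the actual ReplicaRecoveryInfo in the patch.

```java
// Hypothetical sketch only: fields and the output format are invented,
// not taken from the actual ReplicaRecoveryInfo in the HDFS-9236 patch.
public class ReplicaRecoveryInfoSketch {
    private final long blockId;
    private final long numBytes;
    private final long genStamp;
    private final String originalState;

    public ReplicaRecoveryInfoSketch(long blockId, long numBytes,
                                     long genStamp, String originalState) {
        this.blockId = blockId;
        this.numBytes = numBytes;
        this.genStamp = genStamp;
        this.originalState = originalState;
    }

    @Override
    public String toString() {
        // One compact line that is easy to grep in DataNode logs.
        return "block:blk_" + blockId + " numBytes:" + numBytes
            + " genStamp:" + genStamp + " state:" + originalState;
    }

    public static void main(String[] args) {
        System.out.println(
            new ReplicaRecoveryInfoSketch(42L, 1024L, 7L, "RWR"));
    }
}
```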
        walter.k.su Walter Su added a comment -

        The logic looks good to me. Thanks Tony Wu for updating and Yongjun Zhang for review.
        There are many isDebugEnabled() guards; we could consider moving to the slf4j style, though that's not related to this JIRA.

        yzhangal Yongjun Zhang added a comment -

        Thanks Tony Wu for the new rev and Walter Su for the review. I'm +1 on 007 pending jenkins.

        yzhangal Yongjun Zhang added a comment -

        It seems Jenkins was not triggered, so I started one here:
        https://builds.apache.org/job/PreCommit-HDFS-Build/13400/

        hadoopqa Hadoop QA added a comment -
        -1 overall



        Vote Subsystem Runtime Comment
        0 reexec 0m 9s docker + precommit patch detected.
        +1 @author 0m 0s The patch does not contain any @author tags.
        +1 test4tests 0m 0s The patch appears to include 1 new or modified test files.
        +1 mvninstall 3m 16s trunk passed
        +1 compile 0m 37s trunk passed with JDK v1.8.0_60
        +1 compile 0m 32s trunk passed with JDK v1.7.0_79
        +1 checkstyle 0m 17s trunk passed
        +1 mvneclipse 0m 14s trunk passed
        -1 findbugs 2m 5s hadoop-hdfs-project/hadoop-hdfs in trunk cannot run convertXmlToText from findbugs
        +1 javadoc 1m 9s trunk passed with JDK v1.8.0_60
        +1 javadoc 1m 57s trunk passed with JDK v1.7.0_79
        +1 mvninstall 0m 42s the patch passed
        +1 compile 0m 37s the patch passed with JDK v1.8.0_60
        +1 javac 0m 37s the patch passed
        +1 compile 0m 40s the patch passed with JDK v1.7.0_79
        +1 javac 0m 40s the patch passed
        +1 checkstyle 0m 17s the patch passed
        +1 mvneclipse 0m 14s the patch passed
        +1 whitespace 0m 0s Patch has no whitespace issues.
        +1 findbugs 2m 17s the patch passed
        +1 javadoc 1m 25s the patch passed with JDK v1.8.0_60
        +1 javadoc 1m 55s the patch passed with JDK v1.7.0_79
        -1 unit 57m 49s hadoop-hdfs in the patch failed with JDK v1.8.0_60.
        -1 unit 56m 30s hadoop-hdfs in the patch failed with JDK v1.7.0_79.
        -1 asflicense 0m 25s Patch generated 58 ASF License warnings.
        135m 50s



        Reason Tests
        JDK v1.8.0_60 Failed junit tests hadoop.hdfs.TestDecommission
          hadoop.hdfs.server.blockmanagement.TestNodeCount
          hadoop.hdfs.server.namenode.ha.TestSeveralNameNodes
        JDK v1.7.0_79 Failed junit tests hadoop.hdfs.server.blockmanagement.TestNodeCount
          hadoop.hdfs.TestDFSStripedOutputStreamWithFailure130
          hadoop.hdfs.server.namenode.ha.TestStandbyCheckpoints



        Subsystem Report/Notes
        Docker Client=1.7.1 Server=1.7.1 Image:test-patch-base-hadoop-date2015-11-05
        JIRA Patch URL https://issues.apache.org/jira/secure/attachment/12770663/HDFS-9236.007.patch
        JIRA Issue HDFS-9236
        Optional Tests asflicense javac javadoc mvninstall unit findbugs checkstyle compile
        uname Linux 5aeed2b7f49c 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
        Build tool maven
        Personality /home/jenkins/jenkins-slave/workspace/PreCommit-HDFS-Build/patchprocess/apache-yetus-e8bd3ad/precommit/personality/hadoop.sh
        git revision trunk / ff47f35
        Default Java 1.7.0_79
        Multi-JDK versions /usr/lib/jvm/java-8-oracle:1.8.0_60 /usr/lib/jvm/java-7-openjdk-amd64:1.7.0_79
        findbugs v3.0.0
        findbugs https://builds.apache.org/job/PreCommit-HDFS-Build/13400/artifact/patchprocess/branch-findbugs-hadoop-hdfs-project_hadoop-hdfs.html
        unit https://builds.apache.org/job/PreCommit-HDFS-Build/13400/artifact/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs-jdk1.8.0_60.txt
        unit https://builds.apache.org/job/PreCommit-HDFS-Build/13400/artifact/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs-jdk1.7.0_79.txt
        unit test logs https://builds.apache.org/job/PreCommit-HDFS-Build/13400/artifact/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs-jdk1.8.0_60.txt https://builds.apache.org/job/PreCommit-HDFS-Build/13400/artifact/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs-jdk1.7.0_79.txt
        JDK v1.7.0_79 Test Results https://builds.apache.org/job/PreCommit-HDFS-Build/13400/testReport/
        asflicense https://builds.apache.org/job/PreCommit-HDFS-Build/13400/artifact/patchprocess/patch-asflicense-problems.txt
        modules C: hadoop-hdfs-project/hadoop-hdfs U: hadoop-hdfs-project/hadoop-hdfs
        Max memory used 226MB
        Powered by Apache Yetus http://yetus.apache.org
        Console output https://builds.apache.org/job/PreCommit-HDFS-Build/13400/console

        This message was automatically generated.

        twu Tony Wu added a comment -

        I looked at the failed tests and none are related to block recovery. I also manually ran the failed tests against the latest code (on Linux, JDK 1.7); all pass without error.

        hudson Hudson added a comment -

        FAILURE: Integrated in Hadoop-trunk-Commit #8769 (See https://builds.apache.org/job/Hadoop-trunk-Commit/8769/)
        HDFS-9236. Missing sanity check for block size during block recovery. (yzhang: rev b64242c0d2cabd225a8fb7d25fed449d252e4fa1)

        • hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
        • hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/BlockRecoveryWorker.java
        • hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/TestBlockRecovery.java
        • hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/protocol/ReplicaRecoveryInfo.java
        yzhangal Yongjun Zhang added a comment -

        Committed to trunk and branch-2. Thanks Tony Wu for the contribution and Walter Su for the review.

        hudson Hudson added a comment -

        FAILURE: Integrated in Hadoop-Yarn-trunk-Java8 #648 (See https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/648/)
        HDFS-9236. Missing sanity check for block size during block recovery. (yzhang: rev b64242c0d2cabd225a8fb7d25fed449d252e4fa1)

        • hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
        • hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/BlockRecoveryWorker.java
        • hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/protocol/ReplicaRecoveryInfo.java
        • hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/TestBlockRecovery.java
        hudson Hudson added a comment -

        FAILURE: Integrated in Hadoop-Yarn-trunk #1371 (See https://builds.apache.org/job/Hadoop-Yarn-trunk/1371/)
        HDFS-9236. Missing sanity check for block size during block recovery. (yzhang: rev b64242c0d2cabd225a8fb7d25fed449d252e4fa1)

        • hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
        • hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/BlockRecoveryWorker.java
        • hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/TestBlockRecovery.java
        • hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/protocol/ReplicaRecoveryInfo.java
        hudson Hudson added a comment -

        FAILURE: Integrated in Hadoop-Mapreduce-trunk-Java8 #638 (See https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/638/)
        HDFS-9236. Missing sanity check for block size during block recovery. (yzhang: rev b64242c0d2cabd225a8fb7d25fed449d252e4fa1)

        • hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/protocol/ReplicaRecoveryInfo.java
        • hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/BlockRecoveryWorker.java
        • hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/TestBlockRecovery.java
        • hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
        hudson Hudson added a comment -

        FAILURE: Integrated in Hadoop-Mapreduce-trunk #2578 (See https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2578/)
        HDFS-9236. Missing sanity check for block size during block recovery. (yzhang: rev b64242c0d2cabd225a8fb7d25fed449d252e4fa1)

        • hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
        • hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/BlockRecoveryWorker.java
        • hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/TestBlockRecovery.java
        • hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/protocol/ReplicaRecoveryInfo.java
        hudson Hudson added a comment -

        FAILURE: Integrated in Hadoop-Hdfs-trunk #2518 (See https://builds.apache.org/job/Hadoop-Hdfs-trunk/2518/)
        HDFS-9236. Missing sanity check for block size during block recovery. (yzhang: rev b64242c0d2cabd225a8fb7d25fed449d252e4fa1)

        • hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/protocol/ReplicaRecoveryInfo.java
        • hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/BlockRecoveryWorker.java
        • hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
        • hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/TestBlockRecovery.java
        hudson Hudson added a comment -

        ABORTED: Integrated in Hadoop-Hdfs-trunk-Java8 #579 (See https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/579/)
        HDFS-9236. Missing sanity check for block size during block recovery. (yzhang: rev b64242c0d2cabd225a8fb7d25fed449d252e4fa1)

        • hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/BlockRecoveryWorker.java
        • hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
        • hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/TestBlockRecovery.java
        • hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/protocol/ReplicaRecoveryInfo.java

          People

          • Assignee:
            twu Tony Wu
          • Reporter:
            twu Tony Wu
          • Votes:
            0
          • Watchers:
            15