Hadoop HDFS
  1. Hadoop HDFS
  2. HDFS-5464

Simplify block report diff calculation

    Details

    • Type: Improvement Improvement
    • Status: Patch Available
    • Priority: Minor Minor
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: namenode
    • Labels:

      Description

      The current calculation in BlockManager.reportDiff(..) is unnecessarily complicated. We could simplify the calculation.

      1. h5464_20131105.patch
        12 kB
        Tsz Wo Nicholas Sze
      2. h5464_20131105b.patch
        12 kB
        Tsz Wo Nicholas Sze
      3. h5464_20131105c.patch
        11 kB
        Tsz Wo Nicholas Sze
      4. h5464_20140715.patch
        16 kB
        Tsz Wo Nicholas Sze
      5. h5464_20140715b.patch
        14 kB
        Tsz Wo Nicholas Sze

        Activity

        Hide
        Hadoop QA added a comment -



        -1 overall



        Vote Subsystem Runtime Comment
        -1 patch 0m 0s The patch command could not apply the patch during dryrun.



        Subsystem Report/Notes
        Patch URL http://issues.apache.org/jira/secure/attachment/12655886/h5464_20140715b.patch
        Optional Tests javadoc javac unit findbugs checkstyle
        git revision trunk / f1a152c
        Console output https://builds.apache.org/job/PreCommit-HDFS-Build/10653/console

        This message was automatically generated.

        Show
        Hadoop QA added a comment - -1 overall Vote Subsystem Runtime Comment -1 patch 0m 0s The patch command could not apply the patch during dryrun. Subsystem Report/Notes Patch URL http://issues.apache.org/jira/secure/attachment/12655886/h5464_20140715b.patch Optional Tests javadoc javac unit findbugs checkstyle git revision trunk / f1a152c Console output https://builds.apache.org/job/PreCommit-HDFS-Build/10653/console This message was automatically generated.
        Hide
        Hadoop QA added a comment -



        -1 overall



        Vote Subsystem Runtime Comment
        -1 patch 0m 0s The patch command could not apply the patch during dryrun.



        Subsystem Report/Notes
        Patch URL http://issues.apache.org/jira/secure/attachment/12655886/h5464_20140715b.patch
        Optional Tests javadoc javac unit findbugs checkstyle
        git revision trunk / f1a152c
        Console output https://builds.apache.org/job/PreCommit-HDFS-Build/10634/console

        This message was automatically generated.

        Show
        Hadoop QA added a comment - -1 overall Vote Subsystem Runtime Comment -1 patch 0m 0s The patch command could not apply the patch during dryrun. Subsystem Report/Notes Patch URL http://issues.apache.org/jira/secure/attachment/12655886/h5464_20140715b.patch Optional Tests javadoc javac unit findbugs checkstyle git revision trunk / f1a152c Console output https://builds.apache.org/job/PreCommit-HDFS-Build/10634/console This message was automatically generated.
        Hide
        Hadoop QA added a comment -

        -1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12655886/h5464_20140715b.patch
        against trunk revision .

        +1 @author. The patch does not contain any @author tags.

        +1 tests included. The patch appears to include 1 new or modified test files.

        +1 javac. The applied patch does not increase the total number of javac compiler warnings.

        +1 javadoc. There were no new javadoc warning messages.

        +1 eclipse:eclipse. The patch built with eclipse:eclipse.

        +1 findbugs. The patch does not introduce any new Findbugs (version 2.0.3) warnings.

        +1 release audit. The applied patch does not increase the total number of release audit warnings.

        -1 core tests. The patch failed these unit tests in hadoop-hdfs-project/hadoop-hdfs:

        org.apache.hadoop.hdfs.TestDFSClientRetries
        org.apache.hadoop.hdfs.server.namenode.TestListCorruptFileBlocks
        org.apache.hadoop.hdfs.server.datanode.TestBlockHasMultipleReplicasOnSameDN
        org.apache.hadoop.hdfs.server.namenode.ha.TestDNFencingWithReplication

        +1 contrib tests. The patch passed contrib unit tests.

        Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/7352//testReport/
        Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/7352//console

        This message is automatically generated.

        Show
        Hadoop QA added a comment - -1 overall . Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12655886/h5464_20140715b.patch against trunk revision . +1 @author . The patch does not contain any @author tags. +1 tests included . The patch appears to include 1 new or modified test files. +1 javac . The applied patch does not increase the total number of javac compiler warnings. +1 javadoc . There were no new javadoc warning messages. +1 eclipse:eclipse . The patch built with eclipse:eclipse. +1 findbugs . The patch does not introduce any new Findbugs (version 2.0.3) warnings. +1 release audit . The applied patch does not increase the total number of release audit warnings. -1 core tests . The patch failed these unit tests in hadoop-hdfs-project/hadoop-hdfs: org.apache.hadoop.hdfs.TestDFSClientRetries org.apache.hadoop.hdfs.server.namenode.TestListCorruptFileBlocks org.apache.hadoop.hdfs.server.datanode.TestBlockHasMultipleReplicasOnSameDN org.apache.hadoop.hdfs.server.namenode.ha.TestDNFencingWithReplication +1 contrib tests . The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/7352//testReport/ Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/7352//console This message is automatically generated.
        Hide
        Hadoop QA added a comment -

        -1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12655865/h5464_20140715.patch
        against trunk revision .

        +1 @author. The patch does not contain any @author tags.

        +1 tests included. The patch appears to include 1 new or modified test files.

        +1 javac. The applied patch does not increase the total number of javac compiler warnings.

        +1 javadoc. There were no new javadoc warning messages.

        +1 eclipse:eclipse. The patch built with eclipse:eclipse.

        +1 findbugs. The patch does not introduce any new Findbugs (version 2.0.3) warnings.

        +1 release audit. The applied patch does not increase the total number of release audit warnings.

        -1 core tests. The patch failed these unit tests in hadoop-hdfs-project/hadoop-hdfs:

        org.apache.hadoop.hdfs.server.namenode.ha.TestPipelinesFailover
        org.apache.hadoop.hdfs.server.namenode.ha.TestDNFencingWithReplication
        org.apache.hadoop.hdfs.tools.TestDFSAdminWithHA
        org.apache.hadoop.hdfs.server.datanode.TestBlockHasMultipleReplicasOnSameDN
        org.apache.hadoop.hdfs.TestDFSClientRetries
        org.apache.hadoop.hdfs.server.namenode.TestListCorruptFileBlocks

        +1 contrib tests. The patch passed contrib unit tests.

        Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/7351//testReport/
        Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/7351//console

        This message is automatically generated.

        Show
        Hadoop QA added a comment - -1 overall . Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12655865/h5464_20140715.patch against trunk revision . +1 @author . The patch does not contain any @author tags. +1 tests included . The patch appears to include 1 new or modified test files. +1 javac . The applied patch does not increase the total number of javac compiler warnings. +1 javadoc . There were no new javadoc warning messages. +1 eclipse:eclipse . The patch built with eclipse:eclipse. +1 findbugs . The patch does not introduce any new Findbugs (version 2.0.3) warnings. +1 release audit . The applied patch does not increase the total number of release audit warnings. -1 core tests . The patch failed these unit tests in hadoop-hdfs-project/hadoop-hdfs: org.apache.hadoop.hdfs.server.namenode.ha.TestPipelinesFailover org.apache.hadoop.hdfs.server.namenode.ha.TestDNFencingWithReplication org.apache.hadoop.hdfs.tools.TestDFSAdminWithHA org.apache.hadoop.hdfs.server.datanode.TestBlockHasMultipleReplicasOnSameDN org.apache.hadoop.hdfs.TestDFSClientRetries org.apache.hadoop.hdfs.server.namenode.TestListCorruptFileBlocks +1 contrib tests . The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/7351//testReport/ Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/7351//console This message is automatically generated.
        Hide
        Colin Patrick McCabe added a comment -

        To be honest, I don't think this change makes the code any clearer. Setting block lengths to negative, and then flipping them back later, is a hack that needs to be explained by comments. It's not obvious to me that this hack is any better than the hack of adding a sentinel element to the list. In fact, I would argue the sentinel element is somewhat clearer, since you're not overloading the meaning of what "block length" means to mean "block length and also whether I checked this recently."

        Show
        Colin Patrick McCabe added a comment - To be honest, I don't think this change makes the code any clearer. Setting block lengths to negative, and then flipping them back later, is a hack that needs to be explained by comments. It's not obvious to me that this hack is any better than the hack of adding a sentinel element to the list. In fact, I would argue the sentinel element is somewhat clearer, since you're not overloading the meaning of what "block length" means to mean "block length and also whether I checked this recently."
        Hide
        Tsz Wo Nicholas Sze added a comment -

        Andrew Wang, could you help doing the microbenchmark you suggested?

        Show
        Tsz Wo Nicholas Sze added a comment - Andrew Wang , could you help doing the microbenchmark you suggested?
        Hide
        Andrew Wang added a comment -

        Considering we'll have 8 or 10TB disks in a few years, we could be seeing a lot more than just 500k blocks per DN. Storage dense nodes with 24+ disks are also out there. Memory accesses and conditional branches are also expensive. If we were just adding 500k integers together, it's not a big deal, but this loop is doing more than that.

        I'm not opposed in principle to this change since it is simpler and the same time complexity, but I'd like to see some microbenchmark results before committing it. Maybe rig something up with JMH?

        Show
        Andrew Wang added a comment - Considering we'll have 8 or 10TB disks in a few years, we could be seeing a lot more than just 500k blocks per DN. Storage dense nodes with 24+ disks are also out there. Memory accesses and conditional branches are also expensive. If we were just adding 500k integers together, it's not a big deal, but this loop is doing more than that. I'm not opposed in principle to this change since it is simpler and the same time complexity, but I'd like to see some microbenchmark results before committing it. Maybe rig something up with JMH?
        Hide
        Tsz Wo Nicholas Sze added a comment -

        h5464_20140715b.patch: reverted other tests in TestBlockInfo.

        Show
        Tsz Wo Nicholas Sze added a comment - h5464_20140715b.patch: reverted other tests in TestBlockInfo.
        Hide
        Tsz Wo Nicholas Sze added a comment -

        I think looping 500,000 is nothing in a modern computer. No?

        You are right that I should not remove other tests.

        Show
        Tsz Wo Nicholas Sze added a comment - I think looping 500,000 is nothing in a modern computer. No? You are right that I should not remove other tests.
        Hide
        Colin Patrick McCabe added a comment -
             // collect blocks that have not been reported
        -    // all of them are next to the delimiter
        -    Iterator<BlockInfo> it = storageInfo.new BlockIterator(delimiter.getNext(0));
        -    while(it.hasNext())
        -      toRemove.add(it.next());
        -    storageInfo.removeBlock(delimiter);
        +    for(BlockInfo b : storageInfo) {
        +      final long n = b.getNumBytes();
        +      if (n < 0) {
        +        // reset the length of visited block 
        +        b.setNumBytes(-n - 1);
        +      } else {
        +        toRemove.add(b);
        +      }
        +    }
        

        Previously, we only ended up looping over the blocks that were not reported. Now, with this change, we'll loop over all blocks in the DataNodeDescriptor. Do you agree?

        This seems like it will be much slower. Imagine a datanode with 500,000 blocks, none of which have been removed since the previous block report. Previously, this loop would do nothing. Now, with this change, we'll be looping over the full 500,000 blocks again.

        -  \@Test
        -  public void testAddStorage() throws Exception {
        ...
        -  \@Test
        -  public void testReplaceStorageIfDifferetnOneAlreadyExistedFromSameDataNode() throws Exception {
        

        I understand removing testBlockListMoveToHead, but why remove these other tests?

        Show
        Colin Patrick McCabe added a comment - // collect blocks that have not been reported - // all of them are next to the delimiter - Iterator<BlockInfo> it = storageInfo. new BlockIterator(delimiter.getNext(0)); - while (it.hasNext()) - toRemove.add(it.next()); - storageInfo.removeBlock(delimiter); + for (BlockInfo b : storageInfo) { + final long n = b.getNumBytes(); + if (n < 0) { + // reset the length of visited block + b.setNumBytes(-n - 1); + } else { + toRemove.add(b); + } + } Previously, we only ended up looping over the blocks that were not reported. Now, with this change, we'll loop over all blocks in the DataNodeDescriptor. Do you agree? This seems like it will be much slower. Imagine a datanode with 500,000 blocks, none of which have been removed since the previous block report. Previously, this loop would do nothing. Now, with this change, we'll be looping over the full 500,000 blocks again. - \@Test - public void testAddStorage() throws Exception { ... - \@Test - public void testReplaceStorageIfDifferetnOneAlreadyExistedFromSameDataNode() throws Exception { I understand removing testBlockListMoveToHead , but why remove these other tests?
        Hide
        Tsz Wo Nicholas Sze added a comment -

        Some new idea inspired by HDFS-6658.

        h5464_20140715.patch: marks visited blocks by setting length n to (-n-1).

        Show
        Tsz Wo Nicholas Sze added a comment - Some new idea inspired by HDFS-6658 . h5464_20140715.patch: marks visited blocks by setting length n to (-n-1).
        Hide
        Tsz Wo Nicholas Sze added a comment -

        The simplification does not work well – it uses more memory and longer running time. Resolving as Won't Fix.

        Show
        Tsz Wo Nicholas Sze added a comment - The simplification does not work well – it uses more memory and longer running time. Resolving as Won't Fix.
        Hide
        Konstantin Shvachko added a comment -

        > you won't argue that the new code is simpler than the existing

        Agreed on simpler.

        > see if I could come up a better solution

        Sure would be interesting to see. I doubt much can be done in this respect. We need to find blocks that did not appear in the report: in one pass and with constant memory overhead. May be an interview question.

        Show
        Konstantin Shvachko added a comment - > you won't argue that the new code is simpler than the existing Agreed on simpler. > see if I could come up a better solution Sure would be interesting to see. I doubt much can be done in this respect. We need to find blocks that did not appear in the report: in one pass and with constant memory overhead. May be an interview question.
        Hide
        Tsz Wo Nicholas Sze added a comment -

        h5464_20131105[bc].patch won't work. h5464_20131105.patch works but it uses more memory and longer running time.

        I agree with Konstantin that the original code is better although it is complicated. I will leave this for awhile and see if I could come up a better solution.

        Show
        Tsz Wo Nicholas Sze added a comment - h5464_20131105 [bc] .patch won't work. h5464_20131105.patch works but it uses more memory and longer running time. I agree with Konstantin that the original code is better although it is complicated. I will leave this for awhile and see if I could come up a better solution.
        Hide
        Hadoop QA added a comment -

        -1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12612309/h5464_20131105b.patch
        against trunk revision .

        +1 @author. The patch does not contain any @author tags.

        +1 tests included. The patch appears to include 1 new or modified test files.

        +1 javac. The applied patch does not increase the total number of javac compiler warnings.

        +1 javadoc. The javadoc tool did not generate any warning messages.

        +1 eclipse:eclipse. The patch built with eclipse:eclipse.

        -1 findbugs. The patch appears to introduce 1 new Findbugs (version 1.3.9) warnings.

        +1 release audit. The applied patch does not increase the total number of release audit warnings.

        -1 core tests. The patch failed these unit tests in hadoop-hdfs-project/hadoop-hdfs:

        org.apache.hadoop.hdfs.server.blockmanagement.TestBlocksWithNotEnoughRacks
        org.apache.hadoop.hdfs.TestLeaseRecovery2

        The following test timeouts occurred in hadoop-hdfs-project/hadoop-hdfs:

        org.apache.hadoop.hdfs.server.namenode.TestListCorruptFileBlocks
        org.apache.hadoop.hdfs.server.namenode.TestFsck

        +1 contrib tests. The patch passed contrib unit tests.

        Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/5342//testReport/
        Findbugs warnings: https://builds.apache.org/job/PreCommit-HDFS-Build/5342//artifact/trunk/patchprocess/newPatchFindbugsWarningshadoop-hdfs.html
        Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/5342//console

        This message is automatically generated.

        Show
        Hadoop QA added a comment - -1 overall . Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12612309/h5464_20131105b.patch against trunk revision . +1 @author . The patch does not contain any @author tags. +1 tests included . The patch appears to include 1 new or modified test files. +1 javac . The applied patch does not increase the total number of javac compiler warnings. +1 javadoc . The javadoc tool did not generate any warning messages. +1 eclipse:eclipse . The patch built with eclipse:eclipse. -1 findbugs . The patch appears to introduce 1 new Findbugs (version 1.3.9) warnings. +1 release audit . The applied patch does not increase the total number of release audit warnings. -1 core tests . The patch failed these unit tests in hadoop-hdfs-project/hadoop-hdfs: org.apache.hadoop.hdfs.server.blockmanagement.TestBlocksWithNotEnoughRacks org.apache.hadoop.hdfs.TestLeaseRecovery2 The following test timeouts occurred in hadoop-hdfs-project/hadoop-hdfs: org.apache.hadoop.hdfs.server.namenode.TestListCorruptFileBlocks org.apache.hadoop.hdfs.server.namenode.TestFsck +1 contrib tests . The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/5342//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-HDFS-Build/5342//artifact/trunk/patchprocess/newPatchFindbugsWarningshadoop-hdfs.html Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/5342//console This message is automatically generated.
        Hide
        Tsz Wo Nicholas Sze added a comment -

        h5464_20131105c.patch: reverts the changes causing the findbugs warning.

        Show
        Tsz Wo Nicholas Sze added a comment - h5464_20131105c.patch: reverts the changes causing the findbugs warning.
        Hide
        Hadoop QA added a comment -

        -1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12612253/h5464_20131105.patch
        against trunk revision .

        +1 @author. The patch does not contain any @author tags.

        +1 tests included. The patch appears to include 1 new or modified test files.

        +1 javac. The applied patch does not increase the total number of javac compiler warnings.

        +1 javadoc. The javadoc tool did not generate any warning messages.

        -1 eclipse:eclipse. The patch failed to build with eclipse:eclipse.

        -1 findbugs. The patch appears to introduce 1 new Findbugs (version 1.3.9) warnings.

        -1 release audit. The applied patch generated 1 release audit warnings.

        +1 core tests. The patch passed unit tests in hadoop-hdfs-project/hadoop-hdfs.

        +1 contrib tests. The patch passed contrib unit tests.

        Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/5341//testReport/
        Release audit warnings: https://builds.apache.org/job/PreCommit-HDFS-Build/5341//artifact/trunk/patchprocess/patchReleaseAuditProblems.txt
        Findbugs warnings: https://builds.apache.org/job/PreCommit-HDFS-Build/5341//artifact/trunk/patchprocess/newPatchFindbugsWarningshadoop-hdfs.html
        Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/5341//console

        This message is automatically generated.

        Show
        Hadoop QA added a comment - -1 overall . Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12612253/h5464_20131105.patch against trunk revision . +1 @author . The patch does not contain any @author tags. +1 tests included . The patch appears to include 1 new or modified test files. +1 javac . The applied patch does not increase the total number of javac compiler warnings. +1 javadoc . The javadoc tool did not generate any warning messages. -1 eclipse:eclipse . The patch failed to build with eclipse:eclipse. -1 findbugs . The patch appears to introduce 1 new Findbugs (version 1.3.9) warnings. -1 release audit . The applied patch generated 1 release audit warnings. +1 core tests . The patch passed unit tests in hadoop-hdfs-project/hadoop-hdfs. +1 contrib tests . The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/5341//testReport/ Release audit warnings: https://builds.apache.org/job/PreCommit-HDFS-Build/5341//artifact/trunk/patchprocess/patchReleaseAuditProblems.txt Findbugs warnings: https://builds.apache.org/job/PreCommit-HDFS-Build/5341//artifact/trunk/patchprocess/newPatchFindbugsWarningshadoop-hdfs.html Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/5341//console This message is automatically generated.
        Hide
        Tsz Wo Nicholas Sze added a comment -

        Here is the correct file: h5464_20131105b.patch

        Show
        Tsz Wo Nicholas Sze added a comment - Here is the correct file: h5464_20131105b.patch
        Hide
        Tsz Wo Nicholas Sze added a comment -

        Actually, it is unnecessarily to add all the blocks to the remove list. Here is a new patch.

        h5466_20131105b.patch

        Show
        Tsz Wo Nicholas Sze added a comment - Actually, it is unnecessarily to add all the blocks to the remove list. Here is a new patch. h5466_20131105b.patch
        Hide
        Tsz Wo Nicholas Sze added a comment -

        Hi Konstantin,

        You may be correct that the new code use more memory, however, I beg you won't argue that the new code is simpler than the existing code.

        I will think about how to reduce the memory usage. Thanks for the input.

        Show
        Tsz Wo Nicholas Sze added a comment - Hi Konstantin, You may be correct that the new code use more memory, however, I beg you won't argue that the new code is simpler than the existing code. I will think about how to reduce the memory usage. Thanks for the input.
        Hide
        Konstantin Shvachko added a comment -

        So in regular case you will be adding 100,000 replicas to toRemove list only in order to delete most of them later. How does it make things simpler?
        The delimiter lets you keep the calculated lists as small as possible, reducing memory consumption, avoiding frequent GCs.

        Show
        Konstantin Shvachko added a comment - So in regular case you will be adding 100,000 replicas to toRemove list only in order to delete most of them later. How does it make things simpler? The delimiter lets you keep the calculated lists as small as possible, reducing memory consumption, avoiding frequent GCs.
        Hide
        Tsz Wo Nicholas Sze added a comment -

        h5464_20131105.patch: remove the delimiter logic.

        Show
        Tsz Wo Nicholas Sze added a comment - h5464_20131105.patch: remove the delimiter logic.

          People

          • Assignee:
            Tsz Wo Nicholas Sze
            Reporter:
            Tsz Wo Nicholas Sze
          • Votes:
            0 Vote for this issue
            Watchers:
            10 Start watching this issue

            Dates

            • Created:
              Updated:

              Development