Hadoop HDFS / HDFS-11384

Add option for balancer to disperse getBlocks calls to avoid NameNode's rpc.CallQueueLength spike

    Details

    • Target Version/s:
    • Hadoop Flags:
      Reviewed

      Description

      Running the balancer on a Hadoop cluster with more than 3000 DataNodes causes the NameNode's rpc.CallQueueLength to spike. We observed that this can cause HBase cluster failures due to RegionServer WAL timeouts.

      1. HDFS-11384-branch-2.8.011.patch
        13 kB
        Konstantin Shvachko
      2. HDFS-11384-branch-2.7.011.patch
        13 kB
        Konstantin Shvachko
      3. HDFS-11384-007.patch
        13 kB
        Konstantin Shvachko
      4. HDFS-11384.011.patch
        16 kB
        Konstantin Shvachko
      5. HDFS-11384.010.patch
        16 kB
        Konstantin Shvachko
      6. HDFS-11384.009.patch
        14 kB
        Konstantin Shvachko
      7. HDFS-11384.008.patch
        15 kB
        Konstantin Shvachko
      8. HDFS-11384.006.patch
        13 kB
        Konstantin Shvachko
      9. HDFS-11384.005.patch
        5 kB
        Konstantin Shvachko
      10. HDFS-11384.004.patch
        5 kB
        Konstantin Shvachko
      11. HDFS-11384.003.patch
        5 kB
        Konstantin Shvachko
      12. HDFS-11384.002.patch
        5 kB
        yunjiong zhao
      13. HDFS-11384.001.patch
        4 kB
        yunjiong zhao
      14. balancer.week.png
        50 kB
        yunjiong zhao
      15. balancer.day.png
        58 kB
        yunjiong zhao

        Issue Links

          Activity

          zhaoyunjiong yunjiong zhao added a comment -

          This patch provides an option to make the balancer block for $dfs.balancer.getBlocks.interval.millis milliseconds after every getBlocks RPC call.
          The attached pictures show the improvement after I applied this patch to our production cluster around Thursday 15:00.

          benoyantony Benoy Antony added a comment -

          yunjiong zhao,
          If there are blocks to balance, then there will be sufficient delays between successive getBlocks.
          In such cases, we do not have to sleep.
          It would be better to keep track of the interval between successive getBlocks calls and sleep only for the required time.
          Can you also write a unit test to cover this change?
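
A minimal sketch of the "sleep only for the required time" idea, assuming a single minimum interval between getBlocks calls. `GetBlocksPacer` and `reserve` are hypothetical names used for illustration and are not part of Hadoop or of any patch on this issue. The helper only computes the remaining wait, so the caller can sleep outside any lock.

```java
/** Hypothetical helper (not a Hadoop class): spaces calls at least minIntervalMs apart. */
final class GetBlocksPacer {
  private final long minIntervalMs;
  private long lastCallMs;   // time at which the most recently reserved call may start
  private boolean called;    // false until the first reservation

  GetBlocksPacer(long minIntervalMs) {
    this.minIntervalMs = minIntervalMs;
  }

  /**
   * Reserves the next call slot and returns how many milliseconds the caller
   * should wait before issuing the call. Returns 0 when enough time has
   * already passed since the previously reserved slot.
   */
  synchronized long reserve(long nowMs) {
    long waitMs = 0;
    if (called) {
      waitMs = Math.max(0, lastCallMs + minIntervalMs - nowMs);
    }
    called = true;
    lastCallMs = nowMs + waitMs;
    return waitMs;
  }
}
```

A caller would do something like `Thread.sleep(pacer.reserve(System.currentTimeMillis()))` before each getBlocks call. The lock is held only while the slot is computed, never while sleeping.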

          benoyantony Benoy Antony added a comment - - edited

          Sleeping inside the Synchronized block should be avoided as it will prevent other threads from obtaining the lock while the thread is sleeping.
          The tradeoff of sleeping for a variable rather than a fixed time is that the code gets more complicated. Since the delay is not applied by default, it is okay to sleep for a fixed interval after getBlocks().

          zhaoyunjiong yunjiong zhao added a comment -

          Thank you Benoy Antony for your time to review this patch.

          Sleeping inside the Synchronized block should be avoided as it will prevent other threads from obtaining the lock while the thread is sleeping.

          I sleep inside the synchronized block on purpose.
          The balancer has multiple threads (200 by default) that may call getBlocks at the same time. If a user sets dfs.balancer.getBlocks.interval.millis to slow down the balancer, it won't work well without a lock, because in the worst case 200 getBlocks calls are still sent to the NameNode at the same time.

          It will be better to keep track of the interval between successive getBlocks and sleep only for the required time.

          By default this patch doesn't change anything; it only adds an option that lets the user slow down the rate at which the balancer sends getBlocks to the NameNode, so I'd like to keep it as simple as possible.
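
The behaviour described in this comment can be sketched roughly as follows. This is an illustration under assumptions, not the code from the patch: `SerializedGetBlocks` is a hypothetical class, the RPC is passed in as a `Runnable`, and the in-flight counters exist only to make the serialization observable.

```java
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical sketch: one getBlocks at a time, followed by a fixed pause. */
final class SerializedGetBlocks {
  private final Object lock = new Object();
  private final long intervalMs;
  private final AtomicInteger inFlight = new AtomicInteger();
  private final AtomicInteger maxInFlight = new AtomicInteger();

  SerializedGetBlocks(long intervalMs) {
    this.intervalMs = intervalMs;
  }

  void getBlocks(Runnable rpc) throws InterruptedException {
    if (intervalMs <= 0) {
      rpc.run();               // default: no throttling, behaviour unchanged
      return;
    }
    synchronized (lock) {
      // Track how many threads are inside the RPC at once (for illustration).
      maxInFlight.accumulateAndGet(inFlight.incrementAndGet(), Math::max);
      try {
        rpc.run();
      } finally {
        inFlight.decrementAndGet();
      }
      // Deliberately sleep while holding the lock so that no other thread
      // can issue its own call during the pause.
      Thread.sleep(intervalMs);
    }
  }

  int maxInFlight() {
    return maxInFlight.get();
  }
}
```

With a non-zero interval, even 200 concurrent callers reach the RPC one at a time. The cost is that every waiting thread is blocked for the duration of each sleep.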

          benoyantony Benoy Antony added a comment -

          Thanks for the explanation, yunjiong zhao.
          The patch looks good. I will commit this tomorrow if there are no other comments.

          hadoopqa Hadoop QA added a comment -
          -1 overall



          Vote Subsystem Runtime Comment
          0 reexec 0m 12s Docker mode activated.
          +1 @author 0m 0s The patch does not contain any @author tags.
          -1 test4tests 0m 0s The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch.
          +1 mvninstall 14m 9s trunk passed
          +1 compile 0m 52s trunk passed
          +1 checkstyle 0m 43s trunk passed
          +1 mvnsite 0m 59s trunk passed
          +1 mvneclipse 0m 15s trunk passed
          +1 findbugs 2m 0s trunk passed
          +1 javadoc 0m 44s trunk passed
          +1 mvninstall 0m 53s the patch passed
          +1 compile 0m 51s the patch passed
          +1 javac 0m 51s the patch passed
          -0 checkstyle 0m 41s hadoop-hdfs-project/hadoop-hdfs: The patch generated 2 new + 449 unchanged - 0 fixed = 451 total (was 449)
          +1 mvnsite 0m 54s the patch passed
          +1 mvneclipse 0m 12s the patch passed
          +1 whitespace 0m 0s The patch has no whitespace issues.
          +1 xml 0m 1s The patch has no ill-formed XML file.
          -1 findbugs 2m 3s hadoop-hdfs-project/hadoop-hdfs generated 1 new + 0 unchanged - 0 fixed = 1 total (was 0)
          +1 javadoc 0m 40s the patch passed
          +1 unit 69m 40s hadoop-hdfs in the patch passed.
          +1 asflicense 0m 20s The patch does not generate ASF License warnings.
          97m 34s



          Reason Tests
          FindBugs module:hadoop-hdfs-project/hadoop-hdfs
            org.apache.hadoop.hdfs.server.balancer.Dispatcher$Source.getBlockList() calls Thread.sleep() with a lock held At Dispatcher.java:[line 778]



          Subsystem Report/Notes
          Docker Image:yetus/hadoop:a9ad5d6
          JIRA Issue HDFS-11384
          JIRA Patch URL https://issues.apache.org/jira/secure/attachment/12855719/HDFS-11384.001.patch
          Optional Tests asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle xml
          uname Linux c8f1eb7124c2 3.13.0-105-generic #152-Ubuntu SMP Fri Dec 2 15:37:11 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux
          Build tool maven
          Personality /testptch/hadoop/patchprocess/precommit/personality/provided.sh
          git revision trunk / a97833e
          Default Java 1.8.0_121
          findbugs v3.0.0
          checkstyle https://builds.apache.org/job/PreCommit-HDFS-Build/18524/artifact/patchprocess/diff-checkstyle-hadoop-hdfs-project_hadoop-hdfs.txt
          findbugs https://builds.apache.org/job/PreCommit-HDFS-Build/18524/artifact/patchprocess/new-findbugs-hadoop-hdfs-project_hadoop-hdfs.html
          Test Results https://builds.apache.org/job/PreCommit-HDFS-Build/18524/testReport/
          modules C: hadoop-hdfs-project/hadoop-hdfs U: hadoop-hdfs-project/hadoop-hdfs
          Console output https://builds.apache.org/job/PreCommit-HDFS-Build/18524/console
          Powered by Apache Yetus 0.5.0-SNAPSHOT http://yetus.apache.org

          This message was automatically generated.

          benoyantony Benoy Antony added a comment -

          yunjiong zhao, could you please check whether sleeping outside the sync block with a potentially longer delay is sufficient to avoid the spike?

          zhaoyunjiong yunjiong zhao added a comment -

          Use a Semaphore instead of a lock to avoid the FindBugs warning.
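
A rough sketch of what a semaphore-based variant might look like, assuming a single permit shared by all balancer threads. `SemaphoreThrottle` is a hypothetical name and this is not the patch itself. The blocking behaviour is the same as sleeping under a monitor lock, but no intrinsic lock is held during `Thread.sleep()`.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.Semaphore;

/** Hypothetical sketch: a single fair permit replaces the synchronized block. */
final class SemaphoreThrottle {
  private final Semaphore permit = new Semaphore(1, true);
  private final long intervalMs;

  SemaphoreThrottle(long intervalMs) {
    this.intervalMs = intervalMs;
  }

  <T> T call(Callable<T> rpc) throws Exception {
    if (intervalMs <= 0) {
      return rpc.call();       // throttling disabled
    }
    permit.acquire();
    try {
      T result = rpc.call();
      Thread.sleep(intervalMs); // pause before the next caller may proceed
      return result;
    } finally {
      permit.release();         // always hand the permit back, even on failure
    }
  }

  int availablePermits() {
    return permit.availablePermits();
  }
}
```

The `finally` block matters here: unlike a `synchronized` block, a semaphore permit is not released automatically when the RPC throws.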

          hadoopqa Hadoop QA added a comment -
          -1 overall



          Vote Subsystem Runtime Comment
          0 reexec 0m 20s Docker mode activated.
          +1 @author 0m 0s The patch does not contain any @author tags.
          -1 test4tests 0m 0s The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch.
          +1 mvninstall 14m 2s trunk passed
          +1 compile 0m 45s trunk passed
          +1 checkstyle 0m 40s trunk passed
          +1 mvnsite 0m 51s trunk passed
          +1 mvneclipse 0m 13s trunk passed
          +1 findbugs 1m 47s trunk passed
          +1 javadoc 0m 40s trunk passed
          +1 mvninstall 0m 48s the patch passed
          +1 compile 0m 43s the patch passed
          +1 javac 0m 43s the patch passed
          -0 checkstyle 0m 38s hadoop-hdfs-project/hadoop-hdfs: The patch generated 2 new + 451 unchanged - 0 fixed = 453 total (was 451)
          +1 mvnsite 0m 50s the patch passed
          +1 mvneclipse 0m 10s the patch passed
          +1 whitespace 0m 0s The patch has no whitespace issues.
          +1 xml 0m 2s The patch has no ill-formed XML file.
          +1 findbugs 1m 50s the patch passed
          +1 javadoc 0m 37s the patch passed
          -1 unit 71m 19s hadoop-hdfs in the patch failed.
          +1 asflicense 0m 21s The patch does not generate ASF License warnings.
          97m 52s



          Reason Tests
          Failed junit tests hadoop.hdfs.server.datanode.TestFsDatasetCache
            hadoop.hdfs.web.TestWebHdfsFileSystemContract
            hadoop.hdfs.TestHDFSFileSystemContract
            hadoop.hdfs.server.datanode.TestDataNodeVolumeFailureReporting



          Subsystem Report/Notes
          Docker Image:yetus/hadoop:a9ad5d6
          JIRA Issue HDFS-11384
          JIRA Patch URL https://issues.apache.org/jira/secure/attachment/12858571/HDFS-11384.002.patch
          Optional Tests asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle xml
          uname Linux 5ceb83dbf5c7 3.13.0-107-generic #154-Ubuntu SMP Tue Dec 20 09:57:27 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux
          Build tool maven
          Personality /testptch/hadoop/patchprocess/precommit/personality/provided.sh
          git revision trunk / 55796a0
          Default Java 1.8.0_121
          findbugs v3.0.0
          checkstyle https://builds.apache.org/job/PreCommit-HDFS-Build/18700/artifact/patchprocess/diff-checkstyle-hadoop-hdfs-project_hadoop-hdfs.txt
          unit https://builds.apache.org/job/PreCommit-HDFS-Build/18700/artifact/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt
          Test Results https://builds.apache.org/job/PreCommit-HDFS-Build/18700/testReport/
          modules C: hadoop-hdfs-project/hadoop-hdfs U: hadoop-hdfs-project/hadoop-hdfs
          Console output https://builds.apache.org/job/PreCommit-HDFS-Build/18700/console
          Powered by Apache Yetus 0.5.0-SNAPSHOT http://yetus.apache.org

          This message was automatically generated.

          shv Konstantin Shvachko added a comment -

          Hey yunjiong zhao, Benoy Antony. This seems like an important issue to fix. Thanks for reporting and working on the fix. We face the same problem on our cluster.
          If I understand correctly there are two changes in the patch:

          1. Allow only one Balancer thread to issue getBlocks() at any given time.
          2. Add a possibility to sleep for a thread between getBlocks() calls.

          Some thoughts

          1. I agree with Benoy Antony that the sleep() solution is not optimal. Do you guys think we can replace it with a wait()?
          2. I also want to understand the effect of this change when there are a lot of unbalanced nodes and the Balancer should run aggressively. Can we add some heuristics so that the Balancer adjusts by itself, instead of adding a configuration parameter that would require a Balancer restart whenever it needs to be changed?
          zhaoyunjiong yunjiong zhao added a comment - - edited

          Thanks Konstantin Shvachko for the review.
          The Balancer allows only one thread to issue getBlocks() at any given time only when dfs.balancer.getBlocks.interval.millis is set to a non-zero value. Otherwise this patch doesn't change anything.
          So there is actually only one change.

          If we use wait(), it releases the lock, so we can't make sure that only one thread calls getBlocks() at a time.

          By default, this patch doesn't change anything. So if you need to run the Balancer aggressively, don't set dfs.balancer.getBlocks.interval.millis.

          Can we add some heuristics so that the Balancer could adjust by itself instead of adding the configuration parameter

          I thought about this before. The best way I can think of is to add a new function in IPC that lets clients get the CallQueueLength; if the CallQueueLength is too high, block getBlocks() until it becomes normal again.
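
A sketch of how such a heuristic might look on the Balancer side. Note that the IPC function to read the CallQueueLength is only proposed in this comment, so the `IntSupplier` below is a purely hypothetical stand-in for it, and `QueueLengthGate` is an illustrative name rather than an existing class.

```java
import java.util.function.IntSupplier;

/** Hypothetical sketch: hold back getBlocks while the NameNode call queue is long. */
final class QueueLengthGate {
  private final IntSupplier callQueueLength; // stand-in for the proposed IPC call
  private final int threshold;
  private final long pollMs;

  QueueLengthGate(IntSupplier callQueueLength, int threshold, long pollMs) {
    this.callQueueLength = callQueueLength;
    this.threshold = threshold;
    this.pollMs = pollMs;
  }

  /** Blocks until the reported queue length is at or below the threshold. */
  void awaitBelowThreshold() throws InterruptedException {
    while (callQueueLength.getAsInt() > threshold) {
      Thread.sleep(pollMs);
    }
  }
}
```

A thread would call `awaitBelowThreshold()` immediately before each getBlocks RPC. Polling adds its own RPC load, so the poll interval would need to be chosen with care.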

          shv Konstantin Shvachko added a comment -

          Took me some time to refresh my memory of the details of balancing. So here is my understanding of what is happening:

          • dispatchBlockMoves() spawns a thread for each Source, which represents all storages of the same type of a DN. Each thread executes dispatchBlocks() then.
          • dispatchBlocks() first tries to schedule block transfers for already selected (source:target) DN pairs, and if there are no more pairs it calls getBlockList(), which contacts NN to obtain the next portion of blocks from the source DN to be moved out.
          • The problem of too many RPC calls happens at the beginning of a Balancer iteration, when there are no scheduled pairs yet, so all the threads call getBlockList() and go to the NameNode simultaneously. So we need to disperse only the initial burst of RPCs at the start of an iteration, as subsequent getBlocks() calls are already dispersed fine.

          I see two ways to fix this:

          1. Add a parameter to getBlockList(long delay), where delay is a random time within a reasonable interval, which Balancer should wait for before sending the getBlocks() RPC to NN. The delay is only applied once, and set to 0 once applied. This looks rather straightforward to me.
          2. Allocate a reasonable throughput of getBlocks() RPCs to NN, and delay calls if the quota is exceeded. This is similar to Benoy Antony's proposal, but allows us to precisely specify how much of the NN RPC bandwidth is allocated to the Balancer.

          yunjiong zhao, I understand you wanted a simple fix without making too many changes, but this looks like a real problem to me, and we should fix it in a more generic manner. I am fine if you wish to implement option #1 here as an initial step. Ultimately we should target solution #2, which could be done in another jira.
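
Option #1 above could be sketched as a small per-thread holder for the one-time random delay. This is an assumption-laden illustration: `InitialDelay`, `takeDelayMs` and the `maxDelayMs` bound are hypothetical, and the comment does not say how the "reasonable interval" would be chosen.

```java
import java.util.concurrent.ThreadLocalRandom;

/**
 * Hypothetical sketch of option #1: each Source thread owns one instance and
 * waits a random time before its first getBlocks call only.
 * Not thread-safe; intended to be confined to a single thread.
 */
final class InitialDelay {
  private long delayMs;

  InitialDelay(long maxDelayMs) {
    // Random delay in [0, maxDelayMs); no delay when the bound is not positive.
    this.delayMs = maxDelayMs <= 0
        ? 0
        : ThreadLocalRandom.current().nextLong(maxDelayMs);
  }

  /** Returns the delay to apply now; every later call returns 0. */
  long takeDelayMs() {
    long d = delayMs;
    delayMs = 0;
    return d;
  }
}
```

Each thread would call `Thread.sleep(delay.takeDelayMs())` before every getBlocks RPC. Only the first call in an iteration actually waits, which spreads out the initial burst without slowing later calls.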

          redvine Vinitha Reddy Gankidi added a comment -

          Two other approaches to fix this:

          1. In getBlockList(), the Dispatcher fetches the blocks belonging to a particular DN from the NN and then moves those blocks from the source DN to the target DN. The Dispatcher can instead get the blocks directly from that DN. This makes getBlockList() a distributed operation that doesn't load any single node.

          2. The Dispatcher can fetch the blocks from the Standby NN instead of the active one. The Balancer should be able to tolerate a reasonable degree of staleness.

          zhz Zhe Zhang added a comment -

          Great discussions Konstantin Shvachko and Vinitha Reddy Gankidi.

          I suggest we keep this JIRA for making the pattern of getBlocks calls less bursty, and also create a JIRA to offload getBlocks calls to DN or SbNN. Vinitha Reddy Gankidi Thoughts?

          redvine Vinitha Reddy Gankidi added a comment - - edited

          If we were to offload the calls to DN, dispersing calls wouldn't be a pressing issue. I would like to get some feedback on the various approaches discussed. Benoy Antony, Daryn Sharp, Mingliang Liu and yunjiong zhao I would love to hear your opinions.

          zhz Zhe Zhang added a comment -

          Also pinging Ming Ma and Andrew Wang since this is related to "reading from SbNN".

          liuml07 Mingliang Liu added a comment -

          Ping Tsz Wo Nicholas Sze.

          shv Konstantin Shvachko added a comment -

          Combining my and Vinitha's proposals in a common list here, for further discussion.

          1. Add a parameter to getBlockList(long delay), where delay is a random time within a reasonable interval, which Balancer should wait for before sending the getBlocks() RPC to NN. The delay is only applied once, and set to 0 once applied. This looks rather straightforward to me.
          2. Allocate a reasonable throughput of getBlocks() RPCs to NN, and delay calls if the quota is exceeded. This is similar to Benoy Antony's proposal, but allows us to precisely specify how much of the NN RPC bandwidth is allocated to the Balancer.
          3. The Dispatcher can get the blocks directly from the particular DN instead of the NN. This makes getBlockList() a distributed operation that doesn't load any single node.
          4. The Dispatcher can fetch the blocks from the Standby NN instead of the active one. The Balancer should be able to tolerate a reasonable degree of staleness.

          I think (1) is quite easy and can solve the immediate problem with bursts of RPCs.
          I really like Vinitha's (3), obtaining block info directly from DNs. I did not look into how much harder it is than the other approaches.
          (2) and (4) are complementary, that is, we can have a limit on the Balancer's RPC bandwidth to NN, but run it against the SBN.
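
For proposal (2), a quota on getBlocks() RPCs could be sketched as a simple per-second window counter. This is one possible reading of "allocate a reasonable throughput", not a description of any committed code; `GetBlocksQuota`, `delayFor` and the one-second window are assumptions made for illustration.

```java
/** Hypothetical sketch of proposal (2): at most maxPerSecond calls per 1-second window. */
final class GetBlocksQuota {
  private static final long WINDOW_MS = 1000;

  private final int maxPerSecond;
  private boolean started;
  private long windowStartMs;
  private int used;

  GetBlocksQuota(int maxPerSecond) {
    this.maxPerSecond = maxPerSecond;
  }

  /**
   * Returns 0 if a call may be issued now (and counts it against the quota),
   * otherwise the number of milliseconds until the current window ends.
   * After waiting, the caller asks again. Values <= 0 disable the quota.
   */
  synchronized long delayFor(long nowMs) {
    if (maxPerSecond <= 0) {
      return 0;
    }
    if (!started || nowMs - windowStartMs >= WINDOW_MS) {
      started = true;
      windowStartMs = nowMs;   // open a fresh window
      used = 0;
    }
    if (used < maxPerSecond) {
      used++;
      return 0;
    }
    return windowStartMs + WINDOW_MS - nowMs;
  }
}
```

Because the quota is independent of which NameNode receives the calls, the same limiter could be kept when the calls are directed at a Standby NN, which is the sense in which (2) and (4) combine.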

          redvine Vinitha Reddy Gankidi added a comment - - edited

          Konstantin Shvachko I'm leaning towards (4) instead of (3).
          isGoodBlockCandidate needs a global view of the block replicas. There is also some additional logic to deal with erasure-coded (EC) blocks, and this may be a blocker for reading from DNs. Zhe Zhang, you probably have more context regarding the EC blocks.

          /**
           * Decide if the block/blockGroup is a good candidate to be moved from source
           * to target. A block is a good candidate if
           * 1. the block is not in the process of being moved/has not been moved;
           * 2. the block does not have a replica/internalBlock on the target;
           * 3. doing the move does not reduce the number of racks that the block has
           */
          private boolean isGoodBlockCandidate(StorageGroup source, StorageGroup target,
              StorageType targetStorageType, DBlock block) {
          

          I agree that (2) and (4) are complementary.

          shv Konstantin Shvachko added a comment -

          Yup, I was thinking the same last night. You need all locations of the block to select the right target that satisfies replication policy, and a single DN cannot provide that. So unfortunately we should discard (3) as an option.

          shv Konstantin Shvachko added a comment -

          Here is a relatively simple patch, which restricts the number of RPC calls from Balancer to NN to 20 calls per second.
          20 calls per second is a constant for now. It is chosen so that Balancer calls could not saturate NN's RPC queue based on metrics from a large cluster I was observing. LMK if people prefer it to be configurable.
          On a large cluster with 200 (default) dispatcher threads and, e.g., 500 underutilized DNs (sources), the initial 200 RPCs will be dispersed over 200 / 20 = 10 seconds. The remaining 300 RPCs should disperse organically as they subsequently reuse the same 200 threads from the pool.
          The patch has a unit test, which triggers the dispersion logic.
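
The dispersal arithmetic in this comment can be sketched roughly as follows. This is an illustration only, not the code from the patch: the class GetBlocksDispersal and its methods are hypothetical, and the even spacing of start delays across the initial threads is an assumption. Only the rate of 20 calls per second and the 200 / 20 = 10 seconds example come from the comment.

```java
// Hypothetical sketch; names and the even-spacing rule are assumptions, not the patch.
final class GetBlocksDispersal {
  /** Target rate quoted in the comment: 20 getBlocks() calls per second. */
  static final int BALANCER_NUM_RPC_PER_SEC = 20;

  /** Seconds over which the initial burst is spread, e.g. 200 / 20 = 10. */
  static long dispersalWindowSeconds(int sources, int dispatcherThreads) {
    int concurrentThreads = Math.min(sources, dispatcherThreads);
    // Integer division: fewer than 20 concurrent threads gives a window of 0.
    return concurrentThreads / BALANCER_NUM_RPC_PER_SEC;
  }

  /** Assumed start delay for the i-th initial thread, evenly spaced over the window. */
  static long initialDelayMillis(int threadIndex, int sources, int dispatcherThreads) {
    int concurrentThreads = Math.min(sources, dispatcherThreads);
    if (threadIndex >= concurrentThreads) {
      return 0L; // later sources reuse pool threads and get no extra delay
    }
    long windowMillis = dispersalWindowSeconds(sources, dispatcherThreads) * 1000L;
    return windowMillis * threadIndex / concurrentThreads;
  }
}
```

Under these assumptions, a cluster with 500 sources and 200 threads spaces the first 200 calls 50 ms apart, while a cluster with 10 sources gets a window of 0 and no delay at all.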

          hadoopqa Hadoop QA added a comment -
          -1 overall



          Vote Subsystem Runtime Comment
          0 reexec 0m 0s Docker mode activated.
          -1 patch 0m 13s HDFS-11384 does not apply to trunk. Rebase required? Wrong Branch? See https://wiki.apache.org/hadoop/HowToContribute for help.



          Subsystem Report/Notes
          JIRA Issue HDFS-11384
          JIRA Patch URL https://issues.apache.org/jira/secure/attachment/12862552/HDFS-11384.003.patch
          Console output https://builds.apache.org/job/PreCommit-HDFS-Build/19019/console
          Powered by Apache Yetus 0.5.0-SNAPSHOT http://yetus.apache.org

          This message was automatically generated.

          hadoopqa Hadoop QA added a comment -
          -1 overall



          Vote Subsystem Runtime Comment
          0 reexec 0m 15s Docker mode activated.
          +1 @author 0m 0s The patch does not contain any @author tags.
          +1 test4tests 0m 0s The patch appears to include 1 new or modified test files.
          +1 mvninstall 12m 54s trunk passed
          +1 compile 0m 46s trunk passed
          +1 checkstyle 0m 38s trunk passed
          +1 mvnsite 0m 50s trunk passed
          +1 mvneclipse 0m 13s trunk passed
          +1 findbugs 1m 43s trunk passed
          +1 javadoc 0m 39s trunk passed
          +1 mvninstall 0m 46s the patch passed
          +1 compile 0m 44s the patch passed
          +1 javac 0m 44s the patch passed
          -0 checkstyle 0m 35s hadoop-hdfs-project/hadoop-hdfs: The patch generated 1 new + 251 unchanged - 0 fixed = 252 total (was 251)
          +1 mvnsite 0m 48s the patch passed
          +1 mvneclipse 0m 10s the patch passed
          +1 whitespace 0m 0s The patch has no whitespace issues.
          +1 findbugs 1m 49s the patch passed
          +1 javadoc 0m 37s the patch passed
          -1 unit 63m 4s hadoop-hdfs in the patch failed.
          +1 asflicense 0m 18s The patch does not generate ASF License warnings.
          88m 1s



          Reason Tests
          Failed junit tests hadoop.hdfs.server.datanode.fsdataset.impl.TestFsDatasetImpl



          Subsystem Report/Notes
          Docker Image:yetus/hadoop:a9ad5d6
          JIRA Issue HDFS-11384
          JIRA Patch URL https://issues.apache.org/jira/secure/attachment/12862555/HDFS-11384.003.patch
          Optional Tests asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle
          uname Linux 70b6452eb18d 3.13.0-106-generic #153-Ubuntu SMP Tue Dec 6 15:44:32 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux
          Build tool maven
          Personality /testptch/hadoop/patchprocess/precommit/personality/provided.sh
          git revision trunk / 2aa8967
          Default Java 1.8.0_121
          findbugs v3.0.0
          checkstyle https://builds.apache.org/job/PreCommit-HDFS-Build/19022/artifact/patchprocess/diff-checkstyle-hadoop-hdfs-project_hadoop-hdfs.txt
          unit https://builds.apache.org/job/PreCommit-HDFS-Build/19022/artifact/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt
          Test Results https://builds.apache.org/job/PreCommit-HDFS-Build/19022/testReport/
          modules C: hadoop-hdfs-project/hadoop-hdfs U: hadoop-hdfs-project/hadoop-hdfs
          Console output https://builds.apache.org/job/PreCommit-HDFS-Build/19022/console
          Powered by Apache Yetus 0.5.0-SNAPSHOT http://yetus.apache.org

          This message was automatically generated.

          redvine Vinitha Reddy Gankidi added a comment -

          Konstantin Shvachko The delay logic looks good to me. It would be great if we can make BALANCER_NUM_RPC_PER_SEC configurable with a default value of 20. The test does not ensure that there are indeed 20 getBlocks calls per second, and it probably is not straightforward to ensure that. So I would like to have the ability to configure BALANCER_NUM_RPC_PER_SEC.

          zhz Zhe Zhang added a comment -

          Thanks Konstantin Shvachko for the patch. In addition to Vinitha Reddy Gankidi's suggestion, two minor comments:

          1. Typo "Inteval"
          2. I guess Preconditions is the preferred way for cases like below:
            assert concurrentThreads > 0 : "Number of concurrent threads is 0.";
            
          shv Konstantin Shvachko added a comment -
          • I am usually very conservative about introducing new configuration parameters. Parameters seem to give you flexibility to adjust them, but in many cases administrators don't know what to do with that flexibility, because there are so many of them. I prefer to have a reasonable constant value initially, and add a config variable later if other values are needed in certain cases. In the end adding configs is easy, but you can never remove them.
            In this particular case the BALANCER_NUM_RPC_PER_SEC is chosen so that big clusters would distribute initial RPC requests over 10 secs, and it does not affect small clusters at all. I think we are good with the constant set to 20 for now, but let me know if you see use cases for different values.
          • Fixed the typo in 004 patch. Thanks Zhe Zhang.
          • This would be a typical misuse of Preconditions, as we do in many cases in the code, and as it was discussed previously on many occasions. It is an assert, because we assume the condition should never happen. If it does, it's a bug, which should be caught during testing, with -ea option. And in the runtime we want to avoid checking any extra condition for performance reasons.
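
The distinction drawn in the last point can be illustrated with a minimal sketch. The class ThreadCountCheck is hypothetical, and the explicit check in withRuntimeCheck merely stands in for a Guava Preconditions call, which is not imported here so that the block stays self-contained.

```java
// Hypothetical sketch contrasting an assert with an always-on runtime check.
final class ThreadCountCheck {
  /** Assert style: documents an invariant; evaluated only when the JVM runs with -ea. */
  static int withAssert(int concurrentThreads) {
    assert concurrentThreads > 0 : "Number of concurrent threads is 0.";
    return concurrentThreads;
  }

  /** Precondition style: evaluated on every call, whether or not -ea is set. */
  static int withRuntimeCheck(int concurrentThreads) {
    if (concurrentThreads <= 0) {
      throw new IllegalArgumentException("Number of concurrent threads is 0.");
    }
    return concurrentThreads;
  }
}
```

With assertions disabled, which is the JVM default, withAssert(0) returns normally, whereas withRuntimeCheck(0) always throws. That is the trade-off described above: the assert catches the bug during testing and costs nothing in production.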
          hadoopqa Hadoop QA added a comment -
          -1 overall



          Vote Subsystem Runtime Comment
          0 reexec 0m 18s Docker mode activated.
          +1 @author 0m 0s The patch does not contain any @author tags.
          +1 test4tests 0m 0s The patch appears to include 1 new or modified test files.
          +1 mvninstall 13m 32s trunk passed
          +1 compile 0m 47s trunk passed
          +1 checkstyle 0m 38s trunk passed
          +1 mvnsite 0m 54s trunk passed
          +1 mvneclipse 0m 14s trunk passed
          +1 findbugs 1m 48s trunk passed
          +1 javadoc 0m 40s trunk passed
          +1 mvninstall 0m 49s the patch passed
          +1 compile 0m 49s the patch passed
          +1 javac 0m 49s the patch passed
          -0 checkstyle 0m 35s hadoop-hdfs-project/hadoop-hdfs: The patch generated 1 new + 251 unchanged - 0 fixed = 252 total (was 251)
          +1 mvnsite 0m 52s the patch passed
          +1 mvneclipse 0m 12s the patch passed
          +1 whitespace 0m 0s The patch has no whitespace issues.
          +1 findbugs 1m 57s the patch passed
          +1 javadoc 0m 38s the patch passed
          -1 unit 70m 24s hadoop-hdfs in the patch failed.
          +1 asflicense 0m 21s The patch does not generate ASF License warnings.
          96m 52s



          Reason Tests
          Failed junit tests hadoop.hdfs.server.balancer.TestBalancer



          Subsystem Report/Notes
          Docker Image:yetus/hadoop:612578f
          JIRA Issue HDFS-11384
          JIRA Patch URL https://issues.apache.org/jira/secure/attachment/12862956/HDFS-11384.004.patch
          Optional Tests asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle
          uname Linux f6f002ce984b 3.13.0-106-generic #153-Ubuntu SMP Tue Dec 6 15:44:32 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux
          Build tool maven
          Personality /testptch/hadoop/patchprocess/precommit/personality/provided.sh
          git revision trunk / 3a91376
          Default Java 1.8.0_121
          findbugs v3.0.0
          checkstyle https://builds.apache.org/job/PreCommit-HDFS-Build/19056/artifact/patchprocess/diff-checkstyle-hadoop-hdfs-project_hadoop-hdfs.txt
          unit https://builds.apache.org/job/PreCommit-HDFS-Build/19056/artifact/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt
          Test Results https://builds.apache.org/job/PreCommit-HDFS-Build/19056/testReport/
          modules C: hadoop-hdfs-project/hadoop-hdfs U: hadoop-hdfs-project/hadoop-hdfs
          Console output https://builds.apache.org/job/PreCommit-HDFS-Build/19056/console
          Powered by Apache Yetus 0.5.0-SNAPSHOT http://yetus.apache.org

          This message was automatically generated.

          shv Konstantin Shvachko added a comment -

          Addressing the checkstyle warning.
          Not seeing failures of TestBalancer.testBalancerWithStripedFile locally. Don't think it is related.

          hadoopqa Hadoop QA added a comment -
          -1 overall



          Vote Subsystem Runtime Comment
          0 reexec 23m 12s Docker mode activated.
          +1 @author 0m 0s The patch does not contain any @author tags.
          +1 test4tests 0m 0s The patch appears to include 1 new or modified test files.
          +1 mvninstall 15m 38s trunk passed
          +1 compile 0m 55s trunk passed
          +1 checkstyle 0m 40s trunk passed
          +1 mvnsite 0m 56s trunk passed
          +1 mvneclipse 0m 15s trunk passed
          +1 findbugs 1m 47s trunk passed
          +1 javadoc 0m 44s trunk passed
          +1 mvninstall 0m 59s the patch passed
          +1 compile 0m 55s the patch passed
          +1 javac 0m 55s the patch passed
          +1 checkstyle 0m 41s the patch passed
          +1 mvnsite 1m 7s the patch passed
          +1 mvneclipse 0m 15s the patch passed
          +1 whitespace 0m 0s The patch has no whitespace issues.
          +1 findbugs 2m 6s the patch passed
          +1 javadoc 0m 47s the patch passed
          -1 unit 87m 30s hadoop-hdfs in the patch failed.
          +1 asflicense 0m 19s The patch does not generate ASF License warnings.
          140m 29s



          Reason Tests
          Failed junit tests hadoop.hdfs.server.namenode.TestReconstructStripedBlocks



          Subsystem Report/Notes
          Docker Image:yetus/hadoop:612578f
          JIRA Issue HDFS-11384
          JIRA Patch URL https://issues.apache.org/jira/secure/attachment/12863142/HDFS-11384.005.patch
          Optional Tests asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle
          uname Linux 6338c0ca2e43 3.13.0-108-generic #155-Ubuntu SMP Wed Jan 11 16:58:52 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux
          Build tool maven
          Personality /testptch/hadoop/patchprocess/precommit/personality/provided.sh
          git revision trunk / b053fdc
          Default Java 1.8.0_121
          findbugs v3.0.0
          unit https://builds.apache.org/job/PreCommit-HDFS-Build/19067/artifact/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt
          Test Results https://builds.apache.org/job/PreCommit-HDFS-Build/19067/testReport/
          modules C: hadoop-hdfs-project/hadoop-hdfs U: hadoop-hdfs-project/hadoop-hdfs
          Console output https://builds.apache.org/job/PreCommit-HDFS-Build/19067/console
          Powered by Apache Yetus 0.5.0-SNAPSHOT http://yetus.apache.org

          This message was automatically generated.

          zhz Zhe Zhang added a comment -

          Thanks Konstantin Shvachko, the main logic LGTM. I could not reproduce reported unit test failures either.

          +1 pending a few final comments:

          1. IIUC, BALANCER_NUM_RPC_PER_SEC is a best-effort throttling target, instead of a guaranteed threshold. E.g. it looks possible for Thread.sleep(delay) to be interrupted and getBlockList to be retried in the while loop. Or the entire dispatchBlocks call in a thread could die before the delay elapses, and then another future[j] will be issued without the delay. Assuming this understanding is correct, I think this is the right way to handle this logic; it is a good idea not to optimize for these rare cases. But can we update the documentation for BALANCER_NUM_RPC_PER_SEC to reflect it?
          2. private void dispatchBlocks(long delay) { doesn't explain delay in its Javadoc.
          3. What does testBalancerRPCDelay verify? It is not checking the number of RPC calls.
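
The best-effort nature described in the first point can be shown with a small sketch. It is hypothetical and not taken from the patch: the class BestEffortDelay and the choice to restore the interrupt flag are assumptions made for illustration.

```java
// Hypothetical sketch: a start delay that an interrupt can cut short.
final class BestEffortDelay {
  /**
   * Sleeps for delayMillis before the first RPC.
   * Returns true if the full delay elapsed, false if the sleep was
   * interrupted and the caller therefore proceeds earlier than planned.
   */
  static boolean sleepBeforeFirstCall(long delayMillis) {
    if (delayMillis <= 0) {
      return true; // nothing to wait for
    }
    try {
      Thread.sleep(delayMillis);
      return true;
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt(); // keep the interrupt visible to the caller
      return false;
    }
  }
}
```

A false return is the case where a call can go out ahead of schedule, which is why the rate is a target rather than a guarantee.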
          shv Konstantin Shvachko added a comment -
          • You are right, the rate of getBlocks RPCs is not guaranteed. Balancer can only do its best. The actual rate can only be guaranteed on the NameNode, but we don't want to go there.
            I made it clear in the comment for BALANCER_NUM_RPC_PER_SEC.
          • Added a description for delay.
          • It is pretty hard to measure the rate of operations on NN. Here is what I did.
            Created a spy FSNamesystem. The spy would call a modified getBlocks() when the corresponding RPC is called.
            The modified getBlocks() first calls the original method, then counts the number of calls and the time of the first and the last call to getBlocks(). Given the number of calls and the interval we can estimate the rate later on.
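
The measurement idea described above (count the calls, remember the first and last timestamps, estimate the rate afterwards) can be sketched as follows. This is not the test code from the patch: the class CallRateRecorder is hypothetical, and timestamps are passed in explicitly so that the example is deterministic.

```java
// Hypothetical sketch of estimating a call rate from a count and two timestamps.
final class CallRateRecorder {
  private int numCalls;
  private long firstCallMillis;
  private long lastCallMillis;

  /** Record one call observed at the given time in milliseconds. */
  synchronized void record(long nowMillis) {
    if (numCalls == 0) {
      firstCallMillis = nowMillis;
    }
    lastCallMillis = nowMillis;
    numCalls++;
  }

  synchronized int calls() {
    return numCalls;
  }

  /** Estimated calls per second; 0.0 when there is no measurable interval. */
  synchronized double callsPerSecond() {
    long spanMillis = lastCallMillis - firstCallMillis;
    if (numCalls < 2 || spanMillis <= 0) {
      return 0.0;
    }
    return numCalls * 1000.0 / spanMillis;
  }
}
```

In a test, a wrapper around getBlocks() would call record(System.currentTimeMillis()) after delegating to the original method, and the assertion would compare callsPerSecond() against the expected limit with some tolerance.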
          hadoopqa Hadoop QA added a comment -
          -1 overall



          Vote Subsystem Runtime Comment
          0 reexec 0m 19s Docker mode activated.
          +1 @author 0m 0s The patch does not contain any @author tags.
          +1 test4tests 0m 0s The patch appears to include 3 new or modified test files.
          +1 mvninstall 15m 16s trunk passed
          +1 compile 0m 57s trunk passed
          +1 checkstyle 0m 38s trunk passed
          +1 mvnsite 1m 6s trunk passed
          +1 mvneclipse 0m 15s trunk passed
          -1 findbugs 1m 50s hadoop-hdfs-project/hadoop-hdfs in trunk has 10 extant Findbugs warnings.
          +1 javadoc 0m 43s trunk passed
          +1 mvninstall 0m 58s the patch passed
          +1 compile 0m 53s the patch passed
          +1 javac 0m 53s the patch passed
          -0 checkstyle 0m 37s hadoop-hdfs-project/hadoop-hdfs: The patch generated 8 new + 281 unchanged - 1 fixed = 289 total (was 282)
          +1 mvnsite 1m 1s the patch passed
          +1 mvneclipse 0m 12s the patch passed
          -1 whitespace 0m 0s The patch has 1 line(s) that end in whitespace. Use git apply --whitespace=fix <<patch_file>>. Refer https://git-scm.com/docs/git-apply
          +1 findbugs 2m 2s the patch passed
          +1 javadoc 0m 42s the patch passed
          -1 unit 113m 21s hadoop-hdfs in the patch failed.
          +1 asflicense 0m 34s The patch does not generate ASF License warnings.
          142m 53s



          Reason Tests
          Failed junit tests hadoop.hdfs.server.datanode.TestDataNodeVolumeFailureReporting
            hadoop.hdfs.server.namenode.TestNameNodeMetadataConsistency
          Timed out junit tests org.apache.hadoop.hdfs.server.balancer.TestBalancer



          Subsystem Report/Notes
          Docker Image:yetus/hadoop:0ac17dc
          JIRA Issue HDFS-11384
          JIRA Patch URL https://issues.apache.org/jira/secure/attachment/12864621/HDFS-11384.006.patch
          Optional Tests asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle
          uname Linux 5bf1fdc0f19e 3.13.0-108-generic #155-Ubuntu SMP Wed Jan 11 16:58:52 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux
          Build tool maven
          Personality /testptch/hadoop/patchprocess/precommit/personality/provided.sh
          git revision trunk / fda86ef
          Default Java 1.8.0_121
          findbugs v3.1.0-RC1
          findbugs https://builds.apache.org/job/PreCommit-HDFS-Build/19180/artifact/patchprocess/branch-findbugs-hadoop-hdfs-project_hadoop-hdfs-warnings.html
          checkstyle https://builds.apache.org/job/PreCommit-HDFS-Build/19180/artifact/patchprocess/diff-checkstyle-hadoop-hdfs-project_hadoop-hdfs.txt
          whitespace https://builds.apache.org/job/PreCommit-HDFS-Build/19180/artifact/patchprocess/whitespace-eol.txt
          unit https://builds.apache.org/job/PreCommit-HDFS-Build/19180/artifact/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt
          Test Results https://builds.apache.org/job/PreCommit-HDFS-Build/19180/testReport/
          modules C: hadoop-hdfs-project/hadoop-hdfs U: hadoop-hdfs-project/hadoop-hdfs
          Console output https://builds.apache.org/job/PreCommit-HDFS-Build/19180/console
          Powered by Apache Yetus 0.5.0-SNAPSHOT http://yetus.apache.org

          This message was automatically generated.

          shv Konstantin Shvachko added a comment -

          Addressed warnings

          hadoopqa Hadoop QA added a comment -
          -1 overall



          Vote Subsystem Runtime Comment
          0 reexec 0m 20s Docker mode activated.
          +1 @author 0m 0s The patch does not contain any @author tags.
          +1 test4tests 0m 0s The patch appears to include 3 new or modified test files.
          +1 mvninstall 13m 25s trunk passed
          +1 compile 0m 47s trunk passed
          +1 checkstyle 0m 38s trunk passed
          +1 mvnsite 0m 53s trunk passed
          +1 mvneclipse 0m 14s trunk passed
          -1 findbugs 1m 37s hadoop-hdfs-project/hadoop-hdfs in trunk has 10 extant Findbugs warnings.
          +1 javadoc 0m 39s trunk passed
          +1 mvninstall 0m 47s the patch passed
          +1 compile 0m 44s the patch passed
          +1 javac 0m 44s the patch passed
          -0 checkstyle 0m 36s hadoop-hdfs-project/hadoop-hdfs: The patch generated 3 new + 281 unchanged - 1 fixed = 284 total (was 282)
          +1 mvnsite 0m 50s the patch passed
          +1 mvneclipse 0m 12s the patch passed
          +1 whitespace 0m 0s The patch has no whitespace issues.
          +1 findbugs 1m 44s the patch passed
          +1 javadoc 0m 38s the patch passed
          -1 unit 85m 51s hadoop-hdfs in the patch failed.
          +1 asflicense 0m 19s The patch does not generate ASF License warnings.
          111m 33s



          Reason Tests
          Failed junit tests hadoop.hdfs.server.balancer.TestBalancer



          Subsystem Report/Notes
          Docker Image:yetus/hadoop:0ac17dc
          JIRA Issue HDFS-11384
          JIRA Patch URL https://issues.apache.org/jira/secure/attachment/12864665/HDFS-11384-007.patch
          Optional Tests asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle
          uname Linux 305c4e63baa7 3.13.0-108-generic #155-Ubuntu SMP Wed Jan 11 16:58:52 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux
          Build tool maven
          Personality /testptch/hadoop/patchprocess/precommit/personality/provided.sh
          git revision trunk / fda86ef
          Default Java 1.8.0_121
          findbugs v3.1.0-RC1
          findbugs https://builds.apache.org/job/PreCommit-HDFS-Build/19185/artifact/patchprocess/branch-findbugs-hadoop-hdfs-project_hadoop-hdfs-warnings.html
          checkstyle https://builds.apache.org/job/PreCommit-HDFS-Build/19185/artifact/patchprocess/diff-checkstyle-hadoop-hdfs-project_hadoop-hdfs.txt
          unit https://builds.apache.org/job/PreCommit-HDFS-Build/19185/artifact/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt
          Test Results https://builds.apache.org/job/PreCommit-HDFS-Build/19185/testReport/
          modules C: hadoop-hdfs-project/hadoop-hdfs U: hadoop-hdfs-project/hadoop-hdfs
          Console output https://builds.apache.org/job/PreCommit-HDFS-Build/19185/console
          Powered by Apache Yetus 0.5.0-SNAPSHOT http://yetus.apache.org

          This message was automatically generated.

          zhz Zhe Zhang added a comment -

          Thanks Konstantin Shvachko for the update. The added Javadoc looks good.

          The unit test testBalancerRPCDelay fails for me locally with a NullPointerException.

          shv Konstantin Shvachko added a comment -
          1. Took some time to reproduce failures. I did not have any on my local box.
            Looks like the solution is to mock FSNamesystem before starting DataNodes. Otherwise the behavior is non-deterministic. I changed it and now it runs consistently on my local box. Let's try Jenkins.
          2. findbugs warnings are not related to the patch.
          3. There are 2 checkstyle warnings.
            • One complains that the number of parameters in doTest() is more than 7. I don't know why that magic number was chosen, but doTest() already had 8 parameters and I added one.
            • The second is about inner assignment, which is intentional in this case, because I want the two variables to initially have the same value, and splitting the line into two statements would obscure that intent.
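
The inner assignment that checkstyle flags can be illustrated with a minimal sketch. The class and method names below are hypothetical and are not taken from TestBalancer; the point is only that a chained assignment gives both variables the same starting value in one statement.

```java
public class InnerAssignmentDemo {
    /** Returns {first, second} after both are set by one chained assignment. */
    static long[] initTogether(long start) {
        long first;
        long second;
        // Inner assignment: 'second = start' evaluates to the assigned value,
        // which is then assigned to 'first', so both begin with the same value.
        first = second = start;
        return new long[] {first, second};
    }

    public static void main(String[] args) {
        long[] pair = initTogether(42L);
        System.out.println(pair[0] + " " + pair[1]); // prints "42 42"
    }
}
```

Splitting the statement into `second = start; first = start;` behaves the same, but no longer reads as "these two start out equal", which is the intent described above.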
          zhz Zhe Zhang added a comment -

          Thanks for the update Konstantin Shvachko. Now all other tests in TestBalancer pass except for testBalancerRPCDelay:

          java.util.concurrent.TimeoutException: Timed out waiting for /tmp.txt to reach 40 replicas
          
          	at org.apache.hadoop.hdfs.DFSTestUtil.waitReplication(DFSTestUtil.java:764)
          	at org.apache.hadoop.hdfs.server.balancer.TestBalancer.createFile(TestBalancer.java:306)
          	at org.apache.hadoop.hdfs.server.balancer.TestBalancer.doTest(TestBalancer.java:847)
          	at org.apache.hadoop.hdfs.server.balancer.TestBalancer.testBalancerRPCDelay(TestBalancer.java:2071)
          	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
          	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
          	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
          	at java.lang.reflect.Method.invoke(Method.java:497)
          	at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
          	at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
          	at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44)
          	at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
          	at org.junit.internal.runners.statements.FailOnTimeout$StatementThread.run(FailOnTimeout.java:74)
          
          shv Konstantin Shvachko added a comment -

          Added waitActive() after starting DataNodes.

          hadoopqa Hadoop QA added a comment -
          -1 overall



          Vote Subsystem Runtime Comment
          0 reexec 0m 19s Docker mode activated.
          +1 @author 0m 0s The patch does not contain any @author tags.
          +1 test4tests 0m 0s The patch appears to include 3 new or modified test files.
          +1 mvninstall 14m 3s trunk passed
          +1 compile 0m 58s trunk passed
          +1 checkstyle 0m 40s trunk passed
          +1 mvnsite 1m 4s trunk passed
          +1 mvneclipse 0m 15s trunk passed
          -1 findbugs 1m 53s hadoop-hdfs-project/hadoop-hdfs in trunk has 10 extant Findbugs warnings.
          +1 javadoc 0m 43s trunk passed
          +1 mvninstall 0m 54s the patch passed
          +1 compile 0m 52s the patch passed
          +1 javac 0m 52s the patch passed
          -0 checkstyle 0m 36s hadoop-hdfs-project/hadoop-hdfs: The patch generated 4 new + 280 unchanged - 2 fixed = 284 total (was 282)
          +1 mvnsite 0m 57s the patch passed
          +1 mvneclipse 0m 11s the patch passed
          +1 whitespace 0m 0s The patch has no whitespace issues.
          +1 findbugs 1m 56s the patch passed
          +1 javadoc 0m 42s the patch passed
          -1 unit 70m 15s hadoop-hdfs in the patch failed.
          +1 asflicense 0m 20s The patch does not generate ASF License warnings.
          98m 5s



          Reason Tests
          Failed junit tests hadoop.hdfs.server.datanode.TestDataNodeVolumeFailureReporting
            hadoop.hdfs.server.balancer.TestBalancerWithSaslDataTransfer
            hadoop.hdfs.server.datanode.TestDataNodeVolumeFailure



          Subsystem Report/Notes
          Docker Image:yetus/hadoop:0ac17dc
          JIRA Issue HDFS-11384
          JIRA Patch URL https://issues.apache.org/jira/secure/attachment/12865051/HDFS-11384.009.patch
          Optional Tests asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle
          uname Linux beb7a0d3cf1c 3.13.0-107-generic #154-Ubuntu SMP Tue Dec 20 09:57:27 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux
          Build tool maven
          Personality /testptch/hadoop/patchprocess/precommit/personality/provided.sh
          git revision trunk / 4ea2778
          Default Java 1.8.0_121
          findbugs v3.1.0-RC1
          findbugs https://builds.apache.org/job/PreCommit-HDFS-Build/19201/artifact/patchprocess/branch-findbugs-hadoop-hdfs-project_hadoop-hdfs-warnings.html
          checkstyle https://builds.apache.org/job/PreCommit-HDFS-Build/19201/artifact/patchprocess/diff-checkstyle-hadoop-hdfs-project_hadoop-hdfs.txt
          unit https://builds.apache.org/job/PreCommit-HDFS-Build/19201/artifact/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt
          Test Results https://builds.apache.org/job/PreCommit-HDFS-Build/19201/testReport/
          modules C: hadoop-hdfs-project/hadoop-hdfs U: hadoop-hdfs-project/hadoop-hdfs
          Console output https://builds.apache.org/job/PreCommit-HDFS-Build/19201/console
          Powered by Apache Yetus 0.5.0-SNAPSHOT http://yetus.apache.org

          This message was automatically generated.

          zhz Zhe Zhang added a comment -

          TestBalancerWithSaslDataTransfer fails on my local box too (it passes without the patch).

          shv Konstantin Shvachko added a comment -
          • New version. It fixes TestBalancerWithSaslDataTransfer.
          • Also separated TestBalancerRPCDelay from TestBalancer, since the latter already runs for more than 5 minutes.
          • The TestDataNodeVolumeFailure series seems to have a long history of failures, looking at HDFS-11398. It fails intermittently with and without my patch.
          zhz Zhe Zhang added a comment -

          Thanks Konstantin Shvachko. Latest patch LGTM. +1 pending Jenkins.

          hadoopqa Hadoop QA added a comment -
          -1 overall



          Vote Subsystem Runtime Comment
          0 reexec 0m 19s Docker mode activated.
          +1 @author 0m 0s The patch does not contain any @author tags.
          +1 test4tests 0m 0s The patch appears to include 4 new or modified test files.
          +1 mvninstall 13m 24s trunk passed
          +1 compile 0m 47s trunk passed
          +1 checkstyle 0m 38s trunk passed
          +1 mvnsite 0m 53s trunk passed
          +1 mvneclipse 0m 14s trunk passed
          -1 findbugs 1m 38s hadoop-hdfs-project/hadoop-hdfs in trunk has 10 extant Findbugs warnings.
          +1 javadoc 0m 39s trunk passed
          +1 mvninstall 0m 48s the patch passed
          +1 compile 0m 46s the patch passed
          +1 javac 0m 46s the patch passed
          -0 checkstyle 0m 35s hadoop-hdfs-project/hadoop-hdfs: The patch generated 5 new + 280 unchanged - 2 fixed = 285 total (was 282)
          +1 mvnsite 0m 49s the patch passed
          +1 mvneclipse 0m 11s the patch passed
          +1 whitespace 0m 0s The patch has no whitespace issues.
          +1 findbugs 1m 44s the patch passed
          +1 javadoc 0m 37s the patch passed
          -1 unit 89m 0s hadoop-hdfs in the patch failed.
          +1 asflicense 0m 19s The patch does not generate ASF License warnings.
          114m 36s



          Reason Tests
          Failed junit tests hadoop.hdfs.web.TestWebHDFS
          Timed out junit tests org.apache.hadoop.hdfs.TestLeaseRecovery2



          Subsystem Report/Notes
          Docker Image:yetus/hadoop:0ac17dc
          JIRA Issue HDFS-11384
          JIRA Patch URL https://issues.apache.org/jira/secure/attachment/12865201/HDFS-11384.010.patch
          Optional Tests asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle
          uname Linux 52bc523accab 3.13.0-108-generic #155-Ubuntu SMP Wed Jan 11 16:58:52 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux
          Build tool maven
          Personality /testptch/hadoop/patchprocess/precommit/personality/provided.sh
          git revision trunk / 8b5f2c3
          Default Java 1.8.0_121
          findbugs v3.1.0-RC1
          findbugs https://builds.apache.org/job/PreCommit-HDFS-Build/19209/artifact/patchprocess/branch-findbugs-hadoop-hdfs-project_hadoop-hdfs-warnings.html
          checkstyle https://builds.apache.org/job/PreCommit-HDFS-Build/19209/artifact/patchprocess/diff-checkstyle-hadoop-hdfs-project_hadoop-hdfs.txt
          unit https://builds.apache.org/job/PreCommit-HDFS-Build/19209/artifact/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt
          Test Results https://builds.apache.org/job/PreCommit-HDFS-Build/19209/testReport/
          modules C: hadoop-hdfs-project/hadoop-hdfs U: hadoop-hdfs-project/hadoop-hdfs
          Console output https://builds.apache.org/job/PreCommit-HDFS-Build/19209/console
          Powered by Apache Yetus 0.5.0-SNAPSHOT http://yetus.apache.org

          This message was automatically generated.

          shv Konstantin Shvachko added a comment -

          Latest patch added Javadoc to TestBalancerRPCDelay, and corrected one long line.
          Merge to branch-2 is pretty straightforward.
          Neither branch-2.8 nor branch-2.7 has a getBlocks() method in FSNamesystem, so spying on it won't help. I removed the mocking for these two branches. Attaching patches for both.

          hadoopqa Hadoop QA added a comment -
          -1 overall



          Vote Subsystem Runtime Comment
          0 reexec 0m 15s Docker mode activated.
          +1 @author 0m 0s The patch does not contain any @author tags.
          +1 test4tests 0m 0s The patch appears to include 4 new or modified test files.
          +1 mvninstall 14m 38s trunk passed
          +1 compile 1m 16s trunk passed
          +1 checkstyle 0m 58s trunk passed
          +1 mvnsite 1m 29s trunk passed
          +1 mvneclipse 0m 22s trunk passed
          -1 findbugs 2m 26s hadoop-hdfs-project/hadoop-hdfs in trunk has 10 extant Findbugs warnings.
          +1 javadoc 0m 59s trunk passed
          +1 mvninstall 1m 15s the patch passed
          +1 compile 1m 11s the patch passed
          +1 javac 1m 11s the patch passed
          -0 checkstyle 0m 54s hadoop-hdfs-project/hadoop-hdfs: The patch generated 5 new + 280 unchanged - 2 fixed = 285 total (was 282)
          +1 mvnsite 1m 16s the patch passed
          +1 mvneclipse 0m 18s the patch passed
          +1 whitespace 0m 0s The patch has no whitespace issues.
          +1 findbugs 2m 32s the patch passed
          +1 javadoc 0m 50s the patch passed
          -1 unit 73m 38s hadoop-hdfs in the patch failed.
          +1 asflicense 0m 22s The patch does not generate ASF License warnings.
          106m 50s



          Reason Tests
          Failed junit tests hadoop.hdfs.server.namenode.TestNameNodeMetadataConsistency
            hadoop.hdfs.server.datanode.TestDataNodeVolumeFailureReporting
            hadoop.hdfs.server.datanode.TestDataNodeVolumeFailure



          Subsystem Report/Notes
          Docker Image:yetus/hadoop:0ac17dc
          JIRA Issue HDFS-11384
          JIRA Patch URL https://issues.apache.org/jira/secure/attachment/12865201/HDFS-11384.010.patch
          Optional Tests asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle
          uname Linux 331b4cd05e03 3.13.0-107-generic #154-Ubuntu SMP Tue Dec 20 09:57:27 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux
          Build tool maven
          Personality /testptch/hadoop/patchprocess/precommit/personality/provided.sh
          git revision trunk / 8b5f2c3
          Default Java 1.8.0_121
          findbugs v3.1.0-RC1
          findbugs https://builds.apache.org/job/PreCommit-HDFS-Build/19210/artifact/patchprocess/branch-findbugs-hadoop-hdfs-project_hadoop-hdfs-warnings.html
          checkstyle https://builds.apache.org/job/PreCommit-HDFS-Build/19210/artifact/patchprocess/diff-checkstyle-hadoop-hdfs-project_hadoop-hdfs.txt
          unit https://builds.apache.org/job/PreCommit-HDFS-Build/19210/artifact/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt
          Test Results https://builds.apache.org/job/PreCommit-HDFS-Build/19210/testReport/
          modules C: hadoop-hdfs-project/hadoop-hdfs U: hadoop-hdfs-project/hadoop-hdfs
          Console output https://builds.apache.org/job/PreCommit-HDFS-Build/19210/console
          Powered by Apache Yetus 0.5.0-SNAPSHOT http://yetus.apache.org

          This message was automatically generated.

          hudson Hudson added a comment -

          SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #11637 (See https://builds.apache.org/job/Hadoop-trunk-Commit/11637/)
          HDFS-11384. Balancer disperses getBlocks calls to avoid NameNode's rpc (shv: rev 28eb2aabebd15c15a357d86e23ca407d3c85211c)

          • (edit) hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockManagerTestUtil.java
          • (edit) hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/NameNodeAdapter.java
          • (edit) hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/balancer/TestBalancer.java
          • (add) hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/balancer/TestBalancerRPCDelay.java
          • (edit) hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/balancer/Dispatcher.java
          hadoopqa Hadoop QA added a comment -
          -1 overall



          Vote Subsystem Runtime Comment
          0 reexec 12m 34s Docker mode activated.
          +1 @author 0m 0s The patch does not contain any @author tags.
          +1 test4tests 0m 0s The patch appears to include 4 new or modified test files.
          +1 mvninstall 8m 12s branch-2.7 passed
          +1 compile 1m 9s branch-2.7 passed with JDK v1.8.0_131
          +1 compile 1m 7s branch-2.7 passed with JDK v1.7.0_121
          +1 checkstyle 0m 27s branch-2.7 passed
          +1 mvnsite 1m 2s branch-2.7 passed
          +1 mvneclipse 0m 16s branch-2.7 passed
          -1 findbugs 3m 0s hadoop-hdfs-project/hadoop-hdfs in branch-2.7 has 1 extant Findbugs warnings.
          +1 javadoc 0m 59s branch-2.7 passed with JDK v1.8.0_131
          +1 javadoc 1m 44s branch-2.7 passed with JDK v1.7.0_121
          +1 mvninstall 0m 53s the patch passed
          +1 compile 0m 58s the patch passed with JDK v1.8.0_131
          +1 javac 0m 58s the patch passed
          +1 compile 1m 0s the patch passed with JDK v1.7.0_121
          +1 javac 1m 0s the patch passed
          -0 checkstyle 0m 22s hadoop-hdfs-project/hadoop-hdfs: The patch generated 2 new + 58 unchanged - 0 fixed = 60 total (was 58)
          +1 mvnsite 0m 56s the patch passed
          +1 mvneclipse 0m 12s the patch passed
          -1 whitespace 0m 0s The patch has 4347 line(s) that end in whitespace. Use git apply --whitespace=fix <<patch_file>>. Refer https://git-scm.com/docs/git-apply
          -1 whitespace 1m 34s The patch 176 line(s) with tabs.
          +1 findbugs 3m 7s the patch passed
          +1 javadoc 0m 57s the patch passed with JDK v1.8.0_131
          +1 javadoc 1m 42s the patch passed with JDK v1.7.0_121
          -1 unit 43m 48s hadoop-hdfs in the patch failed with JDK v1.7.0_121.
          -1 asflicense 0m 18s The patch generated 3 ASF License warnings.
          139m 17s



          Reason Tests
          JDK v1.8.0_131 Failed junit tests hadoop.hdfs.web.TestWebHdfsTokens
            hadoop.hdfs.TestBlockStoragePolicy
            hadoop.hdfs.server.namenode.ha.TestFailureToReadEdits
            hadoop.hdfs.server.namenode.snapshot.TestRenameWithSnapshots
          JDK v1.7.0_121 Failed junit tests hadoop.hdfs.server.mover.TestStorageMover
            hadoop.hdfs.server.namenode.snapshot.TestRenameWithSnapshots



          Subsystem Report/Notes
          Docker Image:yetus/hadoop:c420dfe
          JIRA Issue HDFS-11384
          JIRA Patch URL https://issues.apache.org/jira/secure/attachment/12865254/HDFS-11384-branch-2.7.011.patch
          Optional Tests asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle
          uname Linux ffd6b154cc78 3.13.0-107-generic #154-Ubuntu SMP Tue Dec 20 09:57:27 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux
          Build tool maven
          Personality /testptch/hadoop/patchprocess/precommit/personality/provided.sh
          git revision branch-2.7 / ac12063
          Default Java 1.7.0_121
          Multi-JDK versions /usr/lib/jvm/java-8-oracle:1.8.0_131 /usr/lib/jvm/java-7-openjdk-amd64:1.7.0_121
          findbugs v3.0.0
          findbugs https://builds.apache.org/job/PreCommit-HDFS-Build/19212/artifact/patchprocess/branch-findbugs-hadoop-hdfs-project_hadoop-hdfs-warnings.html
          checkstyle https://builds.apache.org/job/PreCommit-HDFS-Build/19212/artifact/patchprocess/diff-checkstyle-hadoop-hdfs-project_hadoop-hdfs.txt
          whitespace https://builds.apache.org/job/PreCommit-HDFS-Build/19212/artifact/patchprocess/whitespace-eol.txt
          whitespace https://builds.apache.org/job/PreCommit-HDFS-Build/19212/artifact/patchprocess/whitespace-tabs.txt
          unit https://builds.apache.org/job/PreCommit-HDFS-Build/19212/artifact/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs-jdk1.7.0_121.txt
          JDK v1.7.0_121 Test Results https://builds.apache.org/job/PreCommit-HDFS-Build/19212/testReport/
          asflicense https://builds.apache.org/job/PreCommit-HDFS-Build/19212/artifact/patchprocess/patch-asflicense-problems.txt
          modules C: hadoop-hdfs-project/hadoop-hdfs U: hadoop-hdfs-project/hadoop-hdfs
          Console output https://builds.apache.org/job/PreCommit-HDFS-Build/19212/console
          Powered by Apache Yetus 0.5.0-SNAPSHOT http://yetus.apache.org

          This message was automatically generated.

          shv Konstantin Shvachko added a comment -

          Just committed this to 4 branches.
          Verified that failed tests for branch-2.7 are passing locally.

          zhaoyunjiong yunjiong zhao added a comment -

          Konstantin Shvachko Thanks for the fix.

          xiaoli xiaoli added a comment -

          The patch1 looks good!

          ywskycn Wei Yan added a comment -

          For the following changes in the diff, why do we set dSec to 0 when "j >= concurrentThreads" instead of "j <= concurrentThreads"? Did I miss anything there? Konstantin Shvachko, yunjiong zhao, Zhe Zhang

            // Calculate delay in seconds for the next iteration
            if(j >= concurrentThreads) {
                dSec = 0;
            } else if((j + 1) % BALANCER_NUM_RPC_PER_SEC == 0) {
                dSec++;
            }
          
          shv Konstantin Shvachko added a comment -

          Hey Wei Yan, we want to disperse the initial RPCs. Once they are dispersed, the rest of them should follow the pattern. Therefore we do not need to delay dispatchBlocks(0) once we start reusing the threads, that is, when j >= concurrentThreads.
          As explained in this comment. Hope this makes sense.
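
          To make the schedule concrete, here is a minimal standalone sketch of the delay calculation quoted above. It is an illustration, not the Dispatcher code: the class name, the delays helper, and the rpcPerSec parameter (standing in for BALANCER_NUM_RPC_PER_SEC) are hypothetical, and it assumes the current dSec is assigned to task j before the update runs, as the "for the next iteration" comment suggests.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical standalone sketch; not part of the HDFS-11384 patch.
class DispersalSketch {

    /**
     * Returns the start delay (in seconds) assigned to each of numTasks tasks.
     * rpcPerSec stands in for BALANCER_NUM_RPC_PER_SEC (assumed parameter).
     */
    static List<Integer> delays(int numTasks, int concurrentThreads, int rpcPerSec) {
        List<Integer> result = new ArrayList<>();
        int dSec = 0;
        for (int j = 0; j < numTasks; j++) {
            // Assumption: task j starts with the delay computed so far.
            result.add(dSec);
            // Calculate delay in seconds for the next iteration
            if (j >= concurrentThreads) {
                // Threads are being reused; earlier tasks already spread the calls.
                dSec = 0;
            } else if ((j + 1) % rpcPerSec == 0) {
                // Every rpcPerSec initial tasks, push the next group one second later.
                dSec++;
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // 8 tasks, 4 concurrent threads, 2 RPCs per second (small values for readability).
        System.out.println(delays(8, 4, 2));
    }
}
```

          With these small example values the program prints [0, 0, 1, 1, 2, 0, 0, 0]: the first tasks are staggered in groups of two per second, and tasks submitted after the thread pool is full get no extra delay.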

          ywskycn Wei Yan added a comment -

          Gotcha, thanks for the explanation, Konstantin Shvachko


            People

            • Assignee:
              shv Konstantin Shvachko
              Reporter:
              zhaoyunjiong yunjiong zhao
            • Votes:
              0
              Watchers:
              11
