  1. Hadoop HDFS
  2. HDFS-13174

hdfs mover -p /path times out after 20 min

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 2.8.0, 2.7.4, 3.0.0-alpha2
    • Fix Version/s: 3.2.0, 3.1.1, 3.0.4
    • Component/s: balancer & mover
    • Labels:
      None
    • Target Version/s:
    • Release Note:
      The Mover could fail after 20+ minutes if a block move between two DataNodes stayed enqueued for that long, due to an internal constant that was introduced for the Balancer but affected the Mover as well.
      After the patch, the internal constant can be configured with the dfs.balancer.max-iteration-time parameter and affects only the Balancer. The default is 20 minutes.

      Description

      HDFS-11015 introduced an iteration timeout in the Dispatcher.Source class, which is checked while dispatching the moves that the Balancer and the Mover perform. This timeout is hardwired to 20 minutes.

      The Balancer works in iterations: even if an iteration times out, the Balancer keeps running and starts another iteration, and it fails only if no moves happened over a few iterations.

      The Mover, on the other hand, does not have iterations, so if moving a path runs for more than 20 minutes and there are moves decided and enqueued between two DataNodes, the Mover stops after 20 minutes with the following exception reported to the console (line numbers might differ, as this exception came from a CDH5.12.1 installation).
      java.io.IOException: Block move timed out
      at org.apache.hadoop.hdfs.server.balancer.Dispatcher$PendingMove.receiveResponse(Dispatcher.java:382)
      at org.apache.hadoop.hdfs.server.balancer.Dispatcher$PendingMove.dispatch(Dispatcher.java:328)
      at org.apache.hadoop.hdfs.server.balancer.Dispatcher$PendingMove.access$2500(Dispatcher.java:186)
      at org.apache.hadoop.hdfs.server.balancer.Dispatcher$1.run(Dispatcher.java:956)
      at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
      at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
      at java.lang.Thread.run(Thread.java:748)

       

      Note that this issue does not occur if all blocks can be moved within their DataNodes, without having to move a block to another DataNode.
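
      The difference between the two tools can be illustrated with a small, self-contained model. This is only a sketch under stated assumptions, not the actual Dispatcher code: the class and method names (IterationTimeoutSketch, runIteration, runBalancerLike, runMoverLike) are hypothetical, moves are reduced to simulated durations, and the clock is simulated instead of real. It shows why a per-iteration time limit is harmless for a tool that can start a new iteration, but fatal for a tool that makes a single pass.

```java
import java.io.IOException;
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Deque;
import java.util.List;

// Hypothetical model of the behavior described above; not Hadoop code.
class IterationTimeoutSketch {

    // Mirrors the 20-minute default mentioned in the issue, in milliseconds.
    static final long DEFAULT_MAX_ITERATION_TIME_MS = 20 * 60 * 1000L;

    // Runs one dispatch pass on a simulated clock. A queued move is started
    // only while the elapsed time is below the limit; the rest stay queued.
    // Returns the number of moves completed in this pass.
    static int runIteration(Deque<Long> pendingMoveDurationsMs, long maxIterationTimeMs) {
        if (maxIterationTimeMs <= 0) {
            throw new IllegalArgumentException("maxIterationTimeMs must be positive");
        }
        long elapsedMs = 0;
        int completed = 0;
        while (!pendingMoveDurationsMs.isEmpty() && elapsedMs < maxIterationTimeMs) {
            elapsedMs += pendingMoveDurationsMs.poll();
            completed++;
        }
        return completed;
    }

    // Balancer-like driver: a timed-out iteration is followed by another one.
    // Returns the number of iterations needed to drain the queue.
    static int runBalancerLike(List<Long> moveDurationsMs, long maxIterationTimeMs) {
        Deque<Long> pending = new ArrayDeque<>(moveDurationsMs);
        int iterations = 0;
        while (!pending.isEmpty()) {
            runIteration(pending, maxIterationTimeMs);
            iterations++;
        }
        return iterations;
    }

    // Mover-like driver: a single pass, so moves left in the queue after the
    // time limit surface as a failure instead of being retried.
    static int runMoverLike(List<Long> moveDurationsMs, long maxIterationTimeMs)
            throws IOException {
        Deque<Long> pending = new ArrayDeque<>(moveDurationsMs);
        int completed = runIteration(pending, maxIterationTimeMs);
        if (!pending.isEmpty()) {
            throw new IOException("Block move timed out");
        }
        return completed;
    }

    public static void main(String[] args) {
        long eightMinutes = 8 * 60 * 1000L;
        // Four simulated moves of 8 minutes each: 32 minutes of work in total.
        List<Long> moves = Arrays.asList(eightMinutes, eightMinutes, eightMinutes, eightMinutes);

        System.out.println("Balancer-like iterations: "
                + runBalancerLike(moves, DEFAULT_MAX_ITERATION_TIME_MS));
        try {
            runMoverLike(moves, DEFAULT_MAX_ITERATION_TIME_MS);
            System.out.println("Mover-like run finished");
        } catch (IOException e) {
            System.out.println("Mover-like run failed: " + e.getMessage());
        }
    }
}
```

      In this model the Balancer-like driver drains the same queue across several iterations, while the Mover-like driver fails as soon as work remains after the limit. That is the asymmetry the patch removes by making the limit apply only to the Balancer.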

        Attachments

        1. HDFS-13174.001.patch
          14 kB
          Istvan Fajth
        2. HDFS-13174.002.patch
          15 kB
          Istvan Fajth
        3. HDFS-13174.003.patch
          15 kB
          Istvan Fajth
        4. HDFS-13174.004.patch
          15 kB
          Istvan Fajth
        5. HDFS-13174.005.patch
          16 kB
          Istvan Fajth

              People

              • Assignee:
                pifta Istvan Fajth
                Reporter:
                pifta Istvan Fajth
              • Votes:
                0
                Watchers:
                7
