Hadoop HDFS / HDFS-11015

Enforce timeout in balancer


Details

    • Reviewed

    Description

      1) Hung node detection: HDFS-6247 removed the socket read timeout when it added the periodic response for slow block moves. Removing the long timeout was not necessary, however. The timeout is still useful for detecting hung nodes, and it does not abort slow moves.

      2) Enforcing the iteration limit: The 20-minute iteration limit is supposed to be enforced, but it is not. An iteration can easily stretch to 30 or 40 minutes with a long tail. Because of these long tails, the balancer throughput does not reach its full potential.

      3) Slow move detection: To improve throughput, it is sometimes necessary to impose a timeout on block moves. We have seen an iteration take over 2 hours because of one slow block move. The timeout is mainly for catching exceptionally slow moves. Even if the balancer stops waiting, the move itself will continue and finish.
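
The three checks above can be sketched as independent predicates over timestamps. This is only an illustration of the idea, not the Balancer's actual code: the class name, method names, and parameters are all hypothetical, and times are plain milliseconds supplied by the caller.

```java
/** Hypothetical sketch of the three timeout checks; not actual Balancer code. */
final class BalancerTimeoutSketch {

  /**
   * 1) Hung node detection: a node is treated as hung when no response
   * (including a periodic in-progress response) arrived within the read timeout.
   * A slow but live move keeps sending responses, so it is not flagged here.
   */
  static boolean isHung(long lastResponseMs, long nowMs, long readTimeoutMs) {
    return nowMs - lastResponseMs >= readTimeoutMs;
  }

  /** 2) Iteration limit: true once the iteration has run for its maximum duration. */
  static boolean iterationExpired(long iterationStartMs, long nowMs, long maxIterationMs) {
    return nowMs - iterationStartMs >= maxIterationMs;
  }

  /**
   * 3) Slow move detection: true when a single move has been pending too long.
   * The caller only stops waiting; the move itself is not cancelled.
   */
  static boolean moveTooSlow(long moveStartMs, long nowMs, long moveTimeoutMs) {
    return nowMs - moveStartMs >= moveTimeoutMs;
  }
}
```

Keeping 1) and 3) separate reflects the distinction drawn in the description: a move that started 30 minutes ago but responded 10 seconds ago is slow, not hung, so only the move timeout applies to it.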

      To avoid undoing what HDFS-6247 tried to achieve, it should be possible to turn off 3) through configuration.
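
One way to make the block move timeout optional is to treat a missing or non-positive setting as "disabled". The sketch below assumes that convention. The property key is a made-up placeholder, not a real Hadoop configuration name, and `java.util.Properties` stands in for whatever configuration object the balancer actually uses.

```java
import java.util.Properties;

/** Hypothetical sketch of an optional block move timeout; not actual Balancer code. */
final class MoveTimeoutConfig {
  // Placeholder key for illustration only; not a real Hadoop property name.
  static final String KEY = "example.balancer.block-move.timeout.ms";
  // Assumed convention: 0 means the per-move timeout is turned off.
  static final long DISABLED = 0L;

  /** Reads the timeout in milliseconds; absent or negative values disable it. */
  static long moveTimeoutMs(Properties conf) {
    String raw = conf.getProperty(KEY);
    if (raw == null) {
      return DISABLED;
    }
    return Math.max(Long.parseLong(raw.trim()), DISABLED);
  }

  /** The balancer stops waiting only when the timeout is enabled and exceeded. */
  static boolean shouldStopWaiting(long elapsedMs, long timeoutMs) {
    return timeoutMs > DISABLED && elapsedMs >= timeoutMs;
  }
}
```

With the timeout disabled, `shouldStopWaiting` never returns true, so a slow move is waited on for as long as it keeps reporting progress.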

      Attachments

        1. balancer.png
          94 kB
          Kihwal Lee
        2. HDFS-11015-1.patch
          7 kB
          Kihwal Lee
        3. HDFS-11015-2.patch
          9 kB
          Kihwal Lee
        4. HDFS-11015-3.patch
          9 kB
          Kihwal Lee


          People

            kihwal Kihwal Lee
            Votes: 0
            Watchers: 5

