Hadoop HDFS / HDFS-6268

Better sorting in NetworkTopology#pseudoSortByDistance when no local node is found

Details

    • Type: Improvement
    • Status: Closed
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: 2.4.0
    • Fix Version/s: 2.5.0
    • Component/s: None
    • Labels: None

    Description

      In NetworkTopology#pseudoSortByDistance, if no local node is found, the first rack-local node in the list is always placed at the front.

      This became an issue when a dataset was loaded from a single datanode, which ended up holding the first replica of every block in the dataset. When running an Impala query, the non-local reads that occur when reading past a block boundary all hit this node, causing massive load skew.
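
      The skew described above can be illustrated with a small, self-contained sketch. This is not the HDFS code: `Replica`, `firstRackLocalToFront`, and `shuffledRackLocalFirst` are hypothetical names, and the setup assumes a single-rack cluster in which every replica is rack-local to the reader and `dn1` is listed first for every block. Shuffling the rack-local replicas is shown only as one possible remedy, not necessarily what the attached patches do.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Random;

public class PseudoSortSketch {
    /** Hypothetical stand-in for a datanode: a name plus the rack it sits on. */
    public static final class Replica {
        public final String name;
        public final String rack;
        public Replica(String name, String rack) { this.name = name; this.rack = rack; }
    }

    /** Behavior described in the issue: with no local node, the first
     *  rack-local replica in the list is moved to the front. */
    public static List<Replica> firstRackLocalToFront(String readerRack, List<Replica> replicas) {
        List<Replica> out = new ArrayList<>(replicas);
        for (int i = 0; i < out.size(); i++) {
            if (out.get(i).rack.equals(readerRack)) {
                Collections.swap(out, 0, i);
                break;
            }
        }
        return out;
    }

    /** Assumed alternative: all rack-local replicas first, in shuffled order,
     *  followed by off-rack replicas in their original order. */
    public static List<Replica> shuffledRackLocalFirst(String readerRack, List<Replica> replicas, Random rng) {
        List<Replica> sameRack = new ArrayList<>();
        List<Replica> offRack = new ArrayList<>();
        for (Replica r : replicas) {
            (r.rack.equals(readerRack) ? sameRack : offRack).add(r);
        }
        Collections.shuffle(sameRack, rng);
        sameRack.addAll(offRack);
        return sameRack;
    }

    public static void main(String[] args) {
        String rack = "/default-rack";           // assumed single-rack cluster
        Replica dn1 = new Replica("dn1", rack);  // node the dataset was loaded from
        Random rng = new Random(42);
        int blocks = 1000, oldHits = 0, newHits = 0;
        for (int b = 0; b < blocks; b++) {
            // dn1 holds the first replica of every block; the others vary.
            List<Replica> locs = Arrays.asList(
                dn1,
                new Replica("dn" + (2 + b % 5), rack),
                new Replica("dn" + (7 + b % 5), rack));
            if (firstRackLocalToFront(rack, locs).get(0) == dn1) oldHits++;
            if (shuffledRackLocalFirst(rack, locs, rng).get(0) == dn1) newHits++;
        }
        System.out.println("dn1 listed first, old ordering: " + oldHits + "/" + blocks);
        System.out.println("dn1 listed first, shuffled ordering: " + newHits + "/" + blocks);
    }
}
```

      With the first ordering, a non-local reader is pointed at `dn1` for every block; with the shuffled ordering, the first choice is spread across the rack-local replicas. A real fix may also want the order to be stable per block (for example, by seeding the shuffle from a block identifier) so that repeated reads of the same block hit the same node; that is a design consideration, not something stated in the description.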

      Attachments

        1. hdfs-6268-branch-2.001.patch
          27 kB
          Andrew Wang
        2. hdfs-6268-5.patch
          24 kB
          Andrew Wang
        3. hdfs-6268-4.patch
          22 kB
          Andrew Wang
        4. hdfs-6268-3.patch
          20 kB
          Andrew Wang
        5. hdfs-6268-2.patch
          15 kB
          Andrew Wang
        6. hdfs-6268-1.patch
          14 kB
          Andrew Wang

          People

            Assignee: Andrew Wang
            Reporter: Andrew Wang
            Votes: 0
            Watchers: 14
