  Hadoop HDFS / HDFS-6268

Better sorting in NetworkTopology#pseudoSortByDistance when no local node is found


Details

    • Type: Improvement
    • Status: Closed
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: 2.4.0
    • Fix Version/s: 2.5.0
    • Component/s: None
    • Labels: None

    Description

      In NetworkTopology#pseudoSortByDistance, if no local node is found, the first rack-local node in the list is always placed in front.
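To make the rule concrete, here is a simplified, hypothetical Java model of the behavior described above; it is not the real NetworkTopology source. The class and method names (`FirstRackLocalSketch`, `pseudoSort`, `rackOf`) and the "/rack/host" node naming are assumptions for illustration. It also assumes the datanode that wrote a block appears first in that block's replica list, which is what makes the first rack-local node the same one every time.

```java
import java.util.Arrays;

/**
 * Hypothetical, simplified model of "first rack-local node goes in front".
 * Not the actual HDFS code. Node names are assumed to look like "/rack/host".
 */
public class FirstRackLocalSketch {

    // Assumed naming scheme: everything before the last '/' is the rack.
    static String rackOf(String node) {
        return node.substring(0, node.lastIndexOf('/'));
    }

    // Returns a reordered copy: a replica on the reader's own host goes first;
    // failing that, the FIRST rack-local replica in list order is moved to the front.
    static String[] pseudoSort(String reader, String[] nodes) {
        String[] sorted = nodes.clone();
        int localIndex = -1;
        int rackLocalIndex = -1;
        for (int i = 0; i < sorted.length; i++) {
            if (sorted[i].equals(reader)) {
                if (localIndex == -1) {
                    localIndex = i;
                }
            } else if (rackLocalIndex == -1 && rackOf(sorted[i]).equals(rackOf(reader))) {
                rackLocalIndex = i;
            }
        }
        int front = localIndex != -1 ? localIndex : rackLocalIndex;
        if (front > 0) {
            // Swap the chosen replica into position 0; the rest keep their slots.
            String tmp = sorted[0];
            sorted[0] = sorted[front];
            sorted[front] = tmp;
        }
        return sorted;
    }

    public static void main(String[] args) {
        String reader = "/rack1/client"; // no replica lives on the reader's host
        // Hypothetical replica lists: dn1 loaded the dataset, so it is listed first.
        String[][] blocks = {
            {"/rack1/dn1", "/rack1/dn2", "/rack2/dn9"},
            {"/rack1/dn1", "/rack1/dn3", "/rack2/dn8"},
        };
        for (String[] replicas : blocks) {
            System.out.println(Arrays.toString(pseudoSort(reader, replicas)));
        }
    }
}
```

Every block in this model sorts `/rack1/dn1` to the front, so every non-local read from rack1 is sent to the same datanode even though `/rack1/dn2` and `/rack1/dn3` are equally close.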

      This behavior caused a problem when a dataset was loaded from a single datanode. That datanode held the first replica of every block in the dataset, so when an Impala query read past a block boundary, all of the resulting non-local reads went to it, causing massive load skew.
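One way to avoid this skew is to treat all replicas at the same network distance as equivalent and randomize their order, so that no single rack-local datanode is always chosen first. The Java sketch below is a hypothetical illustration of that idea, not the code in the attached patches. The names `DistanceSortSketch`, `sortByDistance` and `distance`, the "/rack/host" node naming, and the three-level distance model (same host, same rack, off-rack) are all assumptions.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Random;

/**
 * Hypothetical sketch: order replicas by distance from the reader and shuffle
 * replicas that are equally close. Not the actual HDFS implementation.
 * Node names are assumed to look like "/rack/host".
 */
public class DistanceSortSketch {

    // Assumed distance model: 0 = same host, 1 = same rack, 2 = different rack.
    static int distance(String reader, String node) {
        if (reader.equals(node)) {
            return 0;
        }
        String readerRack = reader.substring(0, reader.lastIndexOf('/'));
        String nodeRack = node.substring(0, node.lastIndexOf('/'));
        return readerRack.equals(nodeRack) ? 1 : 2;
    }

    // Returns a new list; the input list is not modified.
    static List<String> sortByDistance(String reader, List<String> nodes, Random random) {
        // Bucket nodes by distance: index 0 = local, 1 = rack-local, 2 = off-rack.
        List<List<String>> buckets = new ArrayList<>();
        for (int i = 0; i < 3; i++) {
            buckets.add(new ArrayList<>());
        }
        for (String node : nodes) {
            buckets.get(distance(reader, node)).add(node);
        }
        List<String> result = new ArrayList<>(nodes.size());
        for (List<String> bucket : buckets) {
            // Randomize ties so reads spread across equally close replicas.
            Collections.shuffle(bucket, random);
            result.addAll(bucket);
        }
        return result;
    }

    public static void main(String[] args) {
        String reader = "/rack1/client"; // no replica lives on the reader's host
        List<String> replicas = Arrays.asList(
            "/rack1/dn1", "/rack1/dn2", "/rack1/dn3", "/rack2/dn9");
        Random random = new Random();
        for (int i = 0; i < 3; i++) {
            // The rack1 replicas come first in varying order; /rack2/dn9 is always last.
            System.out.println(sortByDistance(reader, replicas, random));
        }
    }
}
```

Passing a seeded `Random` (for example `new Random(42)`) makes the order reproducible in tests. The design keeps the distance-based preference intact and only removes the deterministic tie-break that sent every rack-local read to the same node.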

      Attachments

        1. hdfs-6268-1.patch
          14 kB
          Andrew Wang
        2. hdfs-6268-2.patch
          15 kB
          Andrew Wang
        3. hdfs-6268-3.patch
          20 kB
          Andrew Wang
        4. hdfs-6268-4.patch
          22 kB
          Andrew Wang
        5. hdfs-6268-5.patch
          24 kB
          Andrew Wang
        6. hdfs-6268-branch-2.001.patch
          27 kB
          Andrew Wang


            People

              Assignee: Andrew Wang
              Reporter: Andrew Wang
              Votes: 0
              Watchers: 14
