Hadoop HDFS / HDFS-6268

Better sorting in NetworkTopology#pseudoSortByDistance when no local node is found

    Details

    • Type: Improvement
    • Status: Closed
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: 2.4.0
    • Fix Version/s: 2.5.0
    • Component/s: None
    • Labels:
      None
    • Target Version/s:

      Description

      In NetworkTopology#pseudoSortByDistance, if no local node is found, the method always moves the first rack-local node in the list to the front.
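The ordering described above can be modeled in a few lines. This is a simplified sketch, not the Hadoop implementation: `OldOrderingSketch`, the `Replica` record, and `sortNoLocalNode` are hypothetical names, a node is reduced to a host name and a rack, and only the no-local-node case from this issue is modeled (Java 16+ is assumed for `record`).

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class OldOrderingSketch {
    // Hypothetical, simplified replica descriptor: a host name and its rack.
    record Replica(String host, String rack) {}

    // Models only the case described in the issue: the reader holds no replica,
    // so the first replica on the reader's rack is swapped to the front.
    // If no replica is on the reader's rack, the order is left unchanged.
    static List<Replica> sortNoLocalNode(String readerRack, List<Replica> replicas) {
        List<Replica> sorted = new ArrayList<>(replicas);
        for (int i = 0; i < sorted.size(); i++) {
            if (sorted.get(i).rack().equals(readerRack)) {
                Collections.swap(sorted, 0, i); // deterministic: always the first match
                break;
            }
        }
        return sorted;
    }

    public static void main(String[] args) {
        List<Replica> replicas = List.of(
            new Replica("dn1", "/rack1"), // listed first
            new Replica("dn2", "/rack1"),
            new Replica("dn3", "/rack2"));
        // Every read from a client on /rack1 is directed to dn1 first.
        for (int read = 0; read < 3; read++) {
            System.out.println(sortNoLocalNode("/rack1", replicas).get(0).host());
        }
    }
}
```

Because the swap always picks the first match, every reader on `/rack1` gets the same replica first, however many readers there are.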

      This became an issue when a dataset was loaded from a single DataNode, which ended up holding the first replica of every block in the dataset. When an Impala query read past a block boundary, the resulting non-local reads all went to this node, causing massive load skew.
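One way to avoid this kind of skew, shown here as a hypothetical sketch rather than as the change in the attached patches, is to rank replicas by distance tier (same host, same rack, other rack) and shuffle the replicas within each tier, so that equally close replicas are picked at random instead of by list position. `TieredShuffleSketch`, `Replica`, `tier`, and `sortByTier` are illustrative names, and the three-tier distance model is an assumption.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

public class TieredShuffleSketch {
    // Hypothetical, simplified replica descriptor: a host name and its rack.
    record Replica(String host, String rack) {}

    // Hypothetical distance tiers: 0 = same host, 1 = same rack, 2 = other rack.
    static int tier(String readerHost, String readerRack, Replica r) {
        if (r.host().equals(readerHost)) return 0;
        if (r.rack().equals(readerRack)) return 1;
        return 2;
    }

    // Orders replicas by tier and shuffles within each tier, so replicas at the
    // same distance come out in random order rather than in list order.
    static List<Replica> sortByTier(String readerHost, String readerRack,
                                    List<Replica> replicas, Random rnd) {
        List<List<Replica>> tiers = new ArrayList<>();
        for (int t = 0; t < 3; t++) tiers.add(new ArrayList<>());
        for (Replica r : replicas) tiers.get(tier(readerHost, readerRack, r)).add(r);
        List<Replica> sorted = new ArrayList<>(replicas.size());
        for (List<Replica> bucket : tiers) {
            Collections.shuffle(bucket, rnd);
            sorted.addAll(bucket);
        }
        return sorted;
    }

    public static void main(String[] args) {
        List<Replica> replicas = List.of(
            new Replica("dn1", "/rack1"),
            new Replica("dn2", "/rack1"),
            new Replica("dn3", "/rack2"));
        Random rnd = new Random(42); // fixed seed only to make runs repeatable
        int dn1First = 0;
        for (int i = 0; i < 1000; i++) {
            if (sortByTier("client", "/rack1", replicas, rnd).get(0).host().equals("dn1")) {
                dn1First++;
            }
        }
        // Expect roughly half of the orderings to start with dn1 and half with dn2.
        System.out.println("dn1 first in " + dn1First + " of 1000 orderings");
    }
}
```

With a shared `Random`, reads from a client on `/rack1` are spread across `dn1` and `dn2` in roughly equal proportion, while `dn3` on the other rack stays last. The trade-off is that the order is no longer deterministic, which matters only if callers relied on a stable ordering.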

        Attachments

        1. hdfs-6268-branch-2.001.patch (27 kB, Andrew Wang)
        2. hdfs-6268-5.patch (24 kB, Andrew Wang)
        3. hdfs-6268-4.patch (22 kB, Andrew Wang)
        4. hdfs-6268-3.patch (20 kB, Andrew Wang)
        5. hdfs-6268-2.patch (15 kB, Andrew Wang)
        6. hdfs-6268-1.patch (14 kB, Andrew Wang)

              People

              • Assignee: Andrew Wang
              • Reporter: Andrew Wang
              • Votes: 0
              • Watchers: 14
