Hadoop Common / HADOOP-2829

JT should consider the disk each task is on before scheduling jobs...


Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Invalid

    Description

      The DataNode can support a JBOD config, where blocks live on specific disks. But this information is not exported to, or considered by, the JT when assigning tasks, which leads to non-optimal disk use. If 4 slots are in use, 2 running tasks will likely read from the same disk, and we observe them running more slowly than other tasks on the same machine.

      We could follow a number of strategies to address this.

      For example, the DataNodes could support a "which disk is this block on?" call. The JT could then discover this info and assign tasks accordingly, as in the sketch below.
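      A minimal sketch of the idea, in Java. Everything here is hypothetical: BlockVolumeProtocol, getVolumeForBlock, and the per-volume counter are illustrations, not existing Hadoop APIs. The assigner simply prefers the candidate whose input block sits on the least loaded local disk.

        import java.io.IOException;
        import java.util.Arrays;
        import java.util.HashMap;
        import java.util.List;
        import java.util.Map;

        /** Hypothetical DataNode call: map a block to the disk (volume) holding it. */
        interface BlockVolumeProtocol {
          String getVolumeForBlock(long blockId) throws IOException;
        }

        /** Toy JT-side heuristic: spread running tasks across local disks. */
        class DiskAwareAssigner {
          private final BlockVolumeProtocol datanode;
          // Volume id -> number of running tasks currently reading from it.
          private final Map<String, Integer> runningPerVolume = new HashMap<>();

          DiskAwareAssigner(BlockVolumeProtocol datanode) {
            this.datanode = datanode;
          }

          /** Pick the candidate block on the least loaded disk, or null if none. */
          Long pickTask(List<Long> candidateBlockIds) throws IOException {
            Long best = null;
            int bestLoad = Integer.MAX_VALUE;
            for (Long blockId : candidateBlockIds) {
              int load = runningPerVolume.getOrDefault(
                  datanode.getVolumeForBlock(blockId), 0);
              if (load < bestLoad) {
                bestLoad = load;
                best = blockId;
              }
            }
            if (best != null) {
              runningPerVolume.merge(datanode.getVolumeForBlock(best), 1, Integer::sum);
            }
            return best;
          }

          public static void main(String[] args) throws IOException {
            // Stand-in volume map instead of a real DataNode.
            Map<Long, String> toy = new HashMap<>();
            toy.put(1L, "/disk0");
            toy.put(2L, "/disk0");
            toy.put(3L, "/disk1");
            DiskAwareAssigner assigner = new DiskAwareAssigner(toy::get);
            System.out.println(assigner.pickTask(Arrays.asList(1L, 2L))); // 1: /disk0 idle
            System.out.println(assigner.pickTask(Arrays.asList(2L, 3L))); // 3: /disk0 busy
          }
        }

      Even a per-volume running-task counter this simple ignores the merge/temp and off-node traffic discussed next, which is part of why the full optimization may not be worth it.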

      Of course, the TT itself uses the disks for merge and temp space, and the DataNodes on the same machine can be read by off-node clients, so it is not clear that optimizing all of this is simple enough to be worth it.

      This issue deserves study.


          People

            Assignee: Unassigned
            Reporter: Eric Baldeschwieler (eric14)
            Votes: 0
            Watchers: 3
