Details

    • Type: Improvement
    • Status: Closed
    • Priority: Blocker
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.20.0
    • Component/s: None
    • Labels:
      None
    • Hadoop Flags:
      Reviewed

      Description

      The job init thread currently initializes one job at a time. However, initialization is a lengthy and partly IO-bound process, because all of the job's block locations must be resolved through the namenode and a map of them must be built; it can take tens of seconds per job. As a result, when many small jobs are queued up, the cluster sometimes initializes jobs too slowly to achieve full utilization. It would be better to have a pool of threads that initializes multiple jobs in parallel. Care must be taken, however, not to cause deadlocks or hold locks for too long in these threads.
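      A minimal sketch of the proposed idea follows, assuming a fixed-size java.util.concurrent thread pool. The names JobInitPool, Job, initTasks and schedulerLock are hypothetical illustrations, not the JobTracker API or the code in the attached patches. It shows the two points raised above: the slow, IO-bound initialization runs in parallel on pool threads, and the shared lock is taken only briefly, after the slow work is done, to publish the initialized job.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

/**
 * Sketch: initialize queued jobs in parallel with a fixed thread pool.
 * All names here are hypothetical, not Hadoop's actual JobTracker API.
 */
public class JobInitPool {

    /** Hypothetical stand-in for a queued job awaiting initialization. */
    static class Job {
        final String id;
        volatile boolean initialized = false;

        Job(String id) { this.id = id; }

        /** Simulates the slow, IO-bound step (e.g. resolving block locations). */
        void initTasks() throws InterruptedException {
            Thread.sleep(50); // stand-in for namenode round trips
        }
    }

    private final ExecutorService pool;
    private final Object schedulerLock = new Object(); // guards 'runnable'
    private final List<Job> runnable = new ArrayList<>();

    JobInitPool(int threads) {
        pool = Executors.newFixedThreadPool(threads);
    }

    /** Queue a job; its expensive initialization runs on a pool thread. */
    void submit(Job job) {
        pool.execute(() -> {
            try {
                // Do the slow work WITHOUT holding the shared lock, so other
                // init threads and the scheduler are not blocked meanwhile.
                job.initTasks();
                job.initialized = true;
                // Hold the lock only briefly, to publish the result.
                synchronized (schedulerLock) {
                    runnable.add(job);
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // job is not published
            }
        });
    }

    /** Stop accepting jobs and wait for in-flight initializations. */
    boolean shutdownAndWait(long timeoutMs) throws InterruptedException {
        pool.shutdown();
        return pool.awaitTermination(timeoutMs, TimeUnit.MILLISECONDS);
    }

    /** Snapshot of jobs whose initialization has completed. */
    List<Job> runnableJobs() {
        synchronized (schedulerLock) {
            return new ArrayList<>(runnable);
        }
    }

    public static void main(String[] args) throws InterruptedException {
        JobInitPool initPool = new JobInitPool(4);
        long start = System.nanoTime();
        for (int i = 0; i < 8; i++) {
            initPool.submit(new Job("job_" + i));
        }
        initPool.shutdownAndWait(5_000);
        long ms = (System.nanoTime() - start) / 1_000_000;
        System.out.println(initPool.runnableJobs().size()
                + " jobs initialized in ~" + ms + " ms");
    }
}
```

      A fixed pool size bounds how many jobs query the namenode at once. Because no init thread acquires schedulerLock while doing IO, and no other lock is acquired while holding it, this sketch has no lock-ordering cycle; a real implementation would still need to check the lock order against any other locks the init path takes.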

        Attachments

        1. parallel-job-init-v1.patch
          5 kB
          Matei Zaharia
        2. hadoop-4664-v4.patch
          14 kB
          Jothi Padmanabhan
        3. hadoop-4664-v3.patch
          14 kB
          Jothi Padmanabhan
        4. hadoop-4664-v2.patch
          14 kB
          Jothi Padmanabhan
        5. hadoop-4664-v1.patch
          13 kB
          Jothi Padmanabhan

              People

              • Assignee:
                jothipn Jothi Padmanabhan
                Reporter:
                matei Matei Zaharia
              • Votes:
                0
                Watchers:
                7
