Hadoop Map/Reduce
  MAPREDUCE-317

Submitting job information via DFS in Map/Reduce causing consistency and performance issues

    Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Incomplete
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: None
    • Labels:
      None

      Description

      Job submission involves two steps: the client first writes the job information to the system directory on DFS, then submits the job to the JobTracker via the JobSubmissionProtocol. This two-step process has caused some issues:

      • Since the JobTracker has to read the job files back from DFS, slowness in the DFS can make job initialization costly. We ran into this in HADOOP-5286 and HADOOP-4664.
      • The two-step process can leave inconsistent information behind, as in HADOOP-5327 and HADOOP-5335.

      This JIRA is to explore options for removing the two-step process when submitting a job.
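
      The two-step flow described above can be modeled roughly as follows. This is only an in-memory sketch: FakeDfs, FakeJobTracker, the submit helper and the /system/<jobId>/job.xml path are all hypothetical names made up for illustration, not Hadoop classes or real paths. It shows why DFS reads sit on the job initialization path and how a client failure between the two steps leaves files behind that the JobTracker never hears about.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical in-memory model of the two-step submission.
// None of these types are Hadoop classes.
class TwoStepSubmissionSketch {

    /** Stand-in for the system directory on DFS: path -> file contents. */
    static final class FakeDfs {
        final Map<String, String> files = new HashMap<>();

        void write(String path, String contents) {
            files.put(path, contents);
        }

        String read(String path) {
            String contents = files.get(path);
            if (contents == null) {
                throw new IllegalStateException("missing " + path);
            }
            return contents;
        }
    }

    /** Stand-in for the JobTracker end of the submission call. */
    static final class FakeJobTracker {
        final FakeDfs dfs;
        final Map<String, String> initializedJobs = new HashMap<>();

        FakeJobTracker(FakeDfs dfs) {
            this.dfs = dfs;
        }

        // Step 2: only the job id arrives, so the job files must be read
        // back from DFS. Any DFS slowness is paid during initialization.
        void submitJob(String jobId) {
            initializedJobs.put(jobId, dfs.read("/system/" + jobId + "/job.xml"));
        }
    }

    /** Client side. failBetweenSteps simulates a client crash after step 1. */
    static void submit(FakeDfs dfs, FakeJobTracker tracker, String jobId,
                       String jobXml, boolean failBetweenSteps) {
        // Step 1: the client writes the job files to the system directory.
        dfs.write("/system/" + jobId + "/job.xml", jobXml);
        if (failBetweenSteps) {
            return; // files stay on DFS, but the tracker never sees the job
        }
        // Step 2: the client tells the tracker about the job.
        tracker.submitJob(jobId);
    }

    public static void main(String[] args) {
        FakeDfs dfs = new FakeDfs();
        FakeJobTracker tracker = new FakeJobTracker(dfs);
        submit(dfs, tracker, "job_1", "<configuration/>", false);
        submit(dfs, tracker, "job_2", "<configuration/>", true);
        System.out.println("files on DFS:     " + dfs.files.keySet());
        System.out.println("jobs initialized: " + tracker.initializedJobs.keySet());
    }
}
```

      In this model job_2 ends up with files in the system directory but no entry on the tracker, which is the kind of leftover state the second bullet refers to.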

        Activity

        Doug Cutting added a comment -

        If a job submission is to persist, then we must write its data to the system directory, no?

        We could perhaps streamline things somewhat by sending the job.xml and splits directly to the jobtracker via RPC, and having it persist these. They'd still need to be written before the job could be started, but they'd no longer need to also be read. The job's jar file should probably continue to be written by the client, since it is not needed by the jobtracker. I'm not sure this would really help things much, however.
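
        A rough sketch of that variant, under the same caveats: FakeDfs, FakeJobTracker, the three-argument submitJob and the file names under /system/<jobId>/ are hypothetical and exist only for this illustration. The tracker receives job.xml and the splits in the call itself, persists them, and keeps working from the copies it already holds, so nothing is read back from DFS. The client still writes the jar.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical in-memory model of sending job.xml and splits over RPC.
// None of these types are Hadoop classes.
class DirectSubmissionSketch {

    /** Stand-in for DFS that also counts reads. */
    static final class FakeDfs {
        final Map<String, String> files = new HashMap<>();
        int reads = 0;

        void write(String path, String contents) {
            files.put(path, contents);
        }

        String read(String path) {
            reads++;
            return files.get(path);
        }
    }

    /** Stand-in for a JobTracker that persists what it is handed. */
    static final class FakeJobTracker {
        private final FakeDfs dfs;
        final Map<String, String> startableJobs = new HashMap<>();

        FakeJobTracker(FakeDfs dfs) {
            this.dfs = dfs;
        }

        void submitJob(String jobId, String jobXml, String splits) {
            String dir = "/system/" + jobId;
            // Persist first, so the submission survives a tracker restart.
            dfs.write(dir + "/job.xml", jobXml);
            dfs.write(dir + "/job.split", splits);
            // Then use the in-memory copies; no DFS read is needed.
            startableJobs.put(jobId, jobXml);
        }
    }

    /** Client side: the jar is still written by the client. */
    static void submit(FakeDfs dfs, FakeJobTracker tracker, String jobId,
                       String jobXml, String splits, String jar) {
        dfs.write("/system/" + jobId + "/job.jar", jar);
        tracker.submitJob(jobId, jobXml, splits);
    }

    public static void main(String[] args) {
        FakeDfs dfs = new FakeDfs();
        FakeJobTracker tracker = new FakeJobTracker(dfs);
        submit(dfs, tracker, "job_1", "<configuration/>", "split-0,split-1", "jar-bytes");
        System.out.println("files on DFS: " + dfs.files.keySet());
        System.out.println("DFS reads:    " + dfs.reads);
    }
}
```

        Note that the jar write and the RPC remain two separate client actions in this sketch, so it only removes the read, not every window for leftover files.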

        Allen Wittenauer added a comment -

        With YARN, this isn't too relevant anymore. Closing as stale.


          People

          • Assignee: Unassigned
          • Reporter: Hemanth Yamijala
          • Votes: 0
          • Watchers: 3
