Hadoop Map/Reduce / MAPREDUCE-1901

Jobs should not submit the same jar files over and over again

    Details

    • Type: Improvement
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: None
    • Labels: None

      Description

      Currently each Hadoop job uploads its required resources (jars/files/archives) to a new location in HDFS. The Map-Reduce nodes that execute the job then download these resources to local disk.
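      The per-job staging described above can be modeled in a few lines. This is a minimal sketch, not Hadoop's actual job-submission code: the `PerJobStager` class and the `/staging/job_<n>/` path layout are hypothetical, and an in-memory dict stands in for HDFS. Because every submission gets a fresh directory, an unchanged jar is uploaded once per job.

```python
import itertools


class PerJobStager:
    """Hypothetical model of per-job staging: each job copies its resources
    into a brand-new directory, so identical jars are uploaded again."""

    def __init__(self):
        # path -> file contents; an in-memory stand-in for HDFS
        self.store = {}
        self._job_ids = itertools.count(1)

    def submit(self, resources):
        """Stage a {file name: bytes} mapping for one job; return staged paths."""
        job_id = next(self._job_ids)
        staged = []
        for name, data in sorted(resources.items()):
            # The job id, not the content, determines the location.
            path = f"/staging/job_{job_id}/{name}"
            self.store[path] = data  # always a fresh upload, even if unchanged
            staged.append(path)
        return staged

    def upload_count(self):
        return len(self.store)


stager = PerJobStager()
for _ in range(3):
    stager.submit({"framework.jar": b"same bytes every time"})
print(stager.upload_count())  # 3
```

      Three submissions of the same bytes leave three copies in the store, and each copy would also be downloaded separately by the nodes running that job.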

      In an environment where most users rely on a standard set of jars and files (because they use a framework like Hive/Pig), the same jars are uploaded and downloaded repeatedly. The overhead of this protocol (primarily in terms of end-user latency) is significant when:

      • the jobs are small (and, conversely, large in number)
      • the Namenode is under load (meaning HDFS latencies are high, and are made worse, in part, by this protocol)

      Hadoop should provide a way for jobs in a cooperative environment to avoid submitting the same files over and over again. Identifying and caching execution resources by a content signature (md5/sha) would be a useful alternative to have available.
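      A content-signature cache could look roughly like the following. This is a minimal sketch under stated assumptions, not the attached patch or any Hadoop API: the `ContentAddressedStager` class and the `/cache/<digest>/<name>` layout are hypothetical, an in-memory dict stands in for HDFS, and SHA-256 is used as one possible choice among the md5/sha options the issue mentions. The upload location is derived from the file's digest, so a resource already present in the cache is not uploaded again.

```python
import hashlib


class ContentAddressedStager:
    """Hypothetical content-addressed cache: resources are keyed by a SHA-256
    digest of their bytes, so an unchanged jar is uploaded only once."""

    def __init__(self):
        # path -> file contents; an in-memory stand-in for HDFS
        self.store = {}
        self.uploads = 0

    @staticmethod
    def signature(data: bytes) -> str:
        return hashlib.sha256(data).hexdigest()

    def stage(self, name: str, data: bytes) -> str:
        # The digest, not the job id, determines the location.
        path = f"/cache/{self.signature(data)}/{name}"
        if path not in self.store:  # upload only on a cache miss
            self.store[path] = data
            self.uploads += 1
        return path

    def submit(self, resources: dict) -> list:
        """Stage a {file name: bytes} mapping for one job; return staged paths."""
        return [self.stage(name, data) for name, data in sorted(resources.items())]


stager = ContentAddressedStager()
first = stager.submit({"framework.jar": b"v1"})
second = stager.submit({"framework.jar": b"v1"})  # cache hit: no new upload
stager.submit({"framework.jar": b"v2"})  # new content: new upload
print(first == second, stager.uploads)  # True 2
```

      The same digest could also key a node-local cache, so task nodes skip the download as well as the upload. The sketch trusts submitters, which matches the "cooperative environment" framing; a shared deployment would also need to handle concurrent writers and verify that cached bytes match their digest.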

      1. 1901.PATCH
        55 kB
        Junjie Liang
      2. 1901.PATCH
        65 kB
        Junjie Liang


          Activity

          Junjie Liang made changes: Attachment 1901.PATCH [ 12458972 ]
          Junjie Liang made changes: Attachment 1901.PATCH [ 12450226 ]
          Jeff Hammerbacher made changes: Link: This issue relates to MAPREDUCE-1902
          Joydeep Sen Sarma created the issue.

            People

            • Assignee: Unassigned
            • Reporter: Joydeep Sen Sarma
            • Votes: 0
            • Watchers: 29
