Hadoop YARN / YARN-1492

truly shared cache for jars (jobjar/libjar)

    Details

    • Type: New Feature
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 2.0.4-alpha
    • Fix Version/s: 2.9.0, 3.0.0
    • Component/s: None
    • Labels: None
    • Target Version/s:
    • Release Note:
      The YARN Shared Cache provides the facility to upload and manage shared application resources to HDFS in a safe and scalable manner. YARN applications can leverage resources uploaded by other applications or previous runs of the same application without having to re-upload and localize identical files multiple times. This will save network resources and reduce YARN application startup time.

      Description

      The distributed cache currently lets you cache jars and files so that attempts from the same job can reuse them. However, sharing is limited because the distributed cache normally works on a per-job basis. On a large cluster, copying of jobjars and libjars can become so prevalent that it consumes a large portion of the network bandwidth, not to mention defeating the purpose of "bringing compute to where data is". This is wasteful because in most cases the code doesn't change much across many jobs.

      I'd like to propose and discuss the feasibility of introducing a truly shared cache so that multiple jobs from multiple users can share and cache jars. This JIRA is to open the discussion.
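
      To make the idea of sharing identical jars across jobs and users concrete, here is a minimal in-memory sketch. It assumes the cache is keyed by a SHA-256 checksum of the jar contents; that keying scheme, the class name `SharedJarCacheSketch`, and the `/sharedcache/` path layout are assumptions for illustration only and are not taken from this issue or its design documents.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HashMap;
import java.util.Map;

/** Illustrative only: an in-memory stand-in for a cluster-wide jar cache. */
class SharedJarCacheSketch {
    // checksum -> cached location (hypothetical path layout, not a real HDFS path)
    private final Map<String, String> entries = new HashMap<>();
    private int uploads = 0;

    /** Hex-encoded SHA-256 of the jar bytes; assumed here to be the cache key. */
    static String checksum(byte[] jarBytes) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-256");
            StringBuilder hex = new StringBuilder();
            for (byte b : md.digest(jarBytes)) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // SHA-256 is a required algorithm on every Java platform
            throw new IllegalStateException(e);
        }
    }

    /** Returns the cached path, simulating an upload only for unseen content. */
    String use(byte[] jarBytes) {
        String key = checksum(jarBytes);
        return entries.computeIfAbsent(key, k -> {
            uploads++; // stands in for the one-time copy into shared storage
            return "/sharedcache/" + k;
        });
    }

    int uploadCount() {
        return uploads;
    }

    public static void main(String[] args) {
        SharedJarCacheSketch cache = new SharedJarCacheSketch();
        byte[] libV1 = "contents of lib-v1.jar".getBytes(StandardCharsets.UTF_8);
        byte[] libV2 = "contents of lib-v2.jar".getBytes(StandardCharsets.UTF_8);

        String jobA = cache.use(libV1);          // first job: uploads
        String jobB = cache.use(libV1.clone());  // another job/user, same bytes: reuses
        cache.use(libV2);                        // different content: uploads again

        System.out.println(jobA.equals(jobB));   // true
        System.out.println(cache.uploadCount()); // 2
    }
}
```

      Keying by content rather than by job or file name is what would let unrelated jobs share one copy; a real design would also need to address concurrent uploads, cleanup of unused entries, and security, none of which this sketch covers.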

        Attachments

        1. shared_cache_design_v2.pdf (26 kB, Sangjin Lee)
        2. shared_cache_design_v3.pdf (26 kB, Sangjin Lee)
        3. shared_cache_design_v4.pdf (25 kB, Sangjin Lee)
        4. shared_cache_design_v5.pdf (253 kB, Chris Trezzo)
        5. shared_cache_design_v6.pdf (251 kB, Chris Trezzo)
        6. shared_cache_design.pdf (24 kB, Sangjin Lee)
        7. YARN-1492-all-trunk-v1.patch (435 kB, Chris Trezzo)
        8. YARN-1492-all-trunk-v2.patch (455 kB, Chris Trezzo)
        9. YARN-1492-all-trunk-v3.patch (455 kB, Chris Trezzo)
        10. YARN-1492-all-trunk-v4.patch (455 kB, Chris Trezzo)
        11. YARN-1492-all-trunk-v5.patch (456 kB, Chris Trezzo)

              People

              • Assignee: Chris Trezzo (ctrezzo)
              • Reporter: Sangjin Lee (sjlee0)
              • Votes: 3
              • Watchers: 81
