Hadoop YARN / YARN-1492

truly shared cache for jars (jobjar/libjar)

Details

    • Type: New Feature
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 2.0.4-alpha
    • Fix Version/s: 2.9.0, 3.0.0
    • Component/s: None
    • Labels: None
    • Release Note: The YARN Shared Cache provides the facility to upload and manage shared application resources to HDFS in a safe and scalable manner. YARN applications can leverage resources uploaded by other applications or previous runs of the same application without having to re-upload and localize identical files multiple times. This will save network resources and reduce YARN application startup time.

    Description

      Currently, the distributed cache lets you cache jars and files so that attempts from the same job can reuse them. However, sharing through the distributed cache is limited because it normally works on a per-job basis. On a large cluster, copying jobjars and libjars can become so prevalent that it consumes a large portion of the network bandwidth, not to mention that it defeats the purpose of "bringing compute to where data is". This is wasteful because in most cases the code doesn't change much across jobs.

      I'd like to propose, and discuss the feasibility of, introducing a truly shared cache so that multiple jobs from multiple users can share and cache jars. This JIRA is to open the discussion.
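
      To make the idea concrete, here is a minimal sketch of a cache keyed by file content rather than by job, so that identical jars submitted by different jobs or users resolve to one stored copy. It is an illustration only and not the design from the attached documents or YARN's actual API: the class name, the in-memory map standing in for HDFS, the /sharedcache path prefix, and the choice of SHA-256 as the key are all assumptions.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical in-memory model of a content-addressed shared cache (not YARN's API). */
final class SharedCacheSketch {
    // Maps a content checksum to the (pretend) path of the single stored copy.
    private final Map<String, String> store = new HashMap<>();
    private int uploads = 0;

    /** Hex-encoded SHA-256 of the file content; the algorithm is an assumption. */
    static String checksum(byte[] content) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-256");
            StringBuilder hex = new StringBuilder();
            for (byte b : md.digest(content)) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // SHA-256 is required on every Java platform, so this is unexpected.
            throw new IllegalStateException(e);
        }
    }

    /**
     * Returns the cached path for this content, "uploading" it only when no
     * job has stored identical bytes before.
     */
    String use(String fileName, byte[] content) {
        String key = checksum(content);
        String existing = store.get(key);
        if (existing != null) {
            return existing; // cache hit: no copy over the network
        }
        uploads++;
        String path = "/sharedcache/" + key + "/" + fileName; // hypothetical layout
        store.put(key, path);
        return path;
    }

    int uploadCount() {
        return uploads;
    }

    public static void main(String[] args) {
        SharedCacheSketch cache = new SharedCacheSketch();
        byte[] jar = "pretend jar bytes".getBytes(StandardCharsets.UTF_8);

        // Two different jobs submit the same jar content.
        String fromJobA = cache.use("lib.jar", jar);
        String fromJobB = cache.use("lib.jar", jar.clone());

        System.out.println("same path: " + fromJobA.equals(fromJobB));
        System.out.println("uploads: " + cache.uploadCount());
    }
}
```

      Keying on content means the second job reuses the first job's copy even though neither knows about the other, which is the cross-job, cross-user sharing that the per-job distributed cache does not provide.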

      Attachments

        1. shared_cache_design_v2.pdf
          26 kB
          Sangjin Lee
        2. shared_cache_design_v3.pdf
          26 kB
          Sangjin Lee
        3. shared_cache_design_v4.pdf
          25 kB
          Sangjin Lee
        4. shared_cache_design_v5.pdf
          253 kB
          Chris Trezzo
        5. shared_cache_design_v6.pdf
          251 kB
          Chris Trezzo
        6. shared_cache_design.pdf
          24 kB
          Sangjin Lee
        7. YARN-1492-all-trunk-v1.patch
          435 kB
          Chris Trezzo
        8. YARN-1492-all-trunk-v2.patch
          455 kB
          Chris Trezzo
        9. YARN-1492-all-trunk-v3.patch
          455 kB
          Chris Trezzo
        10. YARN-1492-all-trunk-v4.patch
          455 kB
          Chris Trezzo
        11. YARN-1492-all-trunk-v5.patch
          456 kB
          Chris Trezzo

          People

            Assignee: Chris Trezzo (ctrezzo)
            Reporter: Sangjin Lee (sjlee0)
            Votes: 3
            Watchers: 81
