Hadoop Map/Reduce
MAPREDUCE-6989

[Umbrella] Uploader tool for Distributed Cache Deploy of the mapreduce framework and dependencies

Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed

    Description

      The proposal is to create a tool that collects all of the jars on the Hadoop classpath, adds them to a single tarball, and uploads the resulting archive to an HDFS directory. This saves the cluster administrator from having to set up Distributed Cache Deploy manually.
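
      As a rough illustration of the idea, a minimal collect-and-upload pass might look like the following sketch (the class name, target path, and use of commons-compress for the tar handling are assumptions for illustration, not the actual tool):

      {code:java}
      import java.io.File;
      import java.io.FileInputStream;
      import java.io.IOException;
      import java.util.zip.GZIPOutputStream;

      import org.apache.commons.compress.archivers.tar.TarArchiveEntry;
      import org.apache.commons.compress.archivers.tar.TarArchiveOutputStream;
      import org.apache.commons.compress.utils.IOUtils;
      import org.apache.hadoop.conf.Configuration;
      import org.apache.hadoop.fs.FileSystem;
      import org.apache.hadoop.fs.Path;

      public class FrameworkUploaderSketch {
        public static void main(String[] args) throws IOException {
          // Hypothetical target path; the real tool would take this as an option.
          Path target = new Path("hdfs:///mapred/framework/mr-framework.tar.gz");
          Configuration conf = new Configuration();
          FileSystem fs = target.getFileSystem(conf);

          // Stream the gzipped tarball directly into HDFS.
          try (TarArchiveOutputStream tar = new TarArchiveOutputStream(
              new GZIPOutputStream(fs.create(target)))) {
            // Walk the current classpath and pick up every plain jar file.
            for (String entry :
                System.getProperty("java.class.path").split(File.pathSeparator)) {
              File jar = new File(entry);
              if (jar.isFile() && jar.getName().endsWith(".jar")) {
                tar.putArchiveEntry(new TarArchiveEntry(jar, jar.getName()));
                try (FileInputStream in = new FileInputStream(jar)) {
                  IOUtils.copy(in, tar);
                }
                tar.closeArchiveEntry();
              }
            }
          }
        }
      }
      {code}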


        Activity

          miklos.szegedi@cloudera.com Miklos Szegedi added a comment -

          Uploading design doc

          ctrezzo Chris Trezzo added a comment -

          Hey miklos.szegedi@cloudera.com! Thanks for the work so far! I have a question around the high-level approach: Is there a reason why we can't leverage the shared cache for this? There is already an upload mechanism that has been built, along with a cleaning mechanism and a way to cache similar jars.

          miklos.szegedi@cloudera.com Miklos Szegedi added a comment -

          ctrezzo, thanks for the comment. I believe they can extend each other, but they have slightly distinct functionality. This tool is primarily a collector of the multiple jars on the classpath into a single tarball; it also uploads the result, but that is just an auxiliary task. It could leverage the shared cache, though, to avoid uploading a duplicate instance of the same jar. Please let me know if I am missing something. Now that you mention it, there is a need to delete the jar once no applications are using it. It would be very useful if we could solve that with the shared cache.

          miklos.szegedi@cloudera.com Miklos Szegedi added a comment -

          One more thing: this tool should run only once per upgrade, since it updates only the framework. Individual tasks can still specify their own jars and use the shared cache for those.

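          To make the "once per upgrade" point concrete: once the archive is uploaded, individual jobs only reference it through the standard Distributed Cache Deploy properties. A hedged sketch (the HDFS path and "mr-framework" alias are assumptions; the classpath value depends on the layout inside the tarball):

          {code:java}
          import org.apache.hadoop.conf.Configuration;
          import org.apache.hadoop.mapreduce.Job;

          public class FrameworkConfigSketch {
            public static Job newJob() throws Exception {
              Configuration conf = new Configuration();
              // Localize the uploaded archive in every container under the
              // alias "mr-framework".
              conf.set("mapreduce.application.framework.path",
                  "hdfs:///mapred/framework/mr-framework.tar.gz#mr-framework");
              // Entries must match the directory structure inside the archive.
              conf.set("mapreduce.application.classpath", "mr-framework/*");
              return Job.getInstance(conf, "uses-uploaded-framework");
            }
          }
          {code}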

          People

            Assignee: miklos.szegedi@cloudera.com (Miklos Szegedi)
            Reporter: miklos.szegedi@cloudera.com (Miklos Szegedi)
            Votes: 0
            Watchers: 7

            Dates

              Created:
              Updated:
              Resolved: