Hadoop Common / HADOOP-1864

Support for big jar file (>2G)


Details

    • Type: Bug
    • Status: Closed
    • Priority: Critical
    • Resolution: Won't Fix
    • Affects Version/s: 0.14.1
    • Fix Version/s: None
    • Component/s: None

    Description

  We have a huge binary that needs to be distributed onto the tasktracker nodes in Hadoop streaming mode. We have tried both the -file option and the -cacheArchive option, but it appears the tasktracker node cannot unjar jar files bigger than 2G. We are considering splitting our binaries into multiple jars, but with -file it seems we cannot do that. We would also prefer the -cacheArchive option for performance reasons, but -cacheArchive does not appear to allow more than one appearance in the streaming options. Even if -cacheArchive supported multiple jars, we would still need a way to unpack the jars into a single directory tree instead of using multiple symbolic links.

  So, in general, we need a feasible and efficient way to distribute large (>2G) binaries for Hadoop streaming. We don't know whether there is an existing solution that we either missed or used incorrectly, or whether extra work is needed to provide one.
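The split-and-reassemble workaround mentioned above can be sketched as follows. This is a hypothetical illustration, not a confirmed fix: the file names, chunk sizes, and the commented hadoop invocation are assumptions for the sake of the example.

```shell
# Sketch of the split-and-reassemble workaround (assumed, not a confirmed
# fix): keep every shipped piece under the 2G unjar limit, then concatenate
# the pieces back into one file on the task side.
set -e
head -c 65536 /dev/urandom > big.bin   # small stand-in for the huge binary
split -b 16k big.bin part.             # 16 KB chunks here; e.g. -b 1900m in practice
cat part.* > rebuilt.bin               # reassemble the pieces into a single file
cmp big.bin rebuilt.bin                # verify the reassembled file is identical
# A streaming job could then ship the pieces individually (illustrative only):
#   hadoop jar hadoop-streaming.jar \
#     -file part.aa -file part.ab ... \
#     -mapper 'sh -c "cat part.* > big.bin && ./big.bin"' ...
```

Reassembling with cat on the task side would also sidestep the multiple-symbolic-link problem, since the pieces end up as one file in the working directory.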


    People

      Assignee: Unassigned
      Reporter: Yiping Han (yhan)
      Votes: 0
      Watchers: 1
