Hadoop Map/Reduce / MAPREDUCE-6568

Streaming task dies when an environment variable value is longer than 100k


Details

    • Type: Bug
    • Status: Open
    • Priority: Minor
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: contrib/streaming
    • Labels: None

Description

    For some jobs I use mapred.input.format.class=org.apache.hadoop.mapred.lib.DelegatingInputFormat, which also requires mapred.input.dir.formats/mapreduce.input.multipleinputs.dir.formats to be defined with the list of files from mapred.input.dir/mapreduce.input.fileinputformat.inputdir, extended with an input reader class per entry. Sometimes this list becomes very large and the job starts failing because of the size of the environment variable.
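    A minimal sketch of the kind of job setup described above, assuming the old mapred API (class and path names here are hypothetical). Each MultipleInputs.addInputPath call appends another "path;format" pair to mapred.input.dir.formats and switches the job to DelegatingInputFormat, so the property grows with the number of inputs:

    {code:java}
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.mapred.JobConf;
    import org.apache.hadoop.mapred.TextInputFormat;
    import org.apache.hadoop.mapred.lib.MultipleInputs;

    public class MultiInputJobSketch {
      // Register each input directory with its own input format; with
      // thousands of inputs the mapred.input.dir.formats property (and the
      // environment variable streaming derives from it) can exceed 100k.
      public static JobConf configure(JobConf conf, String[] inputDirs) {
        for (String dir : inputDirs) {
          MultipleInputs.addInputPath(conf, new Path(dir), TextInputFormat.class);
        }
        return conf;
      }
    }
    {code}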

    I added a 100k limit to addJobConfToEnvironment in org.apache.hadoop.streaming.PipeMapRed, but it does not seem like a good solution, because the limit differs across platforms (Windows, Linux, etc.).
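    The attached patch is not reproduced inline here, but a sketch of the approach, assuming a size guard inside addJobConfToEnvironment, could look like the following (MAX_ENV_VALUE_LEN is the hard-coded 100k mentioned above):

    {code:java}
    import java.util.Map;
    import java.util.Properties;
    import org.apache.hadoop.mapred.JobConf;

    public class EnvSizeGuardSketch {
      // Hard-coded cap from this report; the real safe value is platform
      // dependent, which is exactly the problem described above.
      static final int MAX_ENV_VALUE_LEN = 100 * 1024;

      static void addJobConfToEnvironment(JobConf conf, Properties env) {
        for (Map.Entry<String, String> en : conf) {
          String value = conf.get(en.getKey()); // applies variable expansion
          if (value != null && value.length() > MAX_ENV_VALUE_LEN) {
            continue; // skip values too large to export safely
          }
          // PipeMapRed sanitizes names (non-alphanumeric characters become
          // '_') before exporting them; simplified here
          env.setProperty(en.getKey().replaceAll("[^A-Za-z0-9]", "_"),
                          value == null ? "" : value);
        }
      }
    }
    {code}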

    I am sure there should be a better way to detect the system limits and make this fix more flexible.
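    One possible direction (an assumption on my part, not part of the attached patch): probe the platform limit at runtime, for example via POSIX getconf, and fall back to a conservative default where that is unavailable. Note that ARG_MAX bounds the combined size of arguments plus environment, and the per-string cap on Linux (MAX_ARG_STRLEN) is not directly queryable, so this is only a heuristic:

    {code:java}
    import java.io.BufferedReader;
    import java.io.InputStreamReader;
    import java.nio.charset.StandardCharsets;

    public class EnvLimitProbe {
      static final long DEFAULT_LIMIT = 100 * 1024; // conservative fallback

      // Ask the OS for ARG_MAX via "getconf ARG_MAX"; return a fixed
      // default on Windows or on any failure.
      public static long probeArgMax() {
        try {
          Process p = new ProcessBuilder("getconf", "ARG_MAX").start();
          try (BufferedReader r = new BufferedReader(
              new InputStreamReader(p.getInputStream(), StandardCharsets.UTF_8))) {
            String line = r.readLine();
            p.waitFor();
            if (line != null) {
              return Long.parseLong(line.trim());
            }
          }
        } catch (Exception e) {
          // getconf missing (e.g. Windows) or unparsable output
        }
        return DEFAULT_LIMIT;
      }
    }
    {code}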

Attachments

    1. PipeMapRed.diff (0.1 kB) by Eugene A Slusarev


People

    Assignee: Unassigned
    Reporter: Eugene A Slusarev (antbofh)
    Votes: 0
    Watchers: 1

Dates

    Created:
    Updated: