  Hadoop Map/Reduce / MAPREDUCE-6568

Streaming Tasks die when Environment Variable value is longer than 100k


    Details

    • Type: Bug
    • Status: Open
    • Priority: Minor
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: contrib/streaming
    • Labels: None

      Description

      For some jobs I use mapred.input.format.class=org.apache.hadoop.mapred.lib.DelegatingInputFormat, which also requires mapred.input.dir.formats/mapreduce.input.multipleinputs.dir.formats to be defined with the list of files provided in mapred.input.dir/mapreduce.input.fileinputformat.inputdir, extended with an input reader class for each entry. Sometimes this list becomes very large, and the job starts failing because of the size of the resulting environment variable.
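      To illustrate the scale involved, the sketch below shows how quickly that property value grows with the number of inputs. The path names, the input count, and the choice of TextInputFormat are invented, and the "path;formatClass" comma-separated layout is assumed from the MultipleInputs convention; treat it as an illustration, not a reproduction of an actual job.

      import org.apache.hadoop.conf.Configuration;

      // Builds the per-path format mapping that DelegatingInputFormat reads,
      // to show how the value outgrows a single environment variable.
      public class DirFormatsSize {
        public static void main(String[] args) {
          Configuration conf = new Configuration(false);
          StringBuilder dirs = new StringBuilder();
          StringBuilder formats = new StringBuilder();
          for (int i = 0; i < 2000; i++) {
            String path = "hdfs:///data/2015/12/part-" + i;   // illustrative path
            if (i > 0) { dirs.append(','); formats.append(','); }
            dirs.append(path);
            formats.append(path).append(';')
                   .append("org.apache.hadoop.mapred.TextInputFormat");
          }
          conf.set("mapreduce.input.fileinputformat.inputdir", dirs.toString());
          conf.set("mapreduce.input.multipleinputs.dir.formats", formats.toString());
          // With ~2000 paths each "path;format" entry is roughly 70 characters,
          // so the formats value alone is well past 100k characters.
          System.out.println("dir.formats length = "
              + conf.get("mapreduce.input.multipleinputs.dir.formats").length());
        }
      }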

      I added a 100k limit to addJobConfToEnvironment in org.apache.hadoop.streaming.PipeMapRed, but it does not seem like a good solution because the limit differs across platforms (Windows, Linux, etc.).
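      The attached PipeMapRed.diff is only 0.1 kB; the guard it adds is presumably along the lines of the sketch below. The 100k cap comes from the description above, but the method shape, the name mangling, and the decision to silently skip oversized values are simplifying assumptions, not the verbatim patch.

      import java.util.Map;
      import java.util.Properties;
      import org.apache.hadoop.mapred.JobConf;

      class EnvExportSketch {
        // Assumed hard-coded cap; anything longer is not exported to the child.
        private static final int MAX_ENV_VALUE_LENGTH = 100 * 1024;

        // Conceptually mirrors PipeMapRed.addJobConfToEnvironment: copy every
        // job-conf entry into the child's environment, with a length guard added.
        static void addJobConfToEnvironment(JobConf conf, Properties env) {
          for (Map.Entry<String, String> entry : conf) {
            String name = entry.getKey().replaceAll("\\W", "_"); // env-safe name
            String value = conf.get(entry.getKey());             // resolves ${var} expansion
            if (value != null && value.length() > MAX_ENV_VALUE_LENGTH) {
              // A value this large overflows per-variable exec limits and kills
              // the streaming child, so drop it; the setting is still available
              // to the task through the localized job configuration.
              continue;
            }
            env.put(name, value);
          }
        }
      }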

      I'm sure there is a better way to detect the system limits and make this fix more flexible.
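      One possible direction, offered as an assumption rather than a tested fix: on Linux the per-variable limit comes from the kernel's MAX_ARG_STRLEN (about 128 KiB with 4 KiB pages), while ARG_MAX bounds the combined size of the arguments and the environment. Java cannot read these constants directly, but a probe via getconf with a conservative fallback for platforms that lack it could look like this:

      import java.io.BufferedReader;
      import java.io.InputStreamReader;
      import java.nio.charset.StandardCharsets;

      class EnvLimitProbe {
        // Conservative fallback for platforms where getconf is unavailable
        // (e.g. Windows) or its output cannot be parsed.
        static final long DEFAULT_LIMIT = 100 * 1024;

        static long maxEnvValueBytes() {
          try {
            Process p = new ProcessBuilder("getconf", "ARG_MAX").start();
            try (BufferedReader r = new BufferedReader(
                new InputStreamReader(p.getInputStream(), StandardCharsets.UTF_8))) {
              String line = r.readLine();
              p.waitFor();
              if (p.exitValue() == 0 && line != null) {
                // Leave headroom: the command line and the rest of the
                // environment share the same ARG_MAX budget.
                return Long.parseLong(line.trim()) / 4;
              }
            }
          } catch (Exception e) {
            // fall through to the default
          }
          return DEFAULT_LIMIT;
        }

        public static void main(String[] args) {
          System.out.println("env value cap: " + maxEnvValueBytes() + " bytes");
        }
      }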

        Attachments

        1. PipeMapRed.diff
          0.1 kB
          Eugene A Slusarev


            People

            • Assignee: Unassigned
            • Reporter: antbofh (Eugene A Slusarev)
            • Votes: 0
            • Watchers: 1
