Details

- Type: Bug
- Status: Open
- Priority: Minor
- Resolution: Unresolved
Description
For some jobs I use mapred.input.format.class=org.apache.hadoop.mapred.lib.DelegatingInputFormat, which also requires mapred.input.dir.formats/mapreduce.input.multipleinputs.dir.formats to be defined with the list of files given in mapred.input.dir/mapreduce.input.fileinputformat.inputdir, extended with an input-reader class per entry. Sometimes this list becomes very large, and the job starts failing because of the size of the resulting environment variable.
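For context, a multiple-input streaming job of this shape might be launched as sketched below; the paths, format classes, and mapper/reducer commands are illustrative assumptions, not taken from the report. The per-path format list is what grows with the number of inputs and ends up in the task environment.

```shell
# Illustrative only: paths and the chosen InputFormat classes are assumptions.
# Each entry in the dir.formats list is "path;formatClass", one per input path,
# so the property value grows linearly with the number of inputs.
hadoop jar hadoop-streaming.jar \
  -D mapred.input.format.class=org.apache.hadoop.mapred.lib.DelegatingInputFormat \
  -D mapreduce.input.multipleinputs.dir.formats="/data/a;org.apache.hadoop.mapred.TextInputFormat,/data/b;org.apache.hadoop.mapred.SequenceFileInputFormat" \
  -input /data/a \
  -input /data/b \
  -mapper cat \
  -reducer cat \
  -output /out
```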
I added a 100k limit to addJobConfToEnvironment in org.apache.hadoop.streaming.PipeMapRed, but that doesn't seem like a good solution, since the actual limit differs across platforms (Windows, Linux, etc.).
I'm sure there should be a better way to detect the system's limits and make this fix more flexible.
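A minimal sketch of the kind of guard described above: cap the length of a configuration value before it is copied into the child environment. The class and method names (EnvSizeGuard, envSafeValue) and the fixed 100 KB cap are illustrative, not part of Hadoop; a portable version would derive the cap from the platform (e.g. ARG_MAX on POSIX systems) instead of hard-coding it.

```java
// Hypothetical sketch of a size guard for values exported from the job
// configuration into the task environment (as PipeMapRed's
// addJobConfToEnvironment does). Names and the 100 KB cap are assumptions.
public class EnvSizeGuard {
    // Fixed cap, mirroring the reporter's 100k patch. A flexible fix would
    // detect the platform limit at runtime rather than hard-coding it.
    static final int MAX_ENV_VALUE_LEN = 100 * 1024;

    // Returns the value unchanged if it fits, otherwise a truncated copy.
    static String envSafeValue(String value) {
        if (value == null || value.length() <= MAX_ENV_VALUE_LEN) {
            return value;
        }
        return value.substring(0, MAX_ENV_VALUE_LEN);
    }
}
```

Truncation keeps the job from failing at launch, but it silently drops input entries, so a real fix would more likely skip oversized properties entirely or pass them through a file instead of the environment.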