Details

- Type: Bug
- Status: Open
- Priority: Minor
- Resolution: Unresolved
Description
For some jobs I use mapred.input.format.class=org.apache.hadoop.mapred.lib.DelegatingInputFormat, which also requires mapred.input.dir.formats/mapreduce.input.multipleinputs.dir.formats to be defined with the list of files given in mapred.input.dir/mapreduce.input.fileinputformat.inputdir, extended with an input-reader class per entry. Sometimes this list becomes very large, and the job starts failing because of the size of the resulting environment variable.
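For context, a multiple-input streaming job of this shape might be launched as sketched below; the paths, format classes, and mapper/reducer commands are illustrative assumptions, not taken from the report. The per-path format list is what grows with the number of inputs and ends up in the task environment.

```shell
# Illustrative only: paths and the chosen InputFormat classes are assumptions.
# Each entry in the dir.formats list is "path;formatClass", one per input path,
# so the property value grows linearly with the number of inputs.
hadoop jar hadoop-streaming.jar \
  -D mapred.input.format.class=org.apache.hadoop.mapred.lib.DelegatingInputFormat \
  -D mapreduce.input.multipleinputs.dir.formats="/data/a;org.apache.hadoop.mapred.TextInputFormat,/data/b;org.apache.hadoop.mapred.SequenceFileInputFormat" \
  -input /data/a \
  -input /data/b \
  -mapper cat \
  -reducer cat \
  -output /out
```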
I added a 100k limit to addJobConfToEnvironment in org.apache.hadoop.streaming.PipeMapRed, but that doesn't seem like a good solution, since the actual limit differs across platforms (Windows, Linux, etc.).
I'm sure there should be a better way to detect the system's limits and make this fix more flexible.
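A minimal sketch of the kind of guard described above: cap the length of a configuration value before it is copied into the child environment. The class and method names (EnvSizeGuard, envSafeValue) and the fixed 100 KB cap are illustrative, not part of Hadoop; a portable version would derive the cap from the platform (e.g. ARG_MAX on POSIX systems) instead of hard-coding it.

```java
// Hypothetical sketch of a size guard for values exported from the job
// configuration into the task environment (as PipeMapRed's
// addJobConfToEnvironment does). Names and the 100 KB cap are assumptions.
public class EnvSizeGuard {
    // Fixed cap, mirroring the reporter's 100k patch. A flexible fix would
    // detect the platform limit at runtime rather than hard-coding it.
    static final int MAX_ENV_VALUE_LEN = 100 * 1024;

    // Returns the value unchanged if it fits, otherwise a truncated copy.
    static String envSafeValue(String value) {
        if (value == null || value.length() <= MAX_ENV_VALUE_LEN) {
            return value;
        }
        return value.substring(0, MAX_ENV_VALUE_LEN);
    }
}
```

Truncation keeps the job from failing at launch, but it silently drops input entries, so a real fix would more likely skip oversized properties entirely or pass them through a file instead of the environment.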