Details
Type: Improvement
Status: Open
Priority: Major
Resolution: Unresolved
Description
HiveInputSplit now inherits from FileSplit just because we want MapTask to forward the file name of the mapper:

    private void updateJobWithSplit(final JobConf job, InputSplit inputSplit) {
      if (inputSplit instanceof FileSplit) {
        FileSplit fileSplit = (FileSplit) inputSplit;
        job.set("map.input.file", fileSplit.getPath().toString());
        job.setLong("map.input.start", fileSplit.getStart());
        job.setLong("map.input.length", fileSplit.getLength());
        LOG.info("split: " + job.get("map.input.file") + ", range: "
            + job.getLong("map.input.start", 0) + "-"
            + job.getLong("map.input.length", 0));
      }
    }

This makes HiveInputSplit big. See MAPREDUCE-1374.
Once we move to the new MapReduce framework, we should be able to make a smaller HiveInputFormat, which will reduce the amount of memory needed on the JobClient.
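The dependency on FileSplit described above can be illustrated with a small standalone sketch. It does not use any Hadoop classes: `SplitForwardingSketch`, its nested `InputSplit` and `FileSplit` stand-ins, the `WrappedSplit` type, and the plain `Map` used in place of `JobConf` are all hypothetical names introduced only for illustration. It mirrors the `instanceof` check quoted above and shows that, under these assumptions, a split that merely wraps a file split (rather than extending it) would not get `map.input.file` set.

```java
import java.util.HashMap;
import java.util.Map;

public class SplitForwardingSketch {
    /** Hypothetical stand-in for Hadoop's InputSplit; not the real interface. */
    public interface InputSplit {}

    /** Hypothetical stand-in for FileSplit: a path plus a byte range. */
    public static class FileSplit implements InputSplit {
        final String path;
        final long start;
        final long length;

        public FileSplit(String path, long start, long length) {
            this.path = path;
            this.start = start;
            this.length = length;
        }
    }

    /** A split that wraps a FileSplit by composition instead of inheritance. */
    public static class WrappedSplit implements InputSplit {
        final FileSplit inner;

        public WrappedSplit(FileSplit inner) {
            this.inner = inner;
        }
    }

    /** Mirrors the instanceof check quoted above, with a Map standing in for JobConf. */
    public static Map<String, String> updateJobWithSplit(Map<String, String> job, InputSplit split) {
        if (split instanceof FileSplit) {
            FileSplit fileSplit = (FileSplit) split;
            job.put("map.input.file", fileSplit.path);
            job.put("map.input.start", Long.toString(fileSplit.start));
            job.put("map.input.length", Long.toString(fileSplit.length));
        }
        return job;
    }

    public static void main(String[] args) {
        FileSplit file = new FileSplit("/warehouse/t/part-00000", 0L, 1024L);
        Map<String, String> direct = updateJobWithSplit(new HashMap<>(), file);
        Map<String, String> wrapped = updateJobWithSplit(new HashMap<>(), new WrappedSplit(file));
        System.out.println("FileSplit    -> " + direct.get("map.input.file"));
        System.out.println("WrappedSplit -> " + wrapped.get("map.input.file"));
    }
}
```

In this sketch only the split that is itself a `FileSplit` has its file name forwarded, which is the reason the description gives for the current inheritance.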