[HIVE-1050] Reduce the memory foot-print of HiveInputSplit - ASF JIRA

Details

Type: Improvement
Status: Open
Priority: Major
Resolution: Unresolved
Affects Version/s: None
Fix Version/s: None
Component/s: None
Labels: None

Description

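HiveInputSplit currently inherits from FileSplit only so that MapTask will forward the mapper's input file name to the job. This makes HiveInputSplit large. See MAPREDUCE-1374.

The forwarding happens in MapTask, which only handles splits that are instances of FileSplit: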

  private void updateJobWithSplit(final JobConf job, InputSplit inputSplit) {
    if (inputSplit instanceof FileSplit) {
      FileSplit fileSplit = (FileSplit) inputSplit;
      job.set("map.input.file", fileSplit.getPath().toString());
      job.setLong("map.input.start", fileSplit.getStart());
      job.setLong("map.input.length", fileSplit.getLength());
      LOG.info("split: " + job.get("map.input.file")+", range: "
               + job.getLong("map.input.start", 0) + "-"
               + job.getLong("map.input.length", 0));
    }
  }
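To illustrate why the FileSplit inheritance matters, here is a minimal, dependency-free sketch. `InputSplit`, `FileSplit` and the `Map`-based job configuration are simplified, hypothetical stand-ins for the Hadoop classes, and `WrappingSplit` / `WrappingFileSplit` are illustrative names, not Hive's actual classes. It only shows that a wrapper split gets `map.input.file` set when it passes the `instanceof FileSplit` check quoted above.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical, dependency-free stand-ins for Hadoop's InputSplit, FileSplit and JobConf.
public class SplitForwardingSketch {
    interface InputSplit {
        long getLength();
    }

    static class FileSplit implements InputSplit {
        private final String path;
        private final long start;
        private final long length;

        FileSplit(String path, long start, long length) {
            this.path = path;
            this.start = start;
            this.length = length;
        }

        String getPath() { return path; }
        long getStart() { return start; }
        public long getLength() { return length; }
    }

    // A wrapper that does NOT extend FileSplit: the instanceof check never matches it.
    static class WrappingSplit implements InputSplit {
        private final InputSplit inner;

        WrappingSplit(InputSplit inner) { this.inner = inner; }

        public long getLength() { return inner.getLength(); }
    }

    // A wrapper that extends FileSplit, so the check matches. Note that each instance
    // carries the inherited FileSplit fields in addition to the wrapped split.
    static class WrappingFileSplit extends FileSplit {
        private final InputSplit inner;

        WrappingFileSplit(FileSplit inner) {
            super(inner.getPath(), inner.getStart(), inner.getLength());
            this.inner = inner;
        }
    }

    // Same shape as MapTask.updateJobWithSplit, with a Map in place of JobConf.
    static void updateJobWithSplit(Map<String, String> job, InputSplit split) {
        if (split instanceof FileSplit) {
            FileSplit fs = (FileSplit) split;
            job.put("map.input.file", fs.getPath());
            job.put("map.input.start", Long.toString(fs.getStart()));
            job.put("map.input.length", Long.toString(fs.getLength()));
        }
    }

    public static void main(String[] args) {
        FileSplit base = new FileSplit("hdfs://nn/warehouse/t/part-00000", 0L, 1024L);

        Map<String, String> plain = new HashMap<>();
        updateJobWithSplit(plain, new WrappingSplit(base));
        System.out.println("plain wrapper:     " + plain.get("map.input.file"));

        Map<String, String> sub = new HashMap<>();
        updateJobWithSplit(sub, new WrappingFileSplit(base));
        System.out.println("FileSplit wrapper: " + sub.get("map.input.file"));
    }
}
```

Running `main` prints `null` for the plain wrapper and the file path for the FileSplit subclass. The trade-off the issue describes is visible in the sketch: the subclass satisfies MapTask's check, but every instance also carries FileSplit's state alongside the wrapped split.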

Once we move to the new MapReduce framework, we should be able to make HiveInputFormat produce smaller splits, which will reduce the amount of memory needed on the JobClient.

People

Assignee: Unassigned

Reporter: Zheng Shao

Votes: 0

Watchers: 2

Dates

Created: 13/Jan/10 08:37

Updated: 13/Jan/10 08:37