Hive / HIVE-1050

Reduce the memory footprint of HiveInputSplit


Details

    • Type: Improvement
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved

    Description

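      HiveInputSplit currently extends FileSplit only so that MapTask will forward the mapper's input file name to the job configuration. This makes HiveInputSplit bigger than necessary; see MAPREDUCE-1374.
      MapTask sets map.input.file (and the start offset and length) only when the split is an instance of FileSplit: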

        // MapTask: publish the split's file, start offset and length to the job
        // configuration. Splits that are not FileSplits are silently skipped.
        private void updateJobWithSplit(final JobConf job, InputSplit inputSplit) {
          if (inputSplit instanceof FileSplit) {
            FileSplit fileSplit = (FileSplit) inputSplit;
            job.set("map.input.file", fileSplit.getPath().toString());
            job.setLong("map.input.start", fileSplit.getStart());
            job.setLong("map.input.length", fileSplit.getLength());
            // Note: the logged "range" is start and length, not start and end.
            LOG.info("split: " + job.get("map.input.file")+", range: "
                     + job.getLong("map.input.start", 0) + "-"
                     + job.getLong("map.input.length", 0));
          }
        }
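      To make the constraint concrete, here is a minimal, self-contained sketch of the instanceof check above. Every class in it (InputSplit, FileSplit, PlainWrapperSplit, FileSplitWrapper) is a hypothetical stand-in, not the real Hadoop or Hive type, and a Map replaces JobConf. It assumes, for illustration only, a wrapper that delegates its getters to the inner split: the wrapper that merely holds a FileSplit gets no map.input.file, while the one that extends FileSplit does, at the cost of carrying the inherited (unused) fields in every instance.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-ins for Hadoop types; not the real Hadoop or Hive classes.
public class SplitForwardingDemo {

    // Stand-in for org.apache.hadoop.mapred.InputSplit.
    interface InputSplit {}

    // Stand-in for org.apache.hadoop.mapred.FileSplit.
    static class FileSplit implements InputSplit {
        private final String path;
        private final long start;
        private final long length;

        FileSplit(String path, long start, long length) {
            this.path = path;
            this.start = start;
            this.length = length;
        }

        String getPath() { return path; }
        long getStart() { return start; }
        long getLength() { return length; }
    }

    // Hypothetical wrapper that does NOT extend FileSplit.
    static class PlainWrapperSplit implements InputSplit {
        final InputSplit inner;
        PlainWrapperSplit(InputSplit inner) { this.inner = inner; }
    }

    // Hypothetical wrapper that extends FileSplit and delegates its getters to
    // the inner split. The inherited fields are unused but still occupy space.
    static class FileSplitWrapper extends FileSplit {
        final FileSplit inner;
        FileSplitWrapper(FileSplit inner) {
            super(null, 0L, 0L);
            this.inner = inner;
        }
        @Override String getPath() { return inner.getPath(); }
        @Override long getStart() { return inner.getStart(); }
        @Override long getLength() { return inner.getLength(); }
    }

    // Mirrors the logic of updateJobWithSplit, with a Map in place of JobConf.
    static Map<String, String> updateJobWithSplit(InputSplit split) {
        Map<String, String> job = new HashMap<>();
        if (split instanceof FileSplit) {
            FileSplit fileSplit = (FileSplit) split;
            job.put("map.input.file", fileSplit.getPath());
            job.put("map.input.start", Long.toString(fileSplit.getStart()));
            job.put("map.input.length", Long.toString(fileSplit.getLength()));
        }
        return job;
    }

    public static void main(String[] args) {
        FileSplit inner = new FileSplit("/warehouse/t/part-00000", 0L, 1024L);
        // Prints null: the plain wrapper fails the instanceof FileSplit check.
        System.out.println(updateJobWithSplit(new PlainWrapperSplit(inner)).get("map.input.file"));
        // Prints /warehouse/t/part-00000: the FileSplit subclass passes it.
        System.out.println(updateJobWithSplit(new FileSplitWrapper(inner)).get("map.input.file"));
    }
}
```

      The file path used in main is made up. The point is only that the forwarding is keyed on the split's class, so a smaller wrapper that does not extend FileSplit loses map.input.file under this version of MapTask.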
      
      

      Once we move to the new MapReduce framework, we should be able to make HiveInputSplit smaller, which will reduce the amount of memory the JobClient needs to hold the splits.


          People

            Assignee: Unassigned
            Reporter: Zheng Shao (zshao)
            Votes: 0
            Watchers: 2
