Hive / HIVE-5834

Avoid reading ORC footers for files which will not be split in OrcInputFormat::getSplits()

    Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: tez-branch
    • Fix Version/s: tez-branch
    • Component/s: Tez
    • Labels:
    • Release Note:
      Avoid reading ORC footers for files where data and footer are in the same HDFS block

    Description

      OrcInputFormat::getSplits() fires off a SplitGenerator task for every input file.
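
      A minimal sketch of that per-file fan-out, assuming a plain java.util.concurrent thread pool. The class SplitFanOut and the method splitGeneratorFor are hypothetical stand-ins, not Hive's code, and the task body only returns a label where the real SplitGenerator would read the ORC footer and compute splits.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical sketch of "one SplitGenerator task per file"; not Hive's actual code.
public class SplitFanOut {

    // Stand-in for a per-file SplitGenerator. In Hive this step would read the
    // ORC footer and compute splits; here it only returns a label for the file.
    static Callable<String> splitGeneratorFor(String path) {
        return () -> "splits(" + path + ")";
    }

    // Submits one task per input file to a shared pool and collects the
    // results in submission order.
    static List<String> generateSplits(List<String> files, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<String>> pending = new ArrayList<>();
            for (String f : files) {
                pending.add(pool.submit(splitGeneratorFor(f)));
            }
            List<String> result = new ArrayList<>();
            for (Future<String> fut : pending) {
                result.add(fut.get()); // block until this file's task finishes
            }
            return result;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while generating splits", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("split generation failed", e.getCause());
        } finally {
            pool.shutdown(); // let the worker threads exit
        }
    }

    public static void main(String[] args) {
        System.out.println(generateSplits(List.of("a.orc", "b.orc"), 2));
    }
}
```

      Every file pays for its own task and its own footer read, even when the result is a single split for the whole file.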

      For any file that occupies a single HDFS block, the footer and the data reside on the same node(s). Such a file also never needs to be split further, as long as its total size is below context.maxSize.
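
      The condition above can be sketched as a simple predicate. This is an illustration under stated assumptions, not the committed patch: Block, Split, canSkipFooter and wholeFileSplit are hypothetical names, and the maxSize parameter stands in for context.maxSize.

```java
import java.util.List;

// Hypothetical sketch of the footer-skipping shortcut; names are illustrative, not Hive's API.
public class FooterSkipCheck {

    // Minimal stand-in for an HDFS block location: offset, length and hosts.
    record Block(long offset, long length, List<String> hosts) {}

    // Minimal stand-in for an input split covering [start, start + length).
    record Split(String path, long start, long length, List<String> hosts) {}

    // True when the footer read can be skipped: the file lives in a single
    // HDFS block (so data and footer share hosts) and is smaller than maxSize
    // (so it would never be split further anyway).
    static boolean canSkipFooter(List<Block> blocks, long fileLength, long maxSize) {
        return blocks.size() == 1 && fileLength < maxSize;
    }

    // For such a file, emit one split covering the whole file, placed on the
    // block's hosts; the task reads the footer itself.
    static Split wholeFileSplit(String path, long fileLength, Block only) {
        return new Split(path, 0, fileLength, only.hosts());
    }

    public static void main(String[] args) {
        long len = 64L << 20;      // 64 MiB file
        long maxSize = 256L << 20; // assumed value for context.maxSize
        List<Block> blocks = List.of(new Block(0, len, List.of("node1", "node2", "node3")));
        if (canSkipFooter(blocks, len, maxSize)) {
            System.out.println(wholeFileSplit("part-0.orc", len, blocks.get(0)));
        }
    }
}
```

      Placing the single whole-file split on the block's hosts keeps the task local to both the data and the footer, which is what makes reading the footer in the task cheaper than reading it during split generation.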

      Reading the footer locally in the task is therefore faster than reading it during split generation and shipping it from the AM (ApplicationMaster).

    People

    • Assignee: Gopal V (gopalv)
    • Reporter: Gopal V (gopalv)
    • Votes: 0
    • Watchers: 2