Hive / HIVE-5834

Avoid reading ORC footers for files which will not be split in OrcInputFormat::getSplits()


Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: tez-branch
    • Fix Version/s: tez-branch
    • Component/s: Tez
    • Avoid reading ORC footers for files where data and footer are in the same HDFS block

    Description

      OrcInputFormat::getSplits() fires off a SplitGenerator task for every input file.

      For a file that occupies only one HDFS block, the footer and the data are on the same node. In addition, such a file never needs to be split further as long as its total size is less than context.maxSize.

      Reading that footer locally is faster than reading it during split generation and sending it from the AM.
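The per-file decision described above can be sketched as a simple predicate. This is a minimal, hypothetical Java sketch, not Hive's actual OrcInputFormat code: `InputFile`, `canSkipFooter`, and `filesNeedingFooterRead` are illustrative names, and the `maxSize` parameter stands in for `context.maxSize`. A file qualifies for skipping the footer read during split generation when it occupies exactly one HDFS block and its length is below `maxSize`.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical sketch: decide per file whether split generation must read the
 * ORC footer. All names here are illustrative assumptions, not Hive APIs.
 */
public class FooterSkipSketch {

    /** Minimal stand-in for file metadata: path, length in bytes, HDFS block count. */
    record InputFile(String path, long length, int blockCount) {}

    /**
     * True when the file can be emitted as one whole-file split and its footer
     * read later by the task: a single HDFS block means footer and data are
     * co-located, and length < maxSize means no further split is needed.
     */
    static boolean canSkipFooter(InputFile file, long maxSize) {
        return file.blockCount() == 1 && file.length() < maxSize;
    }

    /** Returns, in input order, the paths whose footers must still be read during split generation. */
    static List<String> filesNeedingFooterRead(List<InputFile> files, long maxSize) {
        List<String> needed = new ArrayList<>();
        for (InputFile f : files) {
            if (!canSkipFooter(f, maxSize)) {
                needed.add(f.path());
            }
        }
        return needed;
    }

    public static void main(String[] args) {
        long maxSize = 256L * 1024 * 1024; // assumed 256 MB limit, for illustration only
        List<InputFile> files = List.of(
            new InputFile("small.orc", 10L * 1024 * 1024, 1),  // one block, small: footer read skipped
            new InputFile("big.orc", 600L * 1024 * 1024, 3),   // several blocks: footer read needed
            new InputFile("wide.orc", 300L * 1024 * 1024, 1)); // one block but >= maxSize: footer read needed
        System.out.println(filesNeedingFooterRead(files, maxSize));
    }
}
```

Running `main` prints `[big.orc, wide.orc]`: under these assumptions only those two files would still have their footers read during split generation, while `small.orc` would be handed out whole and have its footer read locally by the task.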

      Attachments

        1. HIVE-5834.00-tez.patch (1 kB, Gopal Vijayaraghavan)


          People

            Assignee: Gopal Vijayaraghavan (gopalv)
            Reporter: Gopal Vijayaraghavan (gopalv)
            Votes: 0
            Watchers: 2

            Dates

              Created:
              Updated:
              Resolved: