  1. Hive
  2. HIVE-15390

Orc reader unnecessarily reading stripe footers with hive.optimize.index.filter set to true

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 1.2.1
    • Fix Version/s: 2.3.0
    • Component/s: ORC
    • Labels: None

    Description

      Within a split assigned to a task, the task's ORC reader reads stripe footers for stripes that the task is not responsible for. This happens when hive.optimize.index.filter is set to true.

      Assuming one split per task (no Tez grouping), a task should not need to read beyond its split's end offset. Even under split computation strategies where a split's end offset can fall in the middle of a stripe, the task should need to read at most one stripe past the split's end offset (to finish reading a stripe that started inside the split). However, some tasks make unnecessary filesystem calls to read every stripe footer in the file, from the split's start offset to the end of the file.
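
      The expected behavior can be illustrated with a minimal sketch. It is not the attached patch and does not use the ORC reader API: the Stripe class and stripesForSplit method are hypothetical names, and the rule that a stripe belongs to the split containing its start offset is an assumption consistent with the description above. Under that rule, footers would only need to be read for the stripes returned.

```java
import java.util.ArrayList;
import java.util.List;

public class SplitStripeSelector {

    /** Hypothetical stand-in for a stripe's position in the file (not the ORC API). */
    static final class Stripe {
        final long offset;  // byte offset where the stripe starts
        final long length;  // total stripe length in bytes, footer included

        Stripe(long offset, long length) {
            this.offset = offset;
            this.length = length;
        }
    }

    /**
     * Returns the indexes of the stripes whose start offset lies in
     * [splitStart, splitEnd). Assumption: a stripe is owned by the split
     * that contains its first byte, so a stripe that starts inside the
     * split is read in full even if it runs past splitEnd.
     */
    static List<Integer> stripesForSplit(List<Stripe> stripes, long splitStart, long splitEnd) {
        List<Integer> selected = new ArrayList<>();
        for (int i = 0; i < stripes.size(); i++) {
            long start = stripes.get(i).offset;
            if (start >= splitStart && start < splitEnd) {
                selected.add(i);
            }
        }
        return selected;
    }

    public static void main(String[] args) {
        List<Stripe> stripes = new ArrayList<>();
        stripes.add(new Stripe(3, 100));    // stripe 0: bytes [3, 103)
        stripes.add(new Stripe(103, 100));  // stripe 1: bytes [103, 203)
        stripes.add(new Stripe(203, 100));  // stripe 2: bytes [203, 303)
        stripes.add(new Stripe(303, 100));  // stripe 3: bytes [303, 403)

        // The split [100, 250) ends in the middle of stripe 2.
        // Stripes 1 and 2 start inside it; stripe 3 does not, so its
        // footer should never be fetched by this task.
        System.out.println(stripesForSplit(stripes, 100, 250)); // prints [1, 2]
    }
}
```

      In this sketch, the behavior reported in this issue would correspond to also fetching the footer of stripe 3, and of every later stripe up to the end of the file.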

      Attachments

        1. HIVE-15390.1.patch
          0.7 kB
          Abhishek Somani
        2. HIVE-15390.patch
          0.7 kB
          Abhishek Somani

          People

            Assignee: Abhishek Somani (asomani)
            Reporter: Abhishek Somani (asomani)
            Votes: 0
            Watchers: 5
