Hive / HIVE-5834

Avoid reading ORC footers for files which will not be split in OrcInputFormat::getSplits()

    Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: tez-branch
    • Fix Version/s: tez-branch
    • Component/s: Tez
    • Labels:
    • Release Note:
      Avoid reading ORC footers for files where data and footer are in the same HDFS block

      Description

      OrcInputFormat::getSplits() fires off a SplitGenerator task for every input file.

      For every file that occupies only one HDFS block, the footer and data are on the same node. On top of that, such a file never needs to be split further as long as its total size is < context.maxSize.

      Reading that footer locally in the task is faster than reading it during split generation and sending it from the AM.
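      The rule described above can be sketched as a small planning step. This is a hypothetical illustration, not the code in HIVE-5834.00-tez.patch: `FileInfo`, `Split`, `Plan`, `canSkipFooter` and `plan` are invented names, and the HDFS block count is assumed to be known up front (in Hadoop it would typically be derived from the file's block locations, e.g. via `FileSystem.getFileBlockLocations`). Only `maxSize` mirrors the `context.maxSize` threshold named in the description.

```java
import java.util.ArrayList;
import java.util.List;

public class FooterSkipSketch {

    /** Hypothetical summary of one input file: path, length in bytes, HDFS block count. */
    record FileInfo(String path, long length, int blockCount) {}

    /** Hypothetical split covering [start, start + length) of a file. */
    record Split(String path, long start, long length) {}

    /** Planning result: splits made without a footer read, plus files that still need one. */
    record Plan(List<Split> wholeFileSplits, List<FileInfo> needFooterRead) {}

    /**
     * A file can skip the footer read when it lives in a single HDFS block (footer and
     * data share a node) and is smaller than maxSize (it will never be split further).
     */
    static boolean canSkipFooter(FileInfo file, long maxSize) {
        return file.blockCount() == 1 && file.length() < maxSize;
    }

    static Plan plan(List<FileInfo> files, long maxSize) {
        List<Split> direct = new ArrayList<>();
        List<FileInfo> deferred = new ArrayList<>();
        for (FileInfo file : files) {
            if (canSkipFooter(file, maxSize)) {
                // One split for the whole file; the task reads the footer locally later.
                direct.add(new Split(file.path(), 0L, file.length()));
            } else {
                // Left for the existing footer-reading split generation path.
                deferred.add(file);
            }
        }
        return new Plan(direct, deferred);
    }

    public static void main(String[] args) {
        long maxSize = 256L * 1024 * 1024; // hypothetical maxSize of 256 MB
        List<FileInfo> files = List.of(
                new FileInfo("part-00000.orc", 70L * 1024 * 1024, 1),
                new FileInfo("part-00001.orc", 300L * 1024 * 1024, 3)); // e.g. 3 blocks at 128 MB
        Plan result = plan(files, maxSize);
        System.out.println("whole-file splits: " + result.wholeFileSplits().size());
        System.out.println("footer reads needed: " + result.needFooterRead().size());
    }
}
```

      With these inputs the sketch reports one whole-file split (the single-block 70 MB file) and one file that still needs its footer read (the 3-block file).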

        Activity

        Gopal V created issue -
        Gopal V added a comment -

        Tested with a count(1) query with a filter.

        For a table of 1500 x 70 MB ORC files:

        Before = 26 seconds
        After = 18 seconds

        For a table of 23699 x ~2 MB ORC files:

        Before = 32.9 seconds
        After = 23.0 seconds
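        Both runs show roughly a 30% drop in wall-clock time. The small Java helper below reproduces that arithmetic from the reported before/after timings; `SpeedupSketch`, `percentReduction` and `speedup` are illustrative names only.

```java
public class SpeedupSketch {

    /** Percentage reduction in wall-clock time going from `before` to `after`. */
    static double percentReduction(double before, double after) {
        return (before - after) / before * 100.0;
    }

    /** Speedup factor: how many times faster the `after` run is. */
    static double speedup(double before, double after) {
        return before / after;
    }

    public static void main(String[] args) {
        // Timings (seconds) as reported in the comment above.
        System.out.printf("1500 x 70 MB files:  %.1f%% less time, %.2fx faster%n",
                percentReduction(26, 18), speedup(26, 18));
        System.out.printf("23699 x ~2 MB files: %.1f%% less time, %.2fx faster%n",
                percentReduction(32.9, 23.0), speedup(32.9, 23.0));
    }
}
```

        It prints a 30.8% reduction (1.44x) for the 70 MB files and a 30.1% reduction (1.43x) for the ~2 MB files.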

        Gopal V made changes -
        Field Original Value New Value
        Attachment HIVE-5834.00-tez.patch [ 12614147 ]
        Gopal V made changes -
        Status Open [ 1 ] Patch Available [ 10002 ]
        Release Note Avoid reading ORC footers for files where data and footer are in the same HDFS block
        Affects Version/s tez-branch [ 12324744 ]
        Affects Version/s 0.13.0 [ 12324986 ]
        Labels perfomance split
        Fix Version/s tez-branch [ 12324744 ]
        Gunther Hagleitner added a comment -

        Nice find. LGTM.

        Gunther Hagleitner added a comment -

        Committed to branch. Thanks Gopal!

        Gunther Hagleitner made changes -
        Status Patch Available [ 10002 ] Resolved [ 5 ]
        Resolution Fixed [ 1 ]

          People

          • Assignee: Gopal V
          • Reporter: Gopal V
          • Votes: 0
          • Watchers: 2
