Hive / HIVE-23014

ORC reading performance


    Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 2.3.6
    • Fix Version/s: None
    • Component/s: ORC
    • Labels: None

      Description

      Spark 3 adds support for Hive 2.3.6 alongside the older Hive 1.2.1. Some ORC reading benchmarks show a large performance difference between the two versions. I measured org.apache.hadoop.hive.ql.io.orc.ReaderImpl in hive-exec-2.3.6-core.jar to be roughly 3-5 times slower than the same class in hive-exec-1.2.1.spark2.jar.

      I'm not sure if more recent Hive versions still suffer from this performance regression.

      Please see some details here: SPARK-30565
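
      The 3-5x figure above is a ratio of read times. As a rough illustration of how such a ratio could be measured, here is a minimal sketch of a timing harness. It is not the benchmark used in SPARK-30565. The names `median_seconds`, `slowdown`, and the `read_once` callable are hypothetical; `read_once` stands in for whatever code performs one full ORC read against a given Hive version.

```python
import statistics
import time
from typing import Callable


def median_seconds(read_once: Callable[[], object],
                   warmup: int = 2, runs: int = 5) -> float:
    """Return the median wall-clock time of read_once() over `runs` timed calls.

    `read_once` is a hypothetical placeholder for one full read of an ORC file.
    """
    for _ in range(warmup):
        read_once()  # untimed warm-up calls (caches, lazy initialization)
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        read_once()
        samples.append(time.perf_counter() - start)
    return statistics.median(samples)


def slowdown(baseline_seconds: float, candidate_seconds: float) -> float:
    """Return how many times slower the candidate is than the baseline."""
    if baseline_seconds <= 0:
        raise ValueError("baseline_seconds must be positive")
    return candidate_seconds / baseline_seconds
```

      With one callable per Hive version, `slowdown(median_seconds(read_with_1_2_1), median_seconds(read_with_2_3_6))` would give the factor reported above. The median is used here, as an assumption, to reduce the effect of outlier runs.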
