Hive / HIVE-23014

ORC reading performance

Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 2.3.6
    • Fix Version/s: None
    • Component/s: ORC
    • Labels: None

    Description

      Spark 3 adds support for Hive 2.3.6 alongside the older Hive 1.2.1. Some ORC reading benchmarks show a large performance difference between the two versions: I measured org.apache.hadoop.hive.ql.io.orc.ReaderImpl in hive-exec-2.3.6-core.jar to be ~3-5 times slower than the same class in hive-exec-1.2.1.spark2.jar.

      I'm not sure whether more recent Hive versions still suffer from this performance regression.

      Further details are in SPARK-30565.

          People

            Assignee: Unassigned
            Reporter: Peter Toth (petertoth)

            Dates

              Created:
              Updated:
