Hive / HIVE-23014

ORC reading performance

Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 2.3.6
    • Fix Version/s: None
    • Component/s: ORC
    • Labels: None

    Description

      Spark 3 adds support for Hive 2.3.6 in addition to the older Hive 1.2.1. Some ORC reading benchmarks show a large performance difference between the two versions. I measured that org.apache.hadoop.hive.ql.io.orc.ReaderImpl in hive-exec-2.3.6-core.jar is roughly 3-5 times slower than the one in hive-exec-1.2.1.spark2.jar.

      I'm not sure if more recent Hive versions still suffer from this performance regression.

      For details, see SPARK-30565.
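
      A minimal sketch of how the two readers could be timed side by side. This is not the benchmark from SPARK-30565. The class ReadTimer and its methods are hypothetical helpers, and the workload in main is a placeholder that would need to be replaced with code reading the same ORC file through each hive-exec jar.

```java
import java.util.Arrays;

// Hypothetical helper for illustration; not part of Hive or Spark.
final class ReadTimer {

    // Runs the task `warmup` times untimed, then returns the median
    // wall-clock time in nanoseconds over `runs` timed executions
    // (the upper middle sample when `runs` is even).
    static long medianNanos(Runnable task, int warmup, int runs) {
        if (runs <= 0) {
            throw new IllegalArgumentException("runs must be positive");
        }
        for (int i = 0; i < warmup; i++) {
            task.run();
        }
        long[] samples = new long[runs];
        for (int i = 0; i < runs; i++) {
            long start = System.nanoTime();
            task.run();
            samples[i] = System.nanoTime() - start;
        }
        Arrays.sort(samples);
        return samples[runs / 2];
    }

    // Ratio of candidate time to baseline time; 4.0 means "4 times slower".
    static double slowdown(long baselineNanos, long candidateNanos) {
        return (double) candidateNanos / baselineNanos;
    }

    public static void main(String[] args) {
        // Placeholder workload (assumption): swap in a full read of one
        // ORC file, run once per Hive version on the classpath.
        Runnable placeholderRead = () -> {
            long sum = 0;
            for (int i = 0; i < 1_000_000; i++) {
                sum += i;
            }
            if (sum < 0) {
                throw new IllegalStateException("unexpected overflow");
            }
        };
        long median = medianNanos(placeholderRead, 5, 11);
        System.out.println("median ns: " + median);
    }
}
```

      Running the same harness once with each jar and passing the two medians to slowdown would give a ratio comparable to the 3-5x figure above. A dedicated harness such as JMH would likely give more reliable numbers than this hand-rolled loop.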

          People

            Assignee: Unassigned
            Reporter: Peter Toth (petertoth)
            Votes: 0
            Watchers: 6
