Spark / SPARK-11087

spark.sql.orc.filterPushdown does not work, No ORC pushdown predicate

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Minor
    • Resolution: Cannot Reproduce
    • Affects Version/s: 1.5.1
    • Fix Version/s: None
    • Component/s: SQL
    • Labels:
      None
    • Environment:

      ORC file version 0.12 with HIVE_8732
      hive version 1.2.1.2.3.0.0-2557

    Description

      I have an external Hive table stored as partitioned ORC files (see the table schema below). I tried to query the table with a WHERE clause:

      hiveContext.setConf("spark.sql.orc.filterPushdown", "true")
      hiveContext.sql("select u, v from 4D where zone = 2 and x = 320 and y = 117")

      However, with debug-level logging enabled, the log shows that no ORC pushdown predicate was generated.

      My table was not sorted when I inserted the data, but I still expected an ORC pushdown predicate to be generated because of the WHERE clause.
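Sort order affects how much a pushed-down predicate can skip, not whether the predicate is built. ORC keeps min/max statistics per stripe and per row group, and a search argument lets the reader skip a group only when its [min, max] range cannot satisfy the predicate. The sketch below is a simplified pure-Scala model, not ORC's reader: `RowGroup`, `statsFor`, `canSkip`, `groupsToRead` and the group size of 4 are hypothetical illustrations.

```scala
// Simplified model of ORC row-group skipping with min/max statistics.
// RowGroup, statsFor, canSkip, groupsToRead and the group size of 4 are
// hypothetical; real ORC row groups hold 10,000 rows by default.
object RowGroupSkipping {
  // Min/max statistics for one group of an integer column (e.g. x).
  final case class RowGroup(min: Int, max: Int)

  // Split a column's values into fixed-size groups and record min/max for each.
  def statsFor(values: Seq[Int], groupSize: Int): Seq[RowGroup] =
    values.grouped(groupSize).map(g => RowGroup(g.min, g.max)).toList

  // For an equality predicate col = v, a group can be skipped only when
  // v lies outside its [min, max] range.
  def canSkip(g: RowGroup, v: Int): Boolean = v < g.min || v > g.max

  // Number of groups the reader still has to scan for col = v.
  def groupsToRead(groups: Seq[RowGroup], v: Int): Int =
    groups.count(g => !canSkip(g, v))

  def main(args: Array[String]): Unit = {
    val sorted   = (1 to 16).toSeq
    val unsorted = Seq(1, 16, 8, 9, 2, 15, 7, 10, 3, 14, 6, 11, 4, 13, 5, 12)
    val target   = 9
    println(s"sorted:   ${groupsToRead(statsFor(sorted, 4), target)} of 4 groups")
    println(s"unsorted: ${groupsToRead(statsFor(unsorted, 4), target)} of 4 groups")
  }
}
```

Both layouts receive the same predicate; with sorted data only 1 of the 4 groups has to be read, while with the unsorted data every group's range contains 9, so all 4 are read.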

      Table schema
      ================================
      hive> describe formatted 4D;
      OK

      # col_name data_type comment

      date int
      hh int
      x int
      y int
      height float
      u float
      v float
      w float
      ph float
      phb float
      t float
      p float
      pb float
      qvapor float
      qgraup float
      qnice float
      qnrain float
      tke_pbl float
      el_pbl float
      qcloud float

      # Partition Information
      # col_name data_type comment

      zone int
      z int
      year int
      month int

      # Detailed Table Information
        Database: default
        Owner: patcharee
        CreateTime: Thu Jul 09 16:46:54 CEST 2015
        LastAccessTime: UNKNOWN
        Protect Mode: None
        Retention: 0
        Location: hdfs://helmhdfs/apps/hive/warehouse/wrf_tables/4D
        Table Type: EXTERNAL_TABLE
        Table Parameters:
        EXTERNAL TRUE
        comment this table is imported from rwf_data//wrf/
        last_modified_by patcharee
        last_modified_time 1439806692
        orc.compress ZLIB
        transient_lastDdlTime 1439806692
      # Storage Information
        SerDe Library: org.apache.hadoop.hive.ql.io.orc.OrcSerde
        InputFormat: org.apache.hadoop.hive.ql.io.orc.OrcInputFormat
        OutputFormat: org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat
        Compressed: No
        Num Buckets: -1
        Bucket Columns: []
        Sort Columns: []
        Storage Desc Params:
        serialization.format 1
        Time taken: 0.388 seconds, Fetched: 58 row(s)

      ================================
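In the query from the description, `zone` is a partition column while `x` and `y` are data columns stored inside the ORC files, so only the `x = 320` and `y = 117` conjuncts are candidates for an ORC search argument; the `zone = 2` filter is normally handled by partition pruning instead. The sketch below models that split in plain Scala, assuming equality-only conjuncts. `Eq`, `partitionCols` and `splitFilters` are hypothetical names, not Spark's planner code.

```scala
// Hypothetical model of how a conjunctive WHERE clause splits between
// partition pruning and ORC predicate pushdown. Eq, partitionCols and
// splitFilters are illustrative names, not Spark APIs.
object FilterSplit {
  // An equality predicate `column = value`.
  final case class Eq(column: String, value: Int)

  // Partition columns from the "Partition Information" section of the schema.
  val partitionCols: Set[String] = Set("zone", "z", "year", "month")

  // Returns (filters usable for partition pruning, filters left for the ORC reader).
  def splitFilters(conjuncts: Seq[Eq]): (Seq[Eq], Seq[Eq]) =
    conjuncts.partition(p => partitionCols.contains(p.column))

  def main(args: Array[String]): Unit = {
    // where zone = 2 and x = 320 and y = 117
    val where = Seq(Eq("zone", 2), Eq("x", 320), Eq("y", 117))
    val (pruning, pushdown) = splitFilters(where)
    println(s"partition pruning: $pruning")
    println(s"ORC pushdown candidates: $pushdown")
  }
}
```

Under this model, a debug log that shows a predicate only on `x` and `y`, and nothing on `zone`, would be the expected outcome rather than a bug.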

      Data was inserted into this table by another Spark job:

      df.write.format("org.apache.spark.sql.hive.orc.DefaultSource").mode(org.apache.spark.sql.SaveMode.Append).partitionBy("zone","z","year","month").saveAsTable("4D")
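`partitionBy("zone","z","year","month")` writes Hive-style directories, one level per partition column in that order, each named `column=value`. The sketch below models that layout and shows why a `zone = 2` filter can be answered from directory names alone. The partition values are made up, and `partitionPath` and `matchesZone` are hypothetical helpers, not Spark APIs.

```scala
// Hypothetical sketch of the Hive-style directory layout produced by
// partitionBy("zone", "z", "year", "month"). The partition values are made up
// for illustration; partitionPath and matchesZone are not Spark APIs.
object PartitionLayout {
  val partitionCols: Seq[String] = Seq("zone", "z", "year", "month")

  // One directory level per partition column, in partitionBy order: col=value.
  def partitionPath(values: Map[String, Int]): String =
    partitionCols.map(c => s"$c=${values(c)}").mkString("/")

  // Pruning on zone only needs the directory name, not the ORC files inside.
  // Splitting on "/" and matching whole segments keeps zone=12 from matching zone=2.
  def matchesZone(path: String, zone: Int): Boolean =
    path.split("/").contains(s"zone=$zone")

  def main(args: Array[String]): Unit = {
    val paths = Seq(
      partitionPath(Map("zone" -> 1, "z" -> 1, "year" -> 2009, "month" -> 1)),
      partitionPath(Map("zone" -> 2, "z" -> 1, "year" -> 2009, "month" -> 1)),
      partitionPath(Map("zone" -> 2, "z" -> 2, "year" -> 2009, "month" -> 2))
    )
    // Only directories under zone=2 survive pruning.
    paths.filter(p => matchesZone(p, 2)).foreach(println)
  }
}
```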


    People

    • Assignee: Unassigned
    • Reporter: patcharee patcharee
    • Votes: 0
    • Watchers: 7
