Spark / SPARK-11087

spark.sql.orc.filterPushdown does not work, No ORC pushdown predicate

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Minor
    • Resolution: Cannot Reproduce
    • Affects Version/s: 1.5.1
    • Fix Version/s: None
    • Component/s: SQL
    • Labels:
      None
    • Environment:

      orc file version 0.12 with HIVE_8732
      hive version 1.2.1.2.3.0.0-2557

      Description

      I have an external Hive table stored as a partitioned ORC file (see the table schema below). I tried to query the table with a where clause:

      hiveContext.setConf("spark.sql.orc.filterPushdown", "true")
      hiveContext.sql("select u, v from 4D where zone = 2 and x = 320 and y = 117")).

      But the log file, with debug logging enabled, shows that no ORC pushdown predicate was generated.

      Unfortunately my table was not sorted when I inserted the data, but I expected the ORC pushdown predicate to be generated anyway, because of the where clause.
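      As a quick driver-side sanity check (a minimal sketch added for illustration, not part of the original report), the filters can also be inspected through the query plan rather than the logs:

      // Sketch: print the logical and physical plans for the same query.
      // Whether pushed filters are visible in the scan node depends on the Spark version.
      hiveContext.setConf("spark.sql.orc.filterPushdown", "true")
      val df = hiveContext.sql("select u, v from 4D where zone = 2 and x = 320 and y = 117")
      df.explain(true)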

      Table schema
      ================================
      hive> describe formatted 4D;
      OK

      # col_name data_type comment

      date int
      hh int
      x int
      y int
      height float
      u float
      v float
      w float
      ph float
      phb float
      t float
      p float
      pb float
      qvapor float
      qgraup float
      qnice float
      qnrain float
      tke_pbl float
      el_pbl float
      qcloud float

      # Partition Information
      # col_name data_type comment

      zone int
      z int
      year int
      month int

      # Detailed Table Information
        Database: default
        Owner: patcharee
        CreateTime: Thu Jul 09 16:46:54 CEST 2015
        LastAccessTime: UNKNOWN
        Protect Mode: None
        Retention: 0
        Location: hdfs://helmhdfs/apps/hive/warehouse/wrf_tables/4D
        Table Type: EXTERNAL_TABLE
        Table Parameters:
        EXTERNAL TRUE
        comment this table is imported from rwf_data//wrf/
        last_modified_by patcharee
        last_modified_time 1439806692
        orc.compress ZLIB
        transient_lastDdlTime 1439806692
      # Storage Information
        SerDe Library: org.apache.hadoop.hive.ql.io.orc.OrcSerde
        InputFormat: org.apache.hadoop.hive.ql.io.orc.OrcInputFormat
        OutputFormat: org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat
        Compressed: No
        Num Buckets: -1
        Bucket Columns: []
        Sort Columns: []
        Storage Desc Params:
        serialization.format 1
        Time taken: 0.388 seconds, Fetched: 58 row(s)

      ================================

      Data was inserted into this table by another Spark job:

      df.write.format("org.apache.spark.sql.hive.orc.DefaultSource").mode(org.apache.spark.sql.SaveMode.Append).partitionBy("zone","z","year","month").saveAsTable("4D")

        Activity

        zzhan Zhan Zhang added a comment -

        No matter whether the table is sorted or not, the predicate pushdown should happen. We need to first add some debug messages on the driver side to make sure it happens.
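        As a hedged aside (not part of the original comment), one way to surface that message on the driver is to raise the log level for the ORC reader classes before running the query:

        // Sketch, assuming log4j 1.x on the classpath (as shipped with Spark 1.5):
        import org.apache.log4j.{Level, Logger}
        Logger.getLogger("org.apache.hadoop.hive.ql.io.orc").setLevel(Level.DEBUG)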

        zzhan Zhan Zhang added a comment -

        I will take a look at this one.

        zzhan Zhan Zhang added a comment -

        patcharee I tried a simple case with partitioning and predicate pushdown, and didn't hit the problem. The predicate is pushed down correctly. I will try to use your same table to see whether it works.

        case class Contact(name: String, phone: String)
        case class Person(name: String, age: Int, contacts: Seq[Contact])
        val records = (1 to 100).map { i =>
          Person(s"name_$i", i, (0 to 1).map { m => Contact(s"contact_$m", s"phone_$m") })
        }
        sqlContext.setConf("spark.sql.orc.filterPushdown", "true")
        sc.parallelize(records).toDF().write.format("orc").partitionBy("age").save("peoplePartitioned")
        val peoplePartitioned = sqlContext.read.format("orc").load("peoplePartitioned")
        peoplePartitioned.registerTempTable("peoplePartitioned")
        sqlContext.sql("SELECT * FROM peoplePartitioned WHERE age = 20 and name = 'name_20'").count
        sqlContext.sql("SELECT * FROM peoplePartitioned WHERE name = 'name_20' and age = 20").count

        scala>

        2015-10-15 10:40:45 OrcInputFormat [INFO] ORC pushdown predicate: leaf-0 = (LESS_THAN age 15)
        expr = leaf-0

        2015-10-15 10:48:20 OrcInputFormat [INFO] ORC pushdown predicate: leaf-0 = (EQUALS name name_20)
        expr = leaf-0

        sqlContext.sql("SELECT name FROM people WHERE age == 15 and age < 16").count()

        2015-10-15 10:58:35 OrcInputFormat [INFO] ORC pushdown predicate: leaf-0 = (EQUALS age 15)
        leaf-1 = (LESS_THAN age 16)

        sqlContext.sql("SELECT name FROM people WHERE age < 15").count()

        zzhan Zhan Zhang added a comment - - edited

        patcharee I tried to duplicate your table as closely as possible, but still didn't hit the problem. Note that the query has to include some valid record in the partition. Otherwise, the partition pruning will trim all predicates before hitting the ORC scan. Please refer to the details below, and the short illustration after the log output.

        case class record(date: Int, hh: Int, x: Int, y: Int, height: Float, u: Float, w: Float, ph: Float, phb: Float, t: Float, p: Float, pb: Float, tke_pbl: Float, el_pbl: Float, qcloud: Float, zone: Int, z: Int, year: Int, month: Int)

        val records = (1 to 100).map { i =>
          record(i.toInt, i.toInt, i.toInt, i.toInt, i.toFloat, i.toFloat, i.toFloat, i.toFloat, i.toFloat, i.toFloat, i.toFloat, i.toFloat, i.toFloat, i.toFloat, i.toFloat, i.toInt, i.toInt, i.toInt, i.toInt)
        }

        sc.parallelize(records).toDF().write.format("org.apache.spark.sql.hive.orc.DefaultSource").mode(org.apache.spark.sql.SaveMode.Append).partitionBy("zone","z","year","month").saveAsTable("5D")
        sc.parallelize(records).toDF().write.format("org.apache.spark.sql.hive.orc.DefaultSource").partitionBy("zone","z","year","month").save("4D")
        val test = sqlContext.read.format("orc").load("4D")
        test.registerTempTable("4D")
        sqlContext.setConf("spark.sql.orc.filterPushdown", "true")
        sqlContext.sql("select date, month, year, hh, u*0.9122461, u*-0.40964267, z from 4D where x = 320 and y = 117 and zone == 2 and year=2 and z >= 2 and z <= 8").show

        2015-10-15 13:37:45 OrcInputFormat [INFO] ORC pushdown predicate: leaf-0 = (EQUALS x 320)
        leaf-1 = (EQUALS y 117)
        expr = (and leaf-0 leaf-1)
        sqlContext.sql("select date, month, year, hh, u*0.9122461, u*-0.40964267, z from 5D where x = 321 and y = 118 and zone == 2 and year=2 and z >= 2 and z <= 8").show
        2015-10-15 13:40:06 OrcInputFormat [INFO] ORC pushdown predicate: leaf-0 = (EQUALS x 321)
        leaf-1 = (EQUALS y 118)
        expr = (and leaf-0 leaf-1)
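        To make the pruning caveat above concrete (a hypothetical query, not from the original comment): a predicate on a partition value that matches no partition prunes every input file, so the ORC reader never runs and no pushdown predicate is logged.

        // zone takes the values 1..100 in the sample data, so zone = 999 matches no
        // partition; partition pruning then removes all input before the ORC scan.
        sqlContext.sql("select date from 4D where zone = 999 and x = 320").show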

        patcharee patcharee added a comment - - edited

        Zhan Zhang

        Below is my test. Please check. I also tried changing "hive.exec.orc.split.strategy", but none of the settings produced the "OrcInputFormat [INFO] ORC pushdown predicate" line that appears in your result.

        case class Contact(name: String, phone: String)
        case class Person(name: String, age: Int, contacts: Seq[Contact])
        val records = (1 to 100).map { i =>
          Person(s"name_$i", i, (0 to 1).map { m => Contact(s"contact_$m", s"phone_$m") })
        }
        sqlContext.setConf("spark.sql.orc.filterPushdown", "true")
        sc.parallelize(records).toDF().write.format("orc").partitionBy("age").save("peoplePartitioned")
        val peoplePartitioned = sqlContext.read.format("orc").load("peoplePartitioned")
        peoplePartitioned.registerTempTable("peoplePartitioned")

        scala> sqlContext.setConf("hive.exec.orc.split.strategy", "ETL")
        15/10/16 09:10:49 DEBUG VariableSubstitution: Substitution is on: ETL

        scala> sqlContext.sql("SELECT * FROM peoplePartitioned WHERE age = 20 and name = 'name_20'").count
        15/10/16 09:10:52 DEBUG VariableSubstitution: Substitution is on: SELECT * FROM peoplePartitioned WHERE age = 20 and name = 'name_20'
        15/10/16 09:10:53 INFO PerfLogger: <PERFLOG method=OrcGetSplits from=org.apache.hadoop.hive.ql.io.orc.ReaderImpl>
        15/10/16 09:10:53 DEBUG OrcInputFormat: Number of buckets specified by conf file is 0
        15/10/16 09:10:53 DEBUG AcidUtils: in directory hdfs://helmhdfs/user/patcharee/peoplePartitioned/age=20/part-r-00014-fb3d0874-db8b-40e7-9a4f-0e071c46f509.orc base = null deltas = 0
        15/10/16 09:10:53 DEBUG OrcInputFormat: BISplitStrategy strategy for hdfs://helmhdfs/user/patcharee/peoplePartitioned/age=20/part-r-00014-fb3d0874-db8b-40e7-9a4f-0e071c46f509.orc
        15/10/16 09:10:53 INFO OrcInputFormat: FooterCacheHitRatio: 0/0
        15/10/16 09:10:53 DEBUG OrcInputFormat: hdfs://helmhdfs/user/patcharee/peoplePartitioned/age=20/part-r-00014-fb3d0874-db8b-40e7-9a4f-0e071c46f509.orc:0+551 projected_columns_uncompressed_size: -1
        15/10/16 09:10:53 INFO PerfLogger: </PERFLOG method=OrcGetSplits start=1444979453032 end=1444979453038 duration=6 from=org.apache.hadoop.hive.ql.io.orc.ReaderImpl>
        res5: Long = 1

        scala> sqlContext.setConf("hive.exec.orc.split.strategy", "BI")
        15/10/16 09:11:13 DEBUG VariableSubstitution: Substitution is on: BI

        scala> sqlContext.sql("SELECT * FROM peoplePartitioned WHERE age = 20 and name = 'name_20'").count
        15/10/16 09:11:19 DEBUG VariableSubstitution: Substitution is on: SELECT * FROM peoplePartitioned WHERE age = 20 and name = 'name_20'
        15/10/16 09:11:19 INFO PerfLogger: <PERFLOG method=OrcGetSplits from=org.apache.hadoop.hive.ql.io.orc.ReaderImpl>
        15/10/16 09:11:19 DEBUG OrcInputFormat: Number of buckets specified by conf file is 0
        15/10/16 09:11:19 DEBUG AcidUtils: in directory hdfs://helmhdfs/user/patcharee/peoplePartitioned/age=20/part-r-00014-fb3d0874-db8b-40e7-9a4f-0e071c46f509.orc base = null deltas = 0
        15/10/16 09:11:19 DEBUG OrcInputFormat: BISplitStrategy strategy for hdfs://helmhdfs/user/patcharee/peoplePartitioned/age=20/part-r-00014-fb3d0874-db8b-40e7-9a4f-0e071c46f509.orc
        15/10/16 09:11:19 INFO OrcInputFormat: FooterCacheHitRatio: 0/0
        15/10/16 09:11:19 DEBUG OrcInputFormat: hdfs://helmhdfs/user/patcharee/peoplePartitioned/age=20/part-r-00014-fb3d0874-db8b-40e7-9a4f-0e071c46f509.orc:0+551 projected_columns_uncompressed_size: -1
        15/10/16 09:11:19 INFO PerfLogger: </PERFLOG method=OrcGetSplits start=1444979479831 end=1444979479846 duration=15 from=org.apache.hadoop.hive.ql.io.orc.ReaderImpl>
        res7: Long = 1

        scala> sqlContext.setConf("hive.exec.orc.split.strategy", "HYBRID")
        15/10/16 09:11:27 DEBUG VariableSubstitution: Substitution is on: HYBRID

        scala> sqlContext.sql("SELECT * FROM peoplePartitioned WHERE age = 20 and name = 'name_20'").count
        15/10/16 09:11:29 DEBUG VariableSubstitution: Substitution is on: SELECT * FROM peoplePartitioned WHERE age = 20 and name = 'name_20'
        15/10/16 09:11:29 INFO PerfLogger: <PERFLOG method=OrcGetSplits from=org.apache.hadoop.hive.ql.io.orc.ReaderImpl>
        15/10/16 09:11:29 DEBUG OrcInputFormat: Number of buckets specified by conf file is 0
        15/10/16 09:11:29 DEBUG AcidUtils: in directory hdfs://helmhdfs/user/patcharee/peoplePartitioned/age=20/part-r-00014-fb3d0874-db8b-40e7-9a4f-0e071c46f509.orc base = null deltas = 0
        15/10/16 09:11:29 DEBUG OrcInputFormat: BISplitStrategy strategy for hdfs://helmhdfs/user/patcharee/peoplePartitioned/age=20/part-r-00014-fb3d0874-db8b-40e7-9a4f-0e071c46f509.orc
        15/10/16 09:11:29 INFO OrcInputFormat: FooterCacheHitRatio: 0/0
        15/10/16 09:11:29 DEBUG OrcInputFormat: hdfs://helmhdfs/user/patcharee/peoplePartitioned/age=20/part-r-00014-fb3d0874-db8b-40e7-9a4f-0e071c46f509.orc:0+551 projected_columns_uncompressed_size: -1
        15/10/16 09:11:29 INFO PerfLogger: </PERFLOG method=OrcGetSplits start=1444979489785 end=1444979489789 duration=4 from=org.apache.hadoop.hive.ql.io.orc.ReaderImpl>
        res9: Long = 1

        zzhan Zhan Zhang added a comment -

        patcharee I tried again, following the steps you provided, and could see the "OrcInputFormat [INFO] ORC pushdown predicate: leaf-0 = (EQUALS name name_20)" message. I am using the master branch.

        patcharee patcharee added a comment -

        Hi Zhan Zhang

        Which versions of Hive and of the ORC file format are you using? Can I get your Hive configuration file?

        zzhan Zhan Zhang added a comment -

        patcharee I use the embedded Hive metastore without any configuration. If you run it on a real cluster, you have to collect the logs from the executors, because I think the log is printed inside the executors.
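        A hedged pointer (not from the original comment): on a real cluster the executor JVMs need their own log4j configuration to emit those DEBUG lines; one common way is to pass it through the executor Java options (the properties file name here is hypothetical, and the file must be shipped to the executors, e.g. via --files).

        // Sketch: spark.executor.extraJavaOptions is a standard Spark setting.
        import org.apache.spark.SparkConf
        val conf = new SparkConf().set("spark.executor.extraJavaOptions",
          "-Dlog4j.configuration=file:log4j-debug.properties")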

        patcharee patcharee added a comment -

        Zhan Zhang I found the predicate generated in the executor log for the case using the DataFrame API (not hiveContext.sql). Sorry for my mistake, and thanks for your help!

        patcharee patcharee added a comment -

        The predicate is indeed generated and can be found in the executor log.

        patcharee patcharee added a comment - - edited

        Hi, I found another scenario where the predicate pushdown does not work. Zhan Zhang Can you please have a look?

        First, create a Hive table:
        hive> create table people(name string, address string, phone string) partitioned by(age int) stored as orc;

        Then use spark-shell in local mode to insert data and then query:
        import org.apache.spark.sql.Row
        import org.apache.spark.{SparkConf, SparkContext}
        import org.apache.spark.sql.types._
        import org.apache.spark.sql.types.{StructType, StructField, StringType, IntegerType, FloatType}

        sqlContext.setConf("hive.exec.dynamic.partition.mode", "nonstrict")
        val records = (1 to 10).map( i => Row(s"name_$i", s"address_$i", s"phone_$i", i ))
        val schemaString = "name address phone age"
        val schema = StructType(schemaString.split(" ").map(fieldName => if (fieldName.equals("age")) StructField(fieldName, IntegerType, true) else StructField(fieldName, StringType, true)))
        val x = sc.parallelize(records)
        val rDF = sqlContext.createDataFrame(x, schema)
        rDF.write.format("org.apache.spark.sql.hive.orc.DefaultSource").mode(org.apache.spark.sql.SaveMode.Append).partitionBy("age").saveAsTable("people")
        sqlContext.setConf("spark.sql.orc.filterPushdown", "true")
        val people = sqlContext.read.format("orc").load("/user/hive/warehouse/people")
        people.registerTempTable("people")
        sqlContext.sql("SELECT * FROM people WHERE age = 3 and name = 'name_3'").count

        Below is part of the log output from the last command:
        15/11/06 15:40:36 INFO HadoopRDD: Input split: hdfs://localhost:9000/user/hive/warehouse/people/age=3/part-00000:0+453
        15/11/06 15:40:36 DEBUG OrcInputFormat: No ORC pushdown predicate
        15/11/06 15:40:36 INFO OrcRawRecordMerger: min key = null, max key = null
        15/11/06 15:40:36 INFO ReaderImpl: Reading ORC rows from hdfs://localhost:9000/user/hive/warehouse/people/age=3/part-00000 with {include: [true, true, false, false], offset: 0, length: 9223372036854775807}
        15/11/06 15:40:36 INFO RecordReaderFactory: Schema is not specified on read. Using file schema.
        15/11/06 15:40:36 DEBUG RecordReaderImpl: chunks = [range start: 111 end: 126]
        15/11/06 15:40:36 DEBUG RecordReaderImpl: merge = [data range [111, 126), size: 15 type: array-backed]
        15/11/06 15:40:36 INFO GeneratePredicate: Code generated in 5.063287 ms

        smilegator Xiao Li added a comment -

        Can you retry it using the latest master/2.0.1 branch? Thanks!

        hyukjin.kwon Hyukjin Kwon added a comment -
        hive> create table people(name string, address string, phone string) partitioned by(age int) stored as orc;
        OK
        Time taken: 4.609 seconds
        
        import org.apache.spark.sql.Row
        import org.apache.spark.{SparkConf, SparkContext}
        import org.apache.spark.sql.types._
        import org.apache.spark.sql.types.{StructType,StructField,StringType,IntegerType,FloatType}
        val sqlContext = spark.sqlContext
        
        sqlContext.setConf("hive.exec.dynamic.partition.mode","nonstrict")
        val records = (1 to 10).map( i => Row(s"name_$i", s"address_$i", s"phone_$i", i ))
        val schemaString = "name address phone age"
        val schema = StructType(schemaString.split(" ").map(fieldName => if (fieldName.equals("age")) StructField(fieldName, IntegerType, true) else StructField(fieldName, StringType, true)))
        val x = sc.parallelize(records)
        val rDF = sqlContext.createDataFrame(x, schema)
        rDF.write.format("org.apache.spark.sql.hive.orc.DefaultSource").mode(org.apache.spark.sql.SaveMode.Append).partitionBy("age").saveAsTable("people")
        sqlContext.setConf("spark.sql.orc.filterPushdown", "true")
        val people = sqlContext.read.format("orc").load("spark-warehouse/people")
        people.registerTempTable("people")
        sqlContext.sql("SELECT * FROM people WHERE age = 3 and name = 'name_3'").explain(true)
        
        
        == Parsed Logical Plan ==
        'Project [*]
        +- 'Filter (('age = 3) && ('name = name_3))
           +- 'UnresolvedRelation `people`
        
        == Analyzed Logical Plan ==
        name: string, address: string, phone: string, age: int
        Project [name#68, address#69, phone#70, age#71]
        +- Filter ((age#71 = 3) && (name#68 = name_3))
           +- SubqueryAlias people
              +- Relation[name#68,address#69,phone#70,age#71] orc
        
        == Optimized Logical Plan ==
        Filter (((isnotnull(age#71) && isnotnull(name#68)) && (age#71 = 3)) && (name#68 = name_3))
        +- Relation[name#68,address#69,phone#70,age#71] orc
        
        == Physical Plan ==
        *Project [name#68, address#69, phone#70, age#71]
        +- *Filter (isnotnull(name#68) && (name#68 = name_3))
           +- *FileScan orc [name#68,address#69,phone#70,age#71] Batched: false, Format: ORC, Location: InMemoryFileIndex[..., PartitionCount: 1, PartitionFilters: [isnotnull(age#71), (age#71 = 3)], PushedFilters: [IsNotNull(name), EqualTo(name,name_3)], ReadSchema: struct<name:string,address:string,phone:string>
        
        

        I see it is pushed down in `PushedFilters`. I am resolving this.
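        (A hedged aside, not part of the original comment: the same check can be scripted rather than read off the explain output; the exact plan-string format varies across Spark versions.)

        // Sketch: search the physical plan text for the pushed-down filter.
        val plan = sqlContext.sql("SELECT * FROM people WHERE age = 3 and name = 'name_3'")
          .queryExecution.executedPlan.toString
        assert(plan.contains("EqualTo(name,name_3)"))  // listed under PushedFilters in the scan node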


          People

          • Assignee: Unassigned
          • Reporter: patcharee
          • Votes: 0
          • Watchers: 7