Hive / HIVE-25244

Hive predicate pushdown with Parquet format produces an empty result set when the partition column is named `date`


    Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 3.1.0, 3.1.1, 3.1.2
    • Fix Version/s: 3.1.0, 3.1.1, 3.1.2, 3.2.0
    • Component/s: Hive, Parquet
    • Labels:
      None

      Description

      With predicate pushdown enabled, a query on a Parquet table that filters on a partition column whose name is the keyword `date` returns an empty result set.

      If any one of the following settings is set to false, the SELECT query returns results:

      hive.optimize.ppd.storage, hive.optimize.ppd, hive.optimize.index.filter
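
      As a minimal sketch of the workaround described above, one of the settings can be disabled for the current session before running the filtered query from the repro steps below. The repro itself only demonstrates hive.optimize.ppd.storage; the two commented-out alternatives are the other settings this report names.

```sql
-- Workaround sketch: disable one setting for the current session only.
SET hive.optimize.ppd.storage=false;

-- Alternatives named in this report (any one is reported to be sufficient):
-- SET hive.optimize.ppd=false;
-- SET hive.optimize.index.filter=false;

-- The filtered query from the repro steps, now expected to return rows.
SELECT * FROM test_table3 WHERE `date`='05172021';
```

      This changes only the session in which it is run. It works around the symptom and does not address the underlying defect.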

      Repro steps:

      --------------


      1) Create an external partitioned table in Hive

      CREATE EXTERNAL TABLE `test_table3`(`id` string) PARTITIONED BY (`date` string) STORED AS parquet;

      2) In spark-shell, create a DataFrame and write the data as a Parquet file

      import java.sql.Timestamp

      import org.apache.spark.sql.Row

      import org.apache.spark.sql.types._

      import spark.implicits._

      val someDF = Seq(("1", "05172021"),("2", "05172021"), ("3", "06182021"), ("4", "07192021")).toDF("id", "date")

      someDF.write.mode("overwrite").parquet("<prefix path>/hive/warehouse/external/test_table3/date=05172021")

      3) Change the HDFS permissions, then add the partition to the table in Hive

      $> hdfs dfs -chmod -R 777 <prefix path>/hive/warehouse/external/test_table3

      Hive Beeline ->

      ALTER TABLE test_table3 ADD PARTITION(`date`='05172021') LOCATION '<prefix path>/hive/warehouse/external/test_table3/date=05172021';
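
      Before running the queries, it may help to confirm that the partition was registered and points at the expected location. This check is not part of the original report; it is a sketch using standard Hive statements.

```sql
-- List the partitions Hive knows about for the table.
SHOW PARTITIONS test_table3;

-- Show the storage details (including the location) of the added partition.
DESCRIBE FORMATTED test_table3 PARTITION (`date`='05172021');
```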

      4) Query the table

      SELECT * FROM test_table3;   -- returns all rows

      SELECT * FROM test_table3 WHERE `date`='05172021';   -- returns no rows

      SET hive.optimize.ppd.storage=false;   -- turn off storage-level predicate pushdown

      SELECT * FROM test_table3 WHERE `date`='05172021';   -- returns rows once the setting above is false
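
      One way to narrow the problem down, offered here as a suggestion and not as part of the original report, is to compare the query plans with storage-level pushdown enabled and disabled. The sketch below only prints the plans; any difference in how the `date` predicate is handled would have to be read from the output.

```sql
-- Plan with storage-level pushdown enabled.
SET hive.optimize.ppd.storage=true;
EXPLAIN SELECT * FROM test_table3 WHERE `date`='05172021';

-- Plan with storage-level pushdown disabled.
SET hive.optimize.ppd.storage=false;
EXPLAIN SELECT * FROM test_table3 WHERE `date`='05172021';
```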

      Attaching Parquet data files for reference:

        Attachments

        1. test_table3_data.tar.gz
          0.7 kB
          Aniket Adnaik

            People

            • Assignee:
              aniadnaik Aniket Adnaik
              Reporter:
              aniadnaik Aniket Adnaik
            • Votes:
              0
              Watchers:
              4

              Dates

              • Created:
                Updated: