Description
With predicate pushdown enabled, a Hive query on a Parquet table returns an empty result set when it filters on a partition column whose name is a keyword (here `date`).
The same SELECT returns rows if any one of the following settings is set to false:
hive.optimize.ppd.storage, hive.optimize.ppd, hive.optimize.index.filter.
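As a session-level workaround sketch, any one of the three settings above can be switched off in Beeline before running the failing query. This assumes the settings are not locked down by the cluster configuration; which of the three has the least impact on other queries is not established in this report.

```sql
-- Session-scoped workaround: per this report, disabling any ONE of these is enough.
SET hive.optimize.ppd.storage=false;
-- SET hive.optimize.ppd=false;
-- SET hive.optimize.index.filter=false;

-- SET with a property name and no value prints the current value,
-- which confirms the change took effect in this session.
SET hive.optimize.ppd.storage;
```

The change applies only to the current session, so other sessions keep predicate pushdown enabled.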
Repro steps:
--------------
1) Create an external partitioned table in Hive
CREATE EXTERNAL TABLE `test_table3`(`id` string) PARTITIONED BY (`date` string) STORED AS parquet;
2) In spark-shell, create a DataFrame and write it as Parquet files into the partition directory
import java.sql.Timestamp
import org.apache.spark.sql.Row
import org.apache.spark.sql.types._
import spark.implicits._
val someDF = Seq(("1", "05172021"),("2", "05172021"), ("3", "06182021"), ("4", "07192021")).toDF("id", "date")
someDF.write.mode("overwrite").parquet("<prefix path>/hive/warehouse/external/test_table3/date=05172021")
3) Open up the permissions on the table directory in HDFS, then add the partition to the table in Hive
$> hdfs dfs -chmod -R 777 <prefix path>/hive/warehouse/external/test_table3
Hive Beeline ->
ALTER TABLE test_table3 ADD PARTITION(`date`='05172021') LOCATION '<prefix path>/hive/warehouse/external/test_table3/date=05172021';
4) SELECT * FROM test_table3; <----- produces all rows
SELECT * FROM test_table3 WHERE `date`='05172021'; <--- produces no rows
SET hive.optimize.ppd.storage=false; <--- turn off predicate pushdown to storage
SELECT * FROM test_table3 WHERE `date`='05172021'; <--- produces rows after setting above config to false
Attaching parquet data files for reference: