Details
- Type: Bug
- Status: Resolved
- Priority: Critical
- Resolution: Incomplete
- Affects Version: 1.6.1
- Fix Version: None
Description
When a Hive SQL query contains non-deterministic expressions, the Spark plan does not push the partition predicate down to the HiveTableScan. For example:
-- consider the following query, which uses a random function to sample rows
SELECT *
FROM table_a
WHERE partition_col = 'some_value'
AND rand() < 0.01;
Because the partition predicate is not pushed down to the HiveTableScan, the scan ends up reading data from all partitions of the table, even though only one partition is needed.
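To make the reported behavior concrete, here is a minimal Python sketch (not Spark's actual Catalyst code) of the conjunct-splitting idea: a filter is split into AND-ed conjuncts, deterministic conjuncts can safely be pushed down to the scan, but pushing stops at the first non-deterministic one, since re-evaluating it against a different set of rows could change results. The bug described above is that even the deterministic `partition_col = 'some_value'` conjunct was held back. The `Predicate` class and `push_down` function are illustrative names, not Spark APIs.

```python
from dataclasses import dataclass

@dataclass
class Predicate:
    sql: str
    deterministic: bool

def push_down(conjuncts):
    """Split an AND-ed filter: push the leading deterministic
    conjuncts down to the table scan, and keep everything from the
    first non-deterministic conjunct onward in the post-scan filter.
    Returns (pushed_to_scan, kept_in_filter)."""
    pushed = []
    for i, p in enumerate(conjuncts):
        if not p.deterministic:
            # rand() must see the already-filtered rows exactly once,
            # so it (and anything after it) cannot be pushed further.
            return pushed, conjuncts[i:]
        pushed.append(p)
    return pushed, []

# The query from the description: a partition predicate plus a
# non-deterministic sampling predicate.
conjuncts = [
    Predicate("partition_col = 'some_value'", deterministic=True),
    Predicate("rand() < 0.01", deterministic=False),
]
pushed, kept = push_down(conjuncts)
```

Under this splitting rule, `pushed` contains only the partition predicate, which is enough for partition pruning; `kept` retains the `rand()` sampling filter above the scan.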
Issue Links
- duplicates: SPARK-21520 "Improvement a special case for non-deterministic projects in optimizer" (Resolved)
- is duplicated by: SPARK-27969 "Non-deterministic expressions in filters or projects can unnecessarily prevent all scan-time column pruning, harming performance" (Resolved)