Details
- Type: Bug
- Status: Resolved
- Priority: Critical
- Resolution: Incomplete
- Affects Version: 1.6.1
- Fix Version: None
Description
When a Hive SQL query contains non-deterministic expressions, the Spark plan does not push the partition predicate down to the HiveTableScan. For example:
-- consider the following query, which uses a random function to sample rows
SELECT *
FROM table_a
WHERE partition_col = 'some_value'
AND rand() < 0.01;
Because the partition predicate is not pushed down to the HiveTableScan, the scan ends up reading data from all partitions of the table, even though only one partition is needed.
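To make the reported behavior concrete, here is a minimal Python sketch (not Spark's actual Catalyst code) of the conjunct-splitting idea: a filter is split into AND-ed conjuncts, deterministic conjuncts can safely be pushed down to the scan, but pushing stops at the first non-deterministic one, since re-evaluating it against a different set of rows could change results. The bug described above is that even the deterministic `partition_col = 'some_value'` conjunct was held back. The `Predicate` class and `push_down` function are illustrative names, not Spark APIs.

```python
from dataclasses import dataclass

@dataclass
class Predicate:
    sql: str
    deterministic: bool

def push_down(conjuncts):
    """Split an AND-ed filter: push the leading deterministic
    conjuncts down to the table scan, and keep everything from the
    first non-deterministic conjunct onward in the post-scan filter.
    Returns (pushed_to_scan, kept_in_filter)."""
    pushed = []
    for i, p in enumerate(conjuncts):
        if not p.deterministic:
            # rand() must see the already-filtered rows exactly once,
            # so it (and anything after it) cannot be pushed further.
            return pushed, conjuncts[i:]
        pushed.append(p)
    return pushed, []

# The query from the description: a partition predicate plus a
# non-deterministic sampling predicate.
conjuncts = [
    Predicate("partition_col = 'some_value'", deterministic=True),
    Predicate("rand() < 0.01", deterministic=False),
]
pushed, kept = push_down(conjuncts)
```

Under this splitting rule, `pushed` contains only the partition predicate, which is enough for partition pruning; `kept` retains the `rand()` sampling filter above the scan.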
Issue Links
- duplicates: SPARK-21520 "Improvement a special case for non-deterministic projects in optimizer" (Resolved)
- is duplicated by: SPARK-27969 "Non-deterministic expressions in filters or projects can unnecessarily prevent all scan-time column pruning, harming performance" (Resolved)