Details
Type: Improvement
Status: In Progress
Priority: Major
Resolution: Unresolved
Affects Version/s: 3.1.0
Fix Version/s: None
Labels: None
Description
The following query fails when spark.sql.optimizer.dynamicPartitionPruning.reuseBroadcastOnly is disabled:
select catalog_sales.* from catalog_sales join catalog_returns where cr_order_number = cs_sold_date_sk and cr_returned_time_sk < 40000;
20/08/16 06:44:42 ERROR TaskSetManager: Total size of serialized results of 494 tasks (1225.3 MiB) is bigger than spark.driver.maxResultSize (1024.0 MiB)
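With reuseBroadcastOnly disabled, dynamic partition pruning collects the filtering side's join key values to the driver, and in this query they exceed spark.driver.maxResultSize. We can improve this by sending only the minimum value, the maximum value and a Bloom filter of those keys, which reduces the size of the serialized results.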
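A minimal sketch of the proposed idea, written in plain Python rather than Spark: the filtering side's join keys are summarized as a (min, max, Bloom filter) triple, and the pruning side tests each partition key against that summary instead of against the full key list. BloomFilter, build_key_summary and may_match are hypothetical names, and the sizing formula, SHA-256 double hashing and 3% false-positive rate are assumptions chosen for illustration. This is not Spark's implementation (Spark ships its own sketch as org.apache.spark.util.sketch.BloomFilter).

```python
import hashlib
import math
from typing import Iterable, Iterator, Optional, Tuple

# Hypothetical illustration of a min/max + Bloom filter key summary; not Spark code.


class BloomFilter:
    """Minimal Bloom filter over 64-bit integer keys (illustrative only)."""

    def __init__(self, expected_items: int, fpp: float) -> None:
        n = max(1, expected_items)
        # Standard sizing: m = -n * ln(p) / (ln 2)^2 bits and k = (m / n) * ln 2 hash functions.
        self.num_bits = max(8, math.ceil(-n * math.log(fpp) / math.log(2) ** 2))
        self.num_hashes = max(1, round(self.num_bits / n * math.log(2)))
        self.bits = bytearray((self.num_bits + 7) // 8)

    def _positions(self, key: int) -> Iterator[int]:
        # Double hashing (assumed scheme): derive k bit positions from two
        # 64-bit halves of a single SHA-256 digest of the key.
        digest = hashlib.sha256(key.to_bytes(8, "little", signed=True)).digest()
        h1 = int.from_bytes(digest[:8], "little")
        h2 = int.from_bytes(digest[8:16], "little") | 1  # force an odd step
        for i in range(self.num_hashes):
            yield (h1 + i * h2) % self.num_bits

    def put(self, key: int) -> None:
        for pos in self._positions(key):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, key: int) -> bool:
        return all(self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(key))


Summary = Tuple[int, int, BloomFilter]


def build_key_summary(join_keys: Iterable[int], fpp: float = 0.03) -> Optional[Summary]:
    """Summarize the filtering side's join keys as (min, max, Bloom filter)."""
    keys = list(join_keys)
    if not keys:
        return None  # no matching rows on the filtering side of an inner join
    bloom = BloomFilter(len(keys), fpp)
    for key in keys:
        bloom.put(key)
    return min(keys), max(keys), bloom


def may_match(key: int, summary: Optional[Summary]) -> bool:
    """Range check first, then the Bloom filter; false positives are possible, false negatives are not."""
    if summary is None:
        return False
    lo, hi, bloom = summary
    return lo <= key <= hi and bloom.might_contain(key)


# Usage with hypothetical filtering-side join keys:
keys = range(1000, 100000, 7)
summary = build_key_summary(keys)
print(8 * len(keys), "bytes as raw 64-bit keys vs", len(summary[2].bits), "bytes of Bloom filter")
```

The min/max check is cheap and rejects keys outside the observed range outright; the Bloom filter then handles keys inside the range. Because a Bloom filter can return false positives, a sketch like this only prunes conservatively: a partition it keeps may still contain no matching rows, but it never drops a partition that does.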
Issue Links
- is related to: SPARK-34562 Leverage parquet bloom filters (Resolved)
- links to