Details
Type: Improvement
Status: Open
Priority: Major
Resolution: Unresolved
1.7.1
None
None
None
Description
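At the time of writing, Spark cannot properly optimize joins on Kudu tables, because Kudu does not provide the table statistics (such as an estimated size in bytes) that Spark uses to choose a join strategy. Without a size estimate, Spark falls back to its default estimate (`spark.sql.defaultSizeInBytes`, which is effectively unlimited), so it never automatically chooses a broadcast join for a Kudu table, even when that table is small.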
It would be a significant improvement to give Spark enough information to optimize joins between Kudu tables, and between Kudu tables and Parquet-on-HDFS tables.
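To illustrate why missing statistics matter, here is a minimal, simplified model of Spark's size-based broadcast decision. It is not Spark's actual code: the names `Relation` and `choose_join_strategy` are hypothetical, the threshold and fallback size are assumed to mirror the defaults of `spark.sql.autoBroadcastJoinThreshold` (10 MB) and `spark.sql.defaultSizeInBytes` (Long.MaxValue), and real Spark also considers join type, hints, and cost-based statistics.

```python
from dataclasses import dataclass
from typing import Optional

# Assumed values mirroring Spark SQL defaults (illustrative only).
AUTO_BROADCAST_JOIN_THRESHOLD = 10 * 1024 * 1024  # 10 MB
DEFAULT_SIZE_IN_BYTES = 2**63 - 1  # Long.MaxValue, used when a source reports no size


@dataclass
class Relation:
    """Hypothetical stand-in for a table scan as seen by the planner."""
    name: str
    size_in_bytes: Optional[int] = None  # None models a source with no statistics

    def estimated_size(self) -> int:
        # Sources without statistics get the (huge) default estimate.
        if self.size_in_bytes is not None:
            return self.size_in_bytes
        return DEFAULT_SIZE_IN_BYTES


def choose_join_strategy(left: Relation, right: Relation,
                         threshold: int = AUTO_BROADCAST_JOIN_THRESHOLD) -> str:
    """Simplified rule: broadcast the smaller side if its estimate fits under
    the threshold (a negative threshold disables broadcasting); otherwise
    fall back to a sort-merge join."""
    smaller = min(left, right, key=Relation.estimated_size)
    if threshold >= 0 and smaller.estimated_size() <= threshold:
        return f"broadcast hash join (broadcast {smaller.name})"
    return "sort-merge join"


parquet_fact = Relation("parquet_fact", size_in_bytes=50 * 1024**3)  # 50 GiB, known size
kudu_dim = Relation("kudu_dim")  # small table, but no statistics reported
print(choose_join_strategy(parquet_fact, kudu_dim))  # sort-merge join

kudu_dim_with_stats = Relation("kudu_dim", size_in_bytes=2 * 1024**2)  # 2 MiB estimate
print(choose_join_strategy(parquet_fact, kudu_dim_with_stats))  # broadcast hash join (broadcast kudu_dim)
```

In this model the same small Kudu table is only broadcast once it reports a size estimate, which is the kind of information this improvement would need Kudu to supply.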