- Type: Improvement
- Status: Open
- Priority: Minor
- Resolution: Unresolved
- Affects Version/s: 3.1.0
- Fix Version/s: None
- Component/s: Input/Output, Java API, SQL
- Labels: None
As a DataFrame author, I want to be able to bucketize my output without involving Hive or the Hive Metastore (HMS), so that my Hive-less environment can benefit from this query-optimization technique.
The comment at https://issues.apache.org/jira/browse/SPARK-19256?focusedCommentId=16345397&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-16345397 identifies this as a shortcoming of the umbrella feature provided via SPARK-19256.
In short, relying on Hive to store bucketing metadata prevents environments that don't have or use Hive from making use of bucketization features.
- relates to: SPARK-26160 Make assertNotBucketed call in DataFrameWriter::save optional (Resolved)