Details
- Improvement
- Status: Open
- Minor
- Resolution: Unresolved
- 3.1.0
- None
- None
Description
As a DataFrame author, I can elect to bucketize my output without involving Hive or the Hive Metastore (HMS), so that my Hive-less environment can benefit from this query-optimization technique.
A comment on SPARK-19256 (https://issues.apache.org/jira/browse/SPARK-19256?focusedCommentId=16345397&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-16345397) identifies this as a shortcoming of the umbrella feature provided via SPARK-19256.
In short, relying on Hive to store metadata prevents environments that do not have or use Hive from making use of the bucketization features.
Issue Links
- relates to: SPARK-26160 Make assertNotBucketed call in DataFrameWriter::save optional (Resolved)