[SPARK-26209] Allow for dataframe bucketization without Hive - ASF JIRA

Details

Type: Improvement
Status: Open
Priority: Minor
Resolution: Unresolved
Affects Version/s: 3.1.0
Fix Version/s: None
Component/s: Input/Output, Java API, SQL
Labels: None

Description

As a DataFrame author, I can elect to bucketize my output without involving Hive or the Hive Metastore (HMS), so that my Hive-less environment can benefit from this query-optimization technique.

A comment on SPARK-19256 (https://issues.apache.org/jira/browse/SPARK-19256?focusedCommentId=16345397&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-16345397) identifies this as a shortcoming of the umbrella feature provided via that issue.

In short, relying on Hive to store metadata prevents environments that don't have or use Hive from using bucketization features.
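To make the request concrete, here is a minimal, library-free sketch of bucketed output that carries its own metadata. It is an illustration only, not Spark's implementation or a proposed design: the `_bucket_spec.json` sidecar file, the `bucket-NNNNN.jsonl` layout, and the helper names `bucket_id`, `write_bucketed`, and `lookup` are all hypothetical, and `zlib.crc32` stands in for Spark's own Murmur3-based bucket hashing.

```python
import json
import zlib
from pathlib import Path
from tempfile import TemporaryDirectory


def bucket_id(key, num_buckets):
    """Map a key to a bucket index (crc32 is a stand-in hash, not Spark's)."""
    return zlib.crc32(str(key).encode("utf-8")) % num_buckets


def write_bucketed(rows, key, num_buckets, out_dir):
    """Write one JSON-lines file per bucket, plus a sidecar bucket spec."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)

    # Group rows by the bucket their key hashes to.
    buckets = [[] for _ in range(num_buckets)]
    for row in rows:
        buckets[bucket_id(row[key], num_buckets)].append(row)

    for i, bucket_rows in enumerate(buckets):
        with open(out / f"bucket-{i:05d}.jsonl", "w") as f:
            for row in bucket_rows:
                f.write(json.dumps(row) + "\n")

    # Hypothetical sidecar: the bucket spec lives next to the data,
    # so no external metastore is needed to interpret the layout.
    spec = {"column": key, "numBuckets": num_buckets}
    (out / "_bucket_spec.json").write_text(json.dumps(spec))


def lookup(out_dir, value):
    """Read the sidecar spec, then scan only the bucket that can hold `value`."""
    out = Path(out_dir)
    spec = json.loads((out / "_bucket_spec.json").read_text())
    i = bucket_id(value, spec["numBuckets"])
    with open(out / f"bucket-{i:05d}.jsonl") as f:
        return [r for r in map(json.loads, f) if r[spec["column"]] == value]


if __name__ == "__main__":
    with TemporaryDirectory() as tmp:
        data = [{"id": n, "square": n * n} for n in range(20)]
        write_bucketed(data, key="id", num_buckets=4, out_dir=tmp)
        print(lookup(tmp, 7))
```

Because the reader recovers the bucket column and bucket count from the data directory itself, a point lookup touches one file out of four without consulting any catalog. Whether Spark should persist the spec this way, or by some other mechanism, is left open by the ticket.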

Issue Links

relates to: SPARK-26160 Make assertNotBucketed call in DataFrameWriter::save optional (Resolved)

People

Assignee: Unassigned

Reporter: Walt Elder

Votes: 2

Watchers: 5

Dates

Created: 28/Nov/18 21:34

Updated: 20/Dec/22 15:13