[SPARK-15867] Use bucket files for TABLESAMPLE BUCKET - ASF JIRA

Details

Type: Sub-task
Status: Resolved
Priority: Major
Resolution: Incomplete
Affects Version/s: 1.6.0, 2.0.0
Fix Version/s: None
Component/s: SQL
Labels:
- bulk-closed

Description

SELECT * FROM boxes TABLESAMPLE (BUCKET 3 OUT OF 16)

In Hive, this selects the 3rd bucket out of every 16 buckets in the table. For example, if the table is clustered into 32 buckets, the query samples the 3rd and the 19th buckets. (See https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Sampling)
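
The Hive bucket selection described above can be sketched as follows. This is an illustrative Python model, not Hive or Spark code: the function name `hive_sampled_buckets` is hypothetical, and the sketch assumes the table's bucket count is a multiple of the denominator.

```python
def hive_sampled_buckets(x: int, y: int, num_buckets: int) -> list:
    """Return the 1-based bucket numbers picked by BUCKET x OUT OF y.

    Hypothetical helper; assumes num_buckets is a multiple of y.
    """
    if not 1 <= x <= y:
        raise ValueError("x must be between 1 and y")
    if num_buckets % y != 0:
        raise ValueError("this sketch only handles num_buckets divisible by y")
    # Take bucket x from each consecutive group of y buckets.
    return list(range(x, num_buckets + 1, y))


print(hive_sampled_buckets(3, 16, 32))  # [3, 19]
```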

In Spark, however, the same query simply samples 3/16 of the input rows, regardless of how the table is bucketed.
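
To make the difference concrete, here is a back-of-the-envelope sketch comparing the expected share of rows returned under each interpretation. It assumes rows are spread evenly across buckets, and both function names are hypothetical.

```python
from fractions import Fraction


def spark_row_fraction(x: int, y: int) -> Fraction:
    # Row-level sampling as described in this issue: keep about x/y of the rows.
    return Fraction(x, y)


def hive_row_fraction(y: int) -> Fraction:
    # Bucket-level sampling: one bucket out of every y,
    # assuming every bucket holds the same number of rows.
    return Fraction(1, y)


print(spark_row_fraction(3, 16))  # 3/16
print(hive_row_fraction(16))      # 1/16
```

Under these assumptions, the row-level behavior returns roughly three times as many rows as the Hive semantics would for this query, and the rows it returns are not tied to any particular bucket.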

We should either not support this syntax in Spark or implement it in a way that is consistent with Hive.

People

Assignee: Unassigned

Reporter: Andrew Or

Votes: 0

Watchers: 8

Dates

Created: 10/Jun/16 07:43

Updated: 08/Oct/19 05:41

Resolved: 08/Oct/19 05:41