[SPARK-11512] Bucket Join - ASF JIRA

Details

Type: Sub-task
Status: Closed
Priority: Major
Resolution: Duplicate
Affects Version/s: None
Fix Version/s: 2.0.0
Component/s: SQL
Labels: None
Target Version/s: 2.0.0

Description

Support a sort merge join on two datasets on the file system that have already been partitioned the same way, with the same number of partitions, and sorted within each partition. When joining on the sorted/partitioned keys, we should not need to sort the data again.

This functionality exists in

Hive (hive.optimize.bucketmapjoin.sortedmerge)
Pig (USING 'merge')
MapReduce (CompositeInputFormat)
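
To illustrate the idea in the description, here is a minimal pure-Python sketch, not Spark code and not the implementation tracked by this issue. It assumes each dataset is a list of buckets, that bucket `i` on the left holds the same key range as bucket `i` on the right, and that every bucket is already sorted by key. The names `merge_join`, `bucket_join`, and the `(key, payload)` row shape are hypothetical.

```python
from typing import Iterator, List, Tuple

Row = Tuple[int, str]  # (join key, payload); hypothetical row shape


def merge_join(left: List[Row], right: List[Row]) -> Iterator[Tuple[int, str, str]]:
    """Inner-join two lists that are already sorted by key, without re-sorting."""
    i = j = 0
    while i < len(left) and j < len(right):
        lk, rk = left[i][0], right[j][0]
        if lk < rk:
            i += 1
        elif lk > rk:
            j += 1
        else:
            # Find the run of rows on the right that share this key.
            j_end = j
            while j_end < len(right) and right[j_end][0] == lk:
                j_end += 1
            # Pair every left row with this key with every right row in the run.
            while i < len(left) and left[i][0] == lk:
                for r in right[j:j_end]:
                    yield (lk, left[i][1], r[1])
                i += 1
            j = j_end


def bucket_join(left_buckets: List[List[Row]],
                right_buckets: List[List[Row]]) -> List[Tuple[int, str, str]]:
    """Join bucket i with bucket i only; assumes both sides were bucketed identically."""
    if len(left_buckets) != len(right_buckets):
        raise ValueError("both sides must have the same number of buckets")
    out: List[Tuple[int, str, str]] = []
    for lb, rb in zip(left_buckets, right_buckets):
        out.extend(merge_join(lb, rb))
    return out


# Two buckets per side, assigned by key % 2 and sorted within each bucket.
left = [[(2, "a"), (4, "b")], [(1, "c"), (3, "d"), (3, "e")]]
right = [[(4, "x"), (6, "y")], [(3, "z"), (5, "w")]]
result = bucket_join(left, right)
```

Because matching keys can only appear in buckets with the same index, each bucket pair is joined in a single linear pass and no global sort or repartitioning step is needed.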

Issue Links

contains: SPARK-5292 optimize join for table that are already sharded/support for hive bucket (Closed)

duplicates: SPARK-12394 Support writing out pre-hash-partitioned data and exploit that in join optimizations to avoid shuffle (i.e. bucketing in Hive) (Resolved)

People

Assignee: Wenchen Fan

Reporter: Cheng Hao

Votes: 6

Watchers: 16

Dates

Created: 05/Nov/15 00:51

Updated: 16/Jan/16 01:21

Resolved: 17/Dec/15 06:35