[PIG-5024] add a physical operator to broadcast small RDDs - ASF JIRA

Details

Type: Sub-task
Status: Closed
Priority: Major
Resolution: Fixed
Affects Version/s: None
Fix Version/s: spark-branch
Component/s: spark
Labels: None

Description

Currently, when Pig optimizes some kinds of JOIN, the index or sampling files are saved to HDFS. Setting the replication factor of these files to a larger number makes them serve as a distributed cache.

Spark's broadcast mechanism is well suited to this. It seems that we can add a physical operator that broadcasts small RDDs.
This would benefit the optimization of some specialized joins, such as Skewed Join and Replicated Join.
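
To illustrate the idea, here is a minimal sketch of a broadcast-style (map-side) join in plain Java, with no Spark dependency. It is not the operator added by the patch; the class `BroadcastJoinSketch` and its methods are hypothetical names chosen for this example. The small relation is collected once into a read-only lookup table, and each partition of the large relation probes that table locally. In Spark, the table would presumably be shipped to executors with `JavaSparkContext.broadcast(...)` and read with `Broadcast.value()`; here it is simply passed as an argument. The sketch also assumes the join key is unique in the small relation.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only: models a broadcast join without Spark.
class BroadcastJoinSketch {

    // Collect the small relation (rows of [key, value]) into a read-only
    // lookup table. This stands in for the value that would be broadcast.
    // Assumption: keys are unique in the small relation.
    public static Map<String, String> collectSmallSide(List<String[]> smallRelation) {
        Map<String, String> table = new HashMap<>();
        for (String[] row : smallRelation) {
            table.put(row[0], row[1]);
        }
        return Collections.unmodifiableMap(table);
    }

    // Join one partition of the large relation against the lookup table.
    // No shuffle is needed because every partition sees the whole table.
    public static List<String> joinPartition(List<String[]> partition,
                                             Map<String, String> broadcastTable) {
        List<String> joined = new ArrayList<>();
        for (String[] row : partition) {
            String match = broadcastTable.get(row[0]);
            if (match != null) { // inner join: drop rows without a match
                joined.add(row[0] + "," + row[1] + "," + match);
            }
        }
        return joined;
    }

    public static void main(String[] args) {
        List<String[]> small = Arrays.asList(
                new String[] {"1", "US"},
                new String[] {"2", "FR"});
        Map<String, String> table = collectSmallSide(small);

        // Two partitions of the large relation, processed independently.
        List<List<String[]>> partitions = Arrays.asList(
                Arrays.asList(new String[] {"1", "alice"}, new String[] {"3", "bob"}),
                Arrays.asList(new String[] {"2", "carol"}, new String[] {"1", "dave"}));

        for (List<String[]> partition : partitions) {
            for (String line : joinPartition(partition, table)) {
                System.out.println(line);
            }
        }
    }
}
```

The design point is that only the small side is copied to every worker, so the large side never has to be repartitioned by key. This only pays off when the small relation fits comfortably in memory.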

Attachments

- PIG-5024.patch, 07/Sep/16 06:53, 43 kB, Xianda Ke
- PIG-5024_6.patch, 09/Sep/16 04:45, 9 kB, Xianda Ke
- PIG-5024_5.patch, 09/Sep/16 03:30, 9 kB, Xianda Ke
- PIG-5024_4.patch, 09/Sep/16 03:23, 9 kB, Xianda Ke
- PIG-5024_3.patch, 08/Sep/16 13:47, 9 kB, Xianda Ke
- PIG-5024_2.patch, 08/Sep/16 05:56, 9 kB, Xianda Ke

People

Assignee: Xianda Ke
Reporter: Xianda Ke
Votes: 0
Watchers: 3

Dates

Created: 06/Sep/16 12:43
Updated: 21/Jun/17 09:18
Resolved: 09/Sep/16 04:51