[HIVE-10302] Load small tables (for map join) in executor memory only once [Spark Branch] - ASF JIRA

Details

Type: Improvement
Status: Closed
Priority: Major
Resolution: Fixed
Affects Version/s: None
Fix Version/s: 1.3.0, 2.0.0
Component/s: None
Labels: None
Hadoop Flags: Reviewed

Description

A Spark executor usually has multiple cores, so several map-join tasks can run in the same executor, either concurrently or sequentially. Currently, each task loads its own copy of the map-join small tables into memory, which duplicates both the loading work and the memory used. Ideally, the small tables are loaded only once per executor and shared among all tasks running in that executor.
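
The load-once-and-share idea can be illustrated with a small per-JVM cache. This is only a minimal sketch under stated assumptions, not the code from the attached patches: the class name `SmallTableCache`, the method `getOrLoad`, the `String` path key, and the `Map<String, String>` stand-in for a loaded small table are all hypothetical. It relies on the fact that all tasks in one executor run in the same JVM, so a static map is visible to every task.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

/**
 * Hypothetical per-executor (per-JVM) cache of loaded small tables.
 * All names here are illustrative and are not taken from the Hive code base.
 */
final class SmallTableCache {
    // Static, so it is shared by every task running in the same executor JVM.
    private static final ConcurrentHashMap<String, Map<String, String>> CACHE =
            new ConcurrentHashMap<>();

    // Counts how many times a loader actually ran (for illustration only).
    private static final AtomicInteger LOADS = new AtomicInteger();

    private SmallTableCache() {
    }

    /**
     * Returns the table cached under the given path, loading it on first use.
     * ConcurrentHashMap.computeIfAbsent applies the mapping function at most
     * once per key, so concurrent tasks asking for the same path share one copy.
     */
    static Map<String, String> getOrLoad(String path,
                                         Function<String, Map<String, String>> loader) {
        return CACHE.computeIfAbsent(path, p -> {
            LOADS.incrementAndGet();
            return loader.apply(p);
        });
    }

    /** Number of loads performed so far. */
    static int loadCount() {
        return LOADS.get();
    }

    /** Drops all cached tables, e.g. when they are no longer needed. */
    static void clear() {
        CACHE.clear();
    }
}
```

With such a cache, each task would call `SmallTableCache.getOrLoad(path, loader)` instead of loading the table directly; only the first caller for a given path pays the load cost. A real implementation would also need to decide when cached tables are released, which this sketch leaves to an explicit `clear()`.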

Attachments

- 10302.patch (01/Jun/15 21:15, 9 kB, Xuefu Zhang)
- HIVE-10302.2-spark.patch (23/Apr/15 17:47, 8 kB, Jimmy Xiang)
- HIVE-10302.3-spark.patch (23/Apr/15 23:22, 7 kB, Jimmy Xiang)
- HIVE-10302.4-spark.patch (24/Apr/15 18:36, 9 kB, Jimmy Xiang)
- HIVE-10302.spark-1.patch (16/Apr/15 00:41, 8 kB, Jimmy Xiang)

Issue Links

relates to: HIVE-8851 Broadcast files for small tables via SparkContext.addFile() and SparkFiles.get() [Spark Branch] (Open)

People

Assignee: Jimmy Xiang

Reporter: Jimmy Xiang

Votes: 0

Watchers: 4

Dates

Created: 10/Apr/15 17:13

Updated: 16/Feb/16 23:52

Resolved: 24/Apr/15 22:07