[HIVE-8699] Enable support for common map join [Spark Branch] - ASF JIRA

Details

Type: Improvement
Status: Open
Priority: Major
Resolution: Unresolved
Affects Version/s: None
Fix Version/s: None
Component/s: Spark
Labels: None

Description

This JIRA tracks issues related to common map-join support in Spark, including logical and physical optimizations. ~~HIVE-8616~~ provided the initial processing, mainly in SparkMapJoinOptimizer. We need to continue that work to make map join work end to end, including the enhancements needed for SparkMapJoinOptimizer and for the subsequent physical optimization, SparkMapJoinResolver.
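For readers unfamiliar with the term, the idea behind a map join can be sketched roughly as follows: the small table is loaded into an in-memory hash table, and the large table is streamed past it, so no shuffle on the join key is needed. This is only a conceptual illustration in plain Python, not Hive's or Spark's implementation; the function name `map_join` and the sample rows are hypothetical.

```python
from collections import defaultdict


def map_join(small_rows, big_rows, small_key, big_key):
    """Inner-join two iterables of dict rows by hashing the small side."""
    # Build phase: load the small table into an in-memory hash table.
    hash_table = defaultdict(list)
    for row in small_rows:
        hash_table[row[small_key]].append(row)

    # Probe phase: stream the big table and look up each join key.
    # Rows without a match are dropped (inner-join semantics).
    for row in big_rows:
        for match in hash_table.get(row[big_key], ()):
            yield {**match, **row}


# Hypothetical sample data: a small dimension table and a larger fact table.
dims = [{"dept_id": 1, "dept": "eng"}, {"dept_id": 2, "dept": "ops"}]
facts = [
    {"emp": "a", "dept_id": 1},
    {"emp": "b", "dept_id": 3},
    {"emp": "c", "dept_id": 1},
]
joined = list(map_join(dims, facts, "dept_id", "dept_id"))
```

The sketch only works when the small side fits in memory, which is why several of the sub-tasks below deal with memory limits and with shipping the small-table data to the tasks that probe it.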

Issue Links

is part of

HIVE-7292 Hive on Spark

Resolved

is related to

HIVE-8616 convert joinOp to MapJoinOp and generate MapWorks only [Spark Branch]

Resolved

HIVE-7613 Research optimization of auto convert join to map join [Spark branch]

Resolved

Sub-Tasks

1.	Implement HashTableLoader for Spark map-join [Spark Branch]	Resolved	Jimmy Xiang
2.	Dump small table join data for map-join [Spark Branch]	Resolved	Jimmy Xiang
3.	Split map-join plan into 2 SparkTasks in 3 stages [Spark Branch]	Resolved	Chao Sun
4.	Replace ReduceSink to HashTableSink (or equi.) for small tables [Spark Branch]	Resolved	Suhas Satish
5.	Combine nested map joins into the parent map join if possible [Spark Branch]	Resolved	Szehon Ho
6.	Extra MapTask created but not connected [Spark Branch]	Resolved	Szehon Ho
7.	Refactoring: move mapLocalWork field from MapWork to BaseWork	Resolved	Xuefu Zhang
8.	Generate MapredLocalWork in SparkMapJoinResolver [Spark Branch]	Resolved	Chao Sun
9.	Refactor to make splitting SparkWork a physical resolver [Spark Branch]	Resolved	Rui Li
10.	Make HashTableSinkOperator work for Spark Branch [Spark Branch]	Resolved	Jimmy Xiang
11.	Make RDD caching work for multi-insert after HIVE-8793 when map join is involved [Spark Branch]	Resolved	Rui Li
12.	auto_join2.q produces incorrect tree [Spark Branch]	Resolved	Chao Sun
13.	Broadcast files for small tables via SparkContext.addFile() and SparkFiles.get() [Spark Branch]	Open	Jimmy Xiang
14.	ColumnStatsTask fails because of SparkMapJoinResolver [Spark Branch]	Resolved	Chao Sun
15.	Populate ExecMapperContext in SparkReduceRecordHandler [Spark Branch]	Resolved	Chao Sun
16.	Needs to set hashTableMemoryUsage for MapJoinDesc [Spark Branch]	Resolved	Chao Sun
17.	Investigate test failure on mapjoin_filter_on_outerjoin.q [Spark Branch]	Resolved	Chao Sun
18.	Investigate test failures on auto_join30.q [Spark Branch]	Resolved	Chao Sun
19.	Investigate test failure on auto_join22.q [Spark Branch]	Resolved	Unassigned
20.	Investigate test failure on auto_join13.q [Spark Branch]	Resolved	Unassigned
21.	Investigate test failures on auto_join6, auto_join7, auto_join18, auto_join18_multi_distinct [Spark Branch]	Resolved	Chao Sun
22.	Investigate test failure on join34.q [Spark Branch]	Resolved	Chao Sun
23.	Enable mapjoin hints [Spark Branch]	Resolved	Chao Sun
24.	Enable non-staged mapjoin [Spark Branch]	Open	Unassigned
25.	Investigate test failure on auto_join2.q [Spark Branch]	Resolved	Chao Sun
26.	Investigate test failure for join_empty.q [Spark Branch]	Resolved	Szehon Ho
27.	Investigate test failure on bucketmapjoin10.q and bucketmapjoin11.q [Spark Branch]	Resolved	Chao Sun
28.	Add SORT_QUERY_RESULTS for join tests that do not guarantee order	Resolved	Chao Sun
29.	Investigate test failure on skewjoin.q [Spark Branch]	Resolved	Chao Sun
30.	Fix memory limit check for combine nested mapjoins [Spark Branch]	Resolved	Szehon Ho
31.	Enable Map Join [Spark Branch]	Resolved	Chao Sun
32.	Add SORT_QUERY_RESULTS for join tests that do not guarantee order #2	Resolved	Chao Sun
33.	Investigate test failure on bucketmapjoin7.q [Spark Branch]	Resolved	Jimmy Xiang
34.	Enable map join optimization only when hive.auto.convert.join is true [Spark Branch]	Resolved	Chao Sun
35.	Investigate mapjoin_mapjoin.q failure [Spark Branch]	Resolved	Unassigned
36.	IndexOutOfBounds exception in mapjoin [Spark Branch]	Resolved	Chao Sun
37.	Not a directory error in mapjoin_hook.q [Spark Branch]	Resolved	Chao Sun
38.	Fix bucket related test failure: parquet_join.q [Spark Branch]	Resolved	Jimmy Xiang
39.	Hive may generate wrong plan for map join queries due to IdentityProjectRemover [Spark Branch]	Resolved	Szehon Ho
40.	Union input to a join operator poses problem when converting to map join [Spark Branch]	Open	wangwenli

People

Assignee: Unassigned

Reporter: Xuefu Zhang

Votes: 0

Watchers: 2

Dates

Created: 02/Nov/14 14:59

Updated: 04/Nov/14 20:00