[HIVE-8810] Make HashTableSinkOperator works for Spark Branch [Spark Branch] - ASF JIRA

Details

Type: Sub-task
Status: Resolved
Priority: Major
Resolution: Fixed
Affects Version/s: spark-branch
Fix Version/s: 1.1.0
Component/s: Spark
Labels: None

Description

In MapReduce (MR), all small tables for a particular map-join (MJ) operator share the same instance of HashTableSinkOperator, whereas in the Spark branch each small table corresponds to a different HashTableSinkOperator instance. This difference causes some issues.

For instance, HashTableSinkOperator#processOp uses a tag to look up information in various data structures, such as joinKeys, filterMaps, and joinValues. Those data structures store the information as it was BEFORE the MJ operator is split from its parents. Because we later use a separate HashTableSinkOperator for each small table, that information is no longer valid, and the method fails.
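
The tag mismatch described above can be illustrated with a small, self-contained sketch. This is not Hive code: `TagLookupSketch`, `PerTableSink`, `JOIN_KEYS`, and the table names are hypothetical, and the sketch assumes that a per-table sink receives tag 0 from its single parent while the metadata is still indexed by each table's original tag. Under that assumption, a lookup by the incoming tag silently returns another table's entry. Remembering the original tag in each sink is shown as one possible remedy, not necessarily what the attached patches do.

```java
import java.util.List;

public class TagLookupSketch {
    // Hypothetical per-tag metadata captured BEFORE the map-join operator is
    // split from its parents: list index = original tag of each input table.
    static final List<List<String>> JOIN_KEYS = List.of(
        List.of("big.id"),       // tag 0
        List.of("small_a.id"),   // tag 1
        List.of("small_b.id"));  // tag 2

    // Lookup as a shared sink would do it: the incoming tag is assumed to be
    // the table's original tag.
    static List<String> keysForTag(int tag) {
        return JOIN_KEYS.get(tag);
    }

    // Hypothetical per-table sink that remembers which original tag it serves.
    static final class PerTableSink {
        private final int originalTag;

        PerTableSink(int originalTag) {
            this.originalTag = originalTag;
        }

        // Naive lookup: trusts the tag delivered with the row (assumed to be 0
        // here, because this sink has only one parent).
        List<String> naiveKeys(int incomingTag) {
            return keysForTag(incomingTag);
        }

        // Remapped lookup: ignores the incoming tag and uses the remembered one.
        List<String> remappedKeys() {
            return keysForTag(originalTag);
        }
    }

    public static void main(String[] args) {
        PerTableSink sinkForSmallB = new PerTableSink(2);
        System.out.println(sinkForSmallB.naiveKeys(0));   // [big.id], the wrong table's keys
        System.out.println(sinkForSmallB.remappedKeys()); // [small_b.id]
    }
}
```

In this toy model the naive lookup does not throw; it returns stale data for the wrong table. The real method may fail differently, depending on which structure is consulted.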

This JIRA is to track and solve these related issues.

Attachments

HIVE-8810.1-spark.patch (12/Nov/14 22:24, 33 kB, Jimmy Xiang)
HIVE-8810.2-spark.patch (13/Nov/14 00:07, 24 kB, Jimmy Xiang)
HIVE-8810.3-spark.patch (13/Nov/14 19:14, 20 kB, Jimmy Xiang)

People

Assignee: Jimmy Xiang
Reporter: Chao Sun
Votes: 0
Watchers: 4

Dates

Created: 10/Nov/14 23:31
Updated: 29/May/15 02:31
Resolved: 13/Nov/14 23:05