[HIVE-3934] Put tag in value for join with map reduce - ASF JIRA

Details

Type: Improvement
Status: Open
Priority: Major
Resolution: Unresolved
Affects Version/s: 0.11.0
Fix Version/s: None
Component/s: Query Processor, Serializers/Deserializers
Labels: None

Description

While trying to facilitate hash-based map reduce, I found that for joins executed with map reduce in Hive, the tag is appended to the key writable. This is a considerable hindrance to supporting other runtime implementations of the map reduce computation model, such as hash-based map reduce. For example, when the tag is in the key, several things need special care:

1. HiveKey must handle the hash code specially so that keys are partitioned properly among the reducers.
2. From map reduce's point of view the key is actually key + tag, which makes the map reduce sort compulsory in order to satisfy Hive's need to group by key on the reduce side. This disables or hinders hash-based map reduce, because grouping by key + tag makes no sense to Hive.
3. ExecReducer must find the real key boundary by stripping out the tag before making the startGroup and endGroup calls to the operator. Without the tag in the key, each reduce call would be a natural key boundary.
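To make points 1 and 3 concrete, here is a minimal sketch of the extra work a tagged key requires. It is not Hive's actual code: the class `TagInKeySketch` and its helpers are hypothetical, and the key is assumed to be a plain byte array with a one-byte tag appended.

```java
import java.util.Arrays;

// Hypothetical illustration only; these helpers are not part of Hive.
class TagInKeySketch {
    // Append a one-byte table tag to a serialized join key.
    static byte[] appendTag(byte[] key, byte tag) {
        byte[] out = Arrays.copyOf(key, key.length + 1);
        out[key.length] = tag;
        return out;
    }

    // Hash only the real key bytes, so rows from both tables
    // with the same join key land on the same reducer.
    static int partitionHash(byte[] taggedKey) {
        return Arrays.hashCode(Arrays.copyOf(taggedKey, taggedKey.length - 1));
    }

    // Two tagged keys belong to the same group when they are
    // equal after the trailing tag byte is stripped.
    static boolean sameGroup(byte[] a, byte[] b) {
        return Arrays.equals(Arrays.copyOf(a, a.length - 1),
                             Arrays.copyOf(b, b.length - 1));
    }
}
```

Both the partition hash and the group-boundary check have to look past the tag, which is the special handling the list above describes.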

Consider appending the tag as the last byte of the value writable instead, which avoids all of the above and fits naturally into the map reduce computation model.
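A rough sketch of what "tag in value" could look like, assuming values are plain byte arrays and the tag fits in one byte. The class `TagInValueSketch` and its method names are hypothetical and only illustrate the idea; they are not an existing Hive API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical illustration only; these helpers are not part of Hive.
class TagInValueSketch {
    // Map side: append the table tag as the last byte of the value.
    static byte[] tagValue(byte[] value, byte tag) {
        byte[] out = Arrays.copyOf(value, value.length + 1);
        out[value.length] = tag;
        return out;
    }

    // Reduce side: the tag is the last byte of the tagged value.
    static byte readTag(byte[] taggedValue) {
        return taggedValue[taggedValue.length - 1];
    }

    // Reduce side: recover the original value bytes.
    static byte[] stripTag(byte[] taggedValue) {
        return Arrays.copyOf(taggedValue, taggedValue.length - 1);
    }

    // Split the values of one reduce call into per-table buckets.
    // The values may arrive in any tag order; no sort is assumed.
    static Map<Byte, List<byte[]>> splitByTag(Iterable<byte[]> taggedValues) {
        Map<Byte, List<byte[]>> byTag = new TreeMap<>();
        for (byte[] v : taggedValues) {
            byTag.computeIfAbsent(readTag(v), t -> new ArrayList<>()).add(stripTag(v));
        }
        return byTag;
    }
}
```

With this layout the key stays untouched, so the default hash partitioning and the per-key reduce call need no tag-aware logic; the cost is that the reducer has to bucket values by tag itself.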

I see code in JoinOperator that generates join results earlier, which relies on the fact that the tags are sorted. This is only useful when there are very many rows with the same key in both join tables, which is not required in most cases.
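The trade-off can be sketched for a two-table inner join on a single key. This is an assumption-laden illustration, not JoinOperator's real logic: `JoinOrderSketch`, `TaggedRow`, and both join methods are hypothetical, with tag 0 standing for the left table and tag 1 for the right.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; this is not Hive's JoinOperator.
class JoinOrderSketch {
    // One row of a single join-key group: tag 0 = left table, tag 1 = right table.
    static final class TaggedRow {
        final int tag;
        final String value;
        TaggedRow(int tag, String value) {
            this.tag = tag;
            this.value = value;
        }
    }

    // Tag-sorted input: buffer only the left rows, then emit
    // results as soon as each right row arrives.
    static List<String> joinSorted(List<TaggedRow> rows) {
        List<String> left = new ArrayList<>();
        List<String> out = new ArrayList<>();
        for (TaggedRow r : rows) {
            if (r.tag == 0) {
                left.add(r.value);
            } else {
                for (String l : left) {
                    out.add(l + "|" + r.value);
                }
            }
        }
        return out;
    }

    // Arbitrary tag order: buffer both sides and emit only
    // after the whole group has been read.
    static List<String> joinUnsorted(List<TaggedRow> rows) {
        List<String> left = new ArrayList<>();
        List<String> right = new ArrayList<>();
        for (TaggedRow r : rows) {
            (r.tag == 0 ? left : right).add(r.value);
        }
        List<String> out = new ArrayList<>();
        for (String l : left) {
            for (String r : right) {
                out.add(l + "|" + r);
            }
        }
        return out;
    }
}
```

The sorted variant never holds the right-hand rows in memory, which matters only when a key has a very large number of rows; the unsorted variant produces the same pairs at the cost of buffering both sides.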

Let's discuss the possibility of the "tag in value" approach.

People

Assignee: Unassigned

Reporter: Haifeng Chen

Votes: 0

Watchers: 3

Dates

Created: 24/Jan/13 02:46

Updated: 24/Jan/13 02:46

Time Tracking

Estimated: 168h

Remaining: 168h

Logged: Not Specified