[HIVE-7526] Research to use groupby transformation to replace Hive existing partitionByKey and SparkCollector combination - ASF JIRA

Details

Type: Task
Status: Resolved
Priority: Major
Resolution: Fixed
Affects Version/s: None
Fix Version/s: 1.1.0
Component/s: Spark
Labels: None

Description

Currently SparkClient shuffles data by calling partitionByKey(). This transformation outputs <key, value> tuples. However, Hive's ExecMapper expects <key, iterator<value>> tuples, and Spark's groupByKey() appears to output this shape directly. Thus, by using groupByKey(), we may be able to avoid Hive's own key clustering mechanism (in HiveReduceFunction). This task is to try that approach out.
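
To make the two tuple shapes concrete, here is a minimal sketch in plain Python. It does not use Spark or Hive; `cluster_by_key` and the sample data are hypothetical names invented for illustration, and the clustering step only stands in for the kind of work a separate key-clustering pass has to do when the shuffle emits flat <key, value> tuples. It assumes equal keys arrive adjacent to each other.

```python
from itertools import groupby
from operator import itemgetter


def cluster_by_key(pairs):
    """Turn flat (key, value) pairs into (key, [values]) groups.

    Hypothetical helper: assumes pairs with equal keys are adjacent,
    as they would be after sorting by key.
    """
    for key, group in groupby(pairs, key=itemgetter(0)):
        yield key, [value for _, value in group]


# Shape 1: flat <key, value> tuples, one per row.
flat = [("a", 1), ("a", 2), ("b", 3), ("c", 4), ("c", 5)]

# Shape 2: <key, iterator<value>>, one entry per distinct key.
grouped = list(cluster_by_key(flat))
print(grouped)
```

If the shuffle already delivers the second shape, a clustering pass like `cluster_by_key` becomes unnecessary on the consuming side, which is the saving this issue sets out to investigate.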

Attachments

- HIVE-7526.2.patch (30/Jul/14 04:32, 3 kB, Chao Sun)
- HIVE-7526.3.patch (30/Jul/14 19:29, 11 kB, Chao Sun)
- HIVE-7526.4-spark.patch (31/Jul/14 05:14, 11 kB, Chao Sun)
- HIVE-7526.5-spark.patch (31/Jul/14 08:04, 13 kB, Xuefu Zhang)
- HIVE-7526.patch (28/Jul/14 22:33, 4 kB, Chao Sun)

Issue Links

is part of: HIVE-7292 Hive on Spark (Resolved)
is related to: HIVE-7493 Enhance HiveReduceFunction's row clustering [Spark Branch] (Resolved)

People

Assignee: Chao Sun

Reporter: Xuefu Zhang

Votes: 0

Watchers: 5

Dates

Created: 27/Jul/14 20:17

Updated: 29/May/15 02:28

Resolved: 31/Jul/14 08:24