[HIVE-8017] Use HiveKey instead of BytesWritable as key type of the pair RDD [Spark Branch] - ASF JIRA

Details

Type: Sub-task
Status: Resolved
Priority: Major
Resolution: Fixed
Affects Version/s: None
Fix Version/s: 1.1.0
Component/s: Spark
Labels:
- Spark-M1

Description

HiveKey should be used as the key type of the pair RDD because it carries the hash code used for partitioning. BytesWritable partitions correctly in simple cases, but more complicated ones, such as joins and bucketed tables, need the value returned by HiveKey.hashCode.
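The difference can be illustrated with a minimal, self-contained sketch. It does not use the real Hive or Hadoop classes: `BytesKey` and `PrecomputedHashKey` are hypothetical stand-ins for BytesWritable and HiveKey, and `partitionFor` is an assumed simplification of hash partitioning (hash code modulo the number of partitions). The example assumes two rows that share a join value but whose serialized key bytes differ by one trailing byte.

```java
import java.util.Arrays;

public class HiveKeySketch {

    /** Hypothetical stand-in for BytesWritable: the hash is derived from the serialized bytes. */
    static class BytesKey {
        final byte[] bytes;

        BytesKey(byte[] bytes) {
            this.bytes = bytes;
        }

        @Override
        public int hashCode() {
            return Arrays.hashCode(bytes);
        }
    }

    /** Hypothetical stand-in for HiveKey: same bytes, but the producer supplies the hash code. */
    static class PrecomputedHashKey extends BytesKey {
        final int hash;

        PrecomputedHashKey(byte[] bytes, int hash) {
            super(bytes);
            this.hash = hash;
        }

        @Override
        public int hashCode() {
            return hash;
        }
    }

    /** Assumed simplification of a hash partitioner: non-negative hash modulo partition count. */
    static int partitionFor(Object key, int numPartitions) {
        return Math.floorMod(key.hashCode(), numPartitions);
    }

    public static void main(String[] args) {
        int joinValue = 42;
        // Same join value, but the serialized keys differ in their last byte.
        byte[] left = {42, 0};
        byte[] right = {42, 1};

        // Hashing the raw bytes can send the two rows to different partitions.
        System.out.println("bytes-hash partitions: "
                + partitionFor(new BytesKey(left), 4) + " vs "
                + partitionFor(new BytesKey(right), 4));

        // A hash computed from the join value alone keeps them together.
        int hash = Integer.hashCode(joinValue);
        System.out.println("precomputed-hash partitions: "
                + partitionFor(new PrecomputedHashKey(left, hash), 4) + " vs "
                + partitionFor(new PrecomputedHashKey(right, hash), 4));
    }
}
```

In this sketch the partition depends only on whatever `hashCode` returns, so a key type that lets the producer decide the hash can keep related rows together even when their bytes differ.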

Attachments

- HIVE-8017-spark.patch (08/Sep/14 06:39, 31 kB, Rui Li)
- HIVE-8017.5-spark.patch (12/Sep/14 05:59, 76 kB, Rui Li)
- HIVE-8017.4-spark.patch (11/Sep/14 02:08, 75 kB, Rui Li)
- HIVE-8017.3-spark.patch (10/Sep/14 03:54, 97 kB, Rui Li)
- HIVE-8017.2-spark.patch (09/Sep/14 09:16, 76 kB, Rui Li)

Issue Links

is depended upon by

- HIVE-7856 Enable parallelism in Reduce Side Join [Spark Branch] (Resolved)
- HIVE-7956 When inserting into a bucketed table, all data goes to a single bucket [Spark Branch] (Resolved)

is related to

- HIVE-8098 The spark golden file for union_remove_25 is different from MR version [Spark Branch] (Open)

relates to

- HIVE-8035 Add SORT_QUERY_RESULTS for test that doesn't guarantee order (Closed)

links to

RB request

People

Assignee: Rui Li

Reporter: Rui Li

Votes: 0

Watchers: 3

Dates

Created: 07/Sep/14 02:24

Updated: 29/May/15 02:28

Resolved: 12/Sep/14 16:23