[HIVE-15104] Hive on Spark generates more shuffle data than Hive on MR - ASF JIRA

Details

Type: Bug
Status: Closed
Priority: Major
Resolution: Fixed
Affects Version/s: 1.2.1
Fix Version/s: 3.0.0
Component/s: Spark
Labels: None

Description

The same SQL query, run on the Spark and MR engines, generates different amounts of shuffle data.

I think this is because Hive on MR serializes only part of the HiveKey, whereas Hive on Spark, which uses Kryo, serializes the full HiveKey object.

What is your opinion?
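
To illustrate the suspected effect, here is a minimal sketch in Python. It is not Hive or Kryo code: `SketchKey`, `serialize_full`, `serialize_compact`, and `deserialize_compact` are hypothetical names, and the assumption that the key carries a backing buffer larger than its valid data is made for illustration only. It compares a serializer that writes every field of the key object with one that writes only the valid bytes.

```python
import struct


class SketchKey:
    """Hypothetical stand-in for a shuffle key (not Hive's real HiveKey class)."""

    def __init__(self, capacity=64):
        self.buffer = bytearray(capacity)  # backing array, may exceed the valid data
        self.length = 0                    # number of valid bytes in buffer
        self.hash_code = 0                 # precomputed partitioning hash

    def set(self, data, hash_code):
        # Grow with headroom when needed, so capacity usually exceeds length.
        if len(data) > len(self.buffer):
            self.buffer = bytearray(len(data) * 3 // 2)
        self.buffer[:len(data)] = data
        self.length = len(data)
        self.hash_code = hash_code


def serialize_full(key):
    """Write every field, including unused buffer capacity (field-by-field style)."""
    header = struct.pack(">iii", len(key.buffer), key.length, key.hash_code)
    return header + bytes(key.buffer)


def serialize_compact(key):
    """Write only the valid bytes plus the hash code."""
    header = struct.pack(">ii", key.length, key.hash_code)
    return header + bytes(key.buffer[:key.length])


def deserialize_compact(payload):
    """Rebuild a key from the compact form."""
    length, hash_code = struct.unpack_from(">ii", payload, 0)
    key = SketchKey(capacity=length)
    key.set(payload[8:8 + length], hash_code)
    return key


if __name__ == "__main__":
    key = SketchKey(capacity=64)
    key.set(b"abc", 42)
    print(len(serialize_full(key)))     # 12-byte header + 64-byte buffer = 76
    print(len(serialize_compact(key)))  # 8-byte header + 3 valid bytes = 11
```

In this sketch the per-record overhead of the full form grows with the buffer capacity rather than with the key's actual length, which is the kind of difference that could add up across a large shuffle.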

Attachments

- TPC-H 100G.xlsx (20/May/17 15:12, 30 kB, Rui Li)
- HIVE-15104.9.patch (17/Oct/17 07:41, 20 kB, Rui Li)
- HIVE-15104.8.patch (17/Oct/17 06:59, 20 kB, Rui Li)
- HIVE-15104.7.patch (17/Oct/17 02:28, 20 kB, Rui Li)
- HIVE-15104.6.patch (16/Oct/17 12:54, 20 kB, Rui Li)
- HIVE-15104.5.patch (25/Aug/17 01:24, 21 kB, Rui Li)
- HIVE-15104.4.patch (13/Jul/17 06:42, 21 kB, Rui Li)
- HIVE-15104.3.patch (19/May/17 12:37, 23 kB, Rui Li)
- HIVE-15104.2.patch (15/May/17 11:29, 8 kB, Rui Li)
- HIVE-15104.10.patch (24/Oct/17 04:14, 20 kB, Rui Li)
- HIVE-15104.1.patch (12/May/17 08:26, 8 kB, Rui Li)

People

Assignee: Rui Li

Reporter: wangwenli

Votes: 0

Watchers: 13

Dates

Created: 01/Nov/16 16:08

Updated: 22/May/18 23:58

Resolved: 25/Oct/17 03:11