Kudu / KUDU-3056

kudu-spark HdrHistogramAccumulator is too big and makes Spark jobs fail


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 1.9.0
    • Fix Version/s: 1.12.0
    • Component/s: spark
    • Labels: None

    Description

      In a production environment, we use kudu-spark to read a Kudu table. Even though we don't
      use the HdrHistogramAccumulator, the HdrHistogramAccumulator instances stored in an array
      are still very large, almost 2 MB in total. As a result, when the number of kudu-spark
      tasks (for reading Kudu data and shuffling) exceeds 900, the Spark job fails with the
      following error:

      Job aborted due to stage failure: Total size of serialized results of 1413 tasks (3.0 GB) is bigger than spark.driver.maxResultSize (3.0 GB)
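      The failure is a simple accumulation effect: each task ships its serialized accumulator
      state back to the driver, so the total grows linearly with the task count. A minimal
      back-of-envelope sketch, using the task count from the error message and an assumed
      per-task payload of roughly 2 MB (the report says the accumulators total almost 2 MB):

      ```python
      # Illustrative arithmetic only; the per-task payload size is an assumption
      # based on the ~2 MB accumulator footprint described in the report.
      TASKS = 1413                 # task count from the error message
      PAYLOAD_MB = 2.2             # assumed serialized result size per task, in MB
      MAX_RESULT_SIZE_GB = 3.0     # spark.driver.maxResultSize from the error

      total_gb = TASKS * PAYLOAD_MB / 1024
      print(f"total serialized results ~ {total_gb:.2f} GB "
            f"(limit: {MAX_RESULT_SIZE_GB} GB)")
      ```

      With these numbers the total lands just above the 3.0 GB limit, which matches the
      reported stage failure once the job crossed roughly 900 tasks of shuffle-heavy reads.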

      Attachments

        1. heap1.png
          66 kB
          caiconghui
        2. heap2.png
          81 kB
          caiconghui
        3. heap3.png
          198 kB
          caiconghui


            People

              Assignee: granthenke Grant Henke
              Reporter: cch13 caiconghui
              Votes: 0
              Watchers: 5

              Dates

                Created:
                Updated:
                Resolved:

                Time Tracking

                  Original Estimate: 12h
                  Remaining Estimate: 12h
                  Time Spent: Not Specified