KUDU-3054: Init kudu.write_duration accumulator lazily


Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Duplicate
    • Affects Version/s: 1.9.0
    • Fix Version/s: NA
    • Component/s: spark
    • Labels: None

    Description

      We encountered an issue in kudu-spark that causes Spark SQL queries to fail:

      ```
      Job aborted due to stage failure: Total size of serialized results of 942 tasks (2.0 GB) is bigger than spark.driver.maxResultSize (2.0 GB)
      ```

      After careful debugging, we found that it is the kudu.write_duration accumulator that makes each Spark task's serialized result larger than 2 MB, so the total for the stage (942 tasks × ~2.2 MB ≈ 2.0 GB) exceeds the limit.

      However, this stage only reads a Kudu table and performs a shuffle exchange; it does not write to any Kudu table.

      So I think we should initialize this accumulator lazily in KuduContext to avoid such issues. A minimal sketch of the idea is shown below.
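
      A minimal Scala sketch of the proposal (the names here are illustrative, not the actual kudu-spark internals, and Spark's built-in LongAccumulator stands in for the write-duration histogram accumulator): make the field a @transient lazy val so it is only registered with the driver when a write path first touches it.

      ```
      import org.apache.spark.SparkContext
      import org.apache.spark.rdd.RDD
      import org.apache.spark.util.LongAccumulator

      class KuduContext(@transient val sc: SparkContext) extends Serializable {

        // Lazily registered on the driver: a job that never writes never
        // creates (or serializes) this accumulator at all.
        @transient private lazy val writeDurationAcc: LongAccumulator =
          sc.longAccumulator("kudu.write_duration")

        // Hypothetical write path, for illustration only.
        def writeRows(rows: RDD[String]): Unit = {
          // Force the lazy val on the driver and capture a local reference,
          // so only the accumulator (not the whole context) is captured by
          // the closure shipped to executors.
          val acc = writeDurationAcc
          rows.foreachPartition { iter =>
            val start = System.nanoTime()
            iter.foreach(_ => ()) // placeholder for the actual Kudu write
            acc.add((System.nanoTime() - start) / 1000000L)
          }
        }
      }
      ```

      With this shape, a stage that only scans a Kudu table and shuffles never forces the lazy val, so no per-task accumulator update is serialized back to the driver and the stage stays under spark.driver.maxResultSize.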

       

      Attachments

        1. read_kudu_and_shuffle.png
          150 kB
          liupengcheng
        2. durationhisto.png
          83 kB
          liupengcheng
        3. durationHisto_large.png
          209 kB
          liupengcheng


            People

              Assignee: Unassigned
              Reporter: liupengcheng
              Votes: 0
              Watchers: 3


                Time Tracking

                  Original Estimate: Not Specified
                  Remaining Estimate: 0h
                  Time Spent: 0.5h