Spark / SPARK-25552

Upgrade from Spark 1.6.3 to 2.3.0 seems to make jobs use about 50% more memory


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Invalid
    • Affects Version/s: 2.3.0
    • Fix Version/s: None
    • Component/s: Java API, Spark Core
    • Labels: None
    • Environment: Originally found in an AWS Kubernetes environment with Spark Embedded. Also happens at a small scale with Spark Embedded on both Linux and macOS.

    Description

      After upgrading from Spark 1.6.3 to 2.3.0, our jobs started to need about 50% more memory to run. The Spark properties used were the defaults in both versions.

      For instance, we had a job that was running fine with 50 GB of memory on Spark 1.6.3.

      After upgrading to Spark 2.3.0, running the same job again with the same 50 GB of memory failed with an out-of-memory error.

      Then we kept increasing the memory until the job was able to run, which happened at 70 GB.

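      For reference, job memory in Spark is normally controlled by the driver and executor memory settings. The following is a minimal sketch of the kind of configuration involved, assuming a plain SparkConf-based job; the application name and the 70g value are illustrative placeholders, not the reporter's actual code.

      import org.apache.spark.{SparkConf, SparkContext}

      // Illustrative sketch only: the memory settings typically raised in a case like this.
      object MemoryConfigSketch {
        def main(args: Array[String]): Unit = {
          val conf = new SparkConf()
            .setAppName("memory-sketch")          // hypothetical application name
            .set("spark.executor.memory", "70g")  // the kind of value raised from 50 GB to 70 GB
          // spark.driver.memory must be set at launch time (spark-submit or spark-defaults.conf),
          // so it is not set programmatically here; master/deploy settings are also omitted.
          val sc = new SparkContext(conf)
          // ... job logic would go here ...
          sc.stop()
        }
      }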
       

      The Spark upgrade was the only change in our environment. After looking into what seems to be causing this, we noticed that the Kryo serializer is the main culprit for the rise in memory consumption.
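
      For context, these are the Kryo-related settings usually examined when serializer memory use grows. This is a sketch assuming a SparkConf-based job; MyRecord is a hypothetical stand-in for whatever classes the job actually serializes, and the values are illustrative.

      import org.apache.spark.SparkConf

      // Illustrative sketch only: common Kryo tuning knobs.
      // MyRecord is a placeholder for the job's real data classes.
      case class MyRecord(id: Long, payload: Array[Byte])

      object KryoConfigSketch {
        def main(args: Array[String]): Unit = {
          val conf = new SparkConf()
            .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
            .set("spark.kryoserializer.buffer.max", "128m")  // upper bound on Kryo's serialization buffer
            .set("spark.kryo.registrationRequired", "true")  // fail fast if an unregistered class is serialized
            .registerKryoClasses(Array(classOf[MyRecord]))   // registered classes serialize more compactly
          println(conf.toDebugString)                        // inspect the effective settings
        }
      }

      Registering the classes that dominate the job's data and checking the buffer settings are usually the first steps when Kryo's footprint changes between versions.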

      Attachments

        1. Spark1.6-50GB.png (128 kB, Nuno Azevedo)
        2. Spark2.3-50GB.png (84 kB, Nuno Azevedo)
        3. Spark2.3-70GB.png (100 kB, Nuno Azevedo)


          People

            Assignee: Unassigned
            Reporter: Nuno Azevedo (nuno.azevedo)
            Votes: 0
            Watchers: 4
