Details
- Type: Bug
- Status: Resolved
- Priority: Major
- Resolution: Invalid
- Affects Version/s: 2.3.0
- Fix Version/s: None
- Component/s: None
- Environment: Originally found in an AWS Kubernetes environment with Spark Embedded. Also reproducible at small scale with Spark Embedded on both Linux and macOS.
Description
After upgrading from Spark 1.6.3 to 2.3.0, our jobs started to need about 50% more memory to run. The Spark properties were left at their defaults in both versions.
For instance, before we were running a job with Spark 1.6.3 and it was running fine with 50 GB of memory.
After upgrading to Spark 2.3.0, when running the same job again with the same 50 GB of memory it failed due to out of memory.
Then we increased the memory incrementally until the job ran successfully, which required 70 GB.
The Spark upgrade was the only change in our environment. After investigating what seems to be causing this, we noticed that the Kryo serializer is the main culprit for the rise in memory consumption.
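To isolate the serializer's contribution, one way to compare the two behaviors is to pin the serializer explicitly and rerun the same job under each setting. This is a diagnostic sketch, not part of the original report: the serializer class names and the `spark.kryoserializer.buffer.max` key are real Spark configuration values, while the app name is illustrative.

```scala
import org.apache.spark.sql.SparkSession

// Diagnostic sketch: pin the serializer explicitly, then rerun the same
// job under each setting and compare peak executor memory between runs.
val spark = SparkSession.builder()
  .appName("serializer-memory-comparison") // illustrative name
  // Run 1: Kryo, the suspected culprit.
  .config("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
  // Run 2 (comparison): swap in the Java serializer instead:
  //   .config("spark.serializer", "org.apache.spark.serializer.JavaSerializer")
  // A Kryo knob worth checking when memory use grows:
  //   .config("spark.kryoserializer.buffer.max", "64m")
  .getOrCreate()
```

If the job's memory footprint drops noticeably under the Java serializer, that would support the conclusion that Kryo is responsible for the extra consumption.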