Spark / SPARK-25552

Upgrade from Spark 1.6.3 to 2.3.0 seems to make jobs use about 50% more memory


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Invalid
    • Affects Version/s: 2.3.0
    • Fix Version/s: None
    • Component/s: Java API, Spark Core
    • Labels: None
    • Environment: Originally found in an AWS Kubernetes environment with Spark Embedded. Also happens at a small scale with Spark Embedded on both Linux and macOS.

    Description

      After upgrading from Spark 1.6.3 to 2.3.0, our jobs started to need about 50% more memory to run. The Spark properties used were the defaults in both versions.

      For instance, we had a job that was running fine with 50 GB of memory on Spark 1.6.3.

      After upgrading to Spark 2.3.0, running the same job again with the same 50 GB of memory failed with an out-of-memory error.

      Then we kept increasing the memory until the job was able to run, which happened at 70 GB.

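      For reference, job memory in Spark is normally controlled by the driver and executor memory settings. The following is a minimal sketch of the kind of configuration involved, assuming a plain SparkConf-based job; the application name and the 70g value are illustrative placeholders, not the reporter's actual code.

      import org.apache.spark.{SparkConf, SparkContext}

      // Illustrative sketch only: the memory settings typically raised in a case like this.
      object MemoryConfigSketch {
        def main(args: Array[String]): Unit = {
          val conf = new SparkConf()
            .setAppName("memory-sketch")          // hypothetical application name
            .set("spark.executor.memory", "70g")  // the kind of value raised from 50 GB to 70 GB
          // spark.driver.memory must be set at launch time (spark-submit or spark-defaults.conf),
          // so it is not set programmatically here; master/deploy settings are also omitted.
          val sc = new SparkContext(conf)
          // ... job logic would go here ...
          sc.stop()
        }
      }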
       

      The Spark upgrade was the only change in our environment. After looking into what seems to be causing this, we noticed that the Kryo serializer is the main culprit for the rise in memory consumption.
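
      For context, these are the Kryo-related settings usually examined when serializer memory use grows. This is a sketch assuming a SparkConf-based job; MyRecord is a hypothetical stand-in for whatever classes the job actually serializes, and the values are illustrative.

      import org.apache.spark.SparkConf

      // Illustrative sketch only: common Kryo tuning knobs.
      // MyRecord is a placeholder for the job's real data classes.
      case class MyRecord(id: Long, payload: Array[Byte])

      object KryoConfigSketch {
        def main(args: Array[String]): Unit = {
          val conf = new SparkConf()
            .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
            .set("spark.kryoserializer.buffer.max", "128m")  // upper bound on Kryo's serialization buffer
            .set("spark.kryo.registrationRequired", "true")  // fail fast if an unregistered class is serialized
            .registerKryoClasses(Array(classOf[MyRecord]))   // registered classes serialize more compactly
          println(conf.toDebugString)                        // inspect the effective settings
        }
      }

      Registering the classes that dominate the job's data and checking the buffer settings are usually the first steps when Kryo's footprint changes between versions.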

      Attachments

        1. Spark1.6-50GB.png (128 kB, Nuno Azevedo)
        2. Spark2.3-50GB.png (84 kB, Nuno Azevedo)
        3. Spark2.3-70GB.png (100 kB, Nuno Azevedo)


          People

            Assignee: Unassigned
            Reporter: Nuno Azevedo (nuno.azevedo)
            Votes: 0
            Watchers: 4
