  1. Spark
  2. SPARK-19644

Memory leak in Spark Streaming (Encoder/Scala Reflection)

    Details

      Description

      I am running Spark Streaming in production to perform some aggregations: the job fetches data from Cassandra and saves the results back to Cassandra.

      I see a gradual increase in old-generation heap capacity, from 1161216 bytes to 1397760 bytes over a period of six hours.
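
A minimal sketch of one way to observe this kind of growth from inside a JVM, using only the standard `java.lang.management` API. It is not taken from the report: the class name `OldGenSampler` and both helper methods are hypothetical, and matching pools by the substrings "Old Gen" or "Tenured" is an assumption that holds for common collectors but not necessarily all of them. Applied to the two figures above, `growthPercent` shows an increase of roughly 20% over the six hours.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryType;

public class OldGenSampler {

    /**
     * Returns the committed size in bytes of the first heap pool whose name
     * looks like an old-generation pool, or -1 if no such pool is found.
     * The name matching is an assumption, not a guaranteed JVM contract.
     */
    static long oldGenCommittedBytes() {
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            String name = pool.getName();
            boolean looksLikeOldGen = name.contains("Old Gen") || name.contains("Tenured");
            if (pool.getType() == MemoryType.HEAP && looksLikeOldGen) {
                return pool.getUsage().getCommitted();
            }
        }
        return -1L;
    }

    /** Relative growth between two capacity samples, in percent. */
    static double growthPercent(long before, long after) {
        return 100.0 * (after - before) / before;
    }

    public static void main(String[] args) {
        // One sample of the current JVM; a real monitor would sample periodically.
        System.out.println("old gen committed bytes: " + oldGenCommittedBytes());

        // The two capacity figures quoted in the report.
        System.out.println("growth over six hours (%): "
                + growthPercent(1161216L, 1397760L));
    }
}
```

Capacity growth alone does not prove a leak; it becomes meaningful when combined with a rising instance count for a specific class, as described next.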

      After 50 hours of processing, the number of instances of class scala.collection.immutable.$colon$colon increased to 12,811,793, which is a huge number.

      I think this is a clear case of a memory leak.

      Update: The root cause is that creating an encoder object leaks several Scala internal objects, because of a memory leak in Scala itself: https://github.com/scala/bug/issues/8302

        Attachments

        1. Dominator_tree.png
          270 kB
          Deenbandhu Agarwal
        2. heapdump.png
          115 kB
          Deenbandhu Agarwal
        3. Path2GCRoot.png
          315 kB
          Deenbandhu Agarwal

          Activity

            People

            • Assignee:
              zsxwing Shixiong Zhu
              Reporter:
              deenbandhu Deenbandhu Agarwal
            • Votes:
              4
              Watchers:
              11