Spark / SPARK-19644

Memory leak in Spark Streaming (Encoder/Scala Reflection)


Details

    Description

      I am using Spark Streaming in production to run aggregations, fetch data from Cassandra, and save the results back to Cassandra.

      I see the old-generation heap capacity grow gradually from 1161216 bytes to 1397760 bytes over a period of six hours.

      After 50 hours of processing, the number of instances of class scala.collection.immutable.$colon$colon had increased to 12,811,793, which is a huge number.

      I think this is a clear case of a memory leak.

      Update: The root cause is that creating an encoder object leaks several Scala-internal objects because of a memory leak in Scala reflection: https://github.com/scala/bug/issues/8302
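
Because the leak is triggered each time an encoder is created, one plausible mitigation is to derive each encoder once per JVM and reuse it across micro-batches, so the leaking reflection path should run once rather than once per batch. This is a minimal sketch under stated assumptions, not the fix adopted for this issue: `KeyTotal`, `StreamingEncoders`, and `BatchWriter` are hypothetical names, and the code assumes Spark SQL 2.x on the classpath, using the public `Encoders.product` and `SparkSession.createDataset` APIs.

```scala
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.{Dataset, Encoder, Encoders, SparkSession}

// Hypothetical row type for aggregated results. It is a top-level case class
// so that Scala reflection can derive a TypeTag for it.
case class KeyTotal(key: String, total: Long)

object StreamingEncoders {
  // Assumption: deriving the encoder once and reusing it avoids repeating the
  // leaking reflection work on every micro-batch. A lazy val is initialized
  // on first access, exactly once, and its initialization is thread-safe.
  lazy val keyTotal: Encoder[KeyTotal] = Encoders.product[KeyTotal]
}

object BatchWriter {
  // Intended to be called from inside foreachRDD: pass the cached encoder
  // explicitly instead of relying on `import spark.implicits._`, which would
  // derive a fresh encoder each time it is resolved.
  def toDataset(spark: SparkSession, rdd: RDD[KeyTotal]): Dataset[KeyTotal] =
    spark.createDataset(rdd)(StreamingEncoders.keyTotal)
}
```

Passing the encoder explicitly keeps its construction in one place, which also makes it easy to confirm in a heap dump that only one instance exists.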

      Attachments

        1. Dominator_tree.png (270 kB, uploaded by Deenbandhu Agarwal)
        2. heapdump.png (115 kB, uploaded by Deenbandhu Agarwal)
        3. Path2GCRoot.png (315 kB, uploaded by Deenbandhu Agarwal)


          People

            Shixiong Zhu (zsxwing)
            Deenbandhu Agarwal (deenbandhu)
            Votes: 4
            Watchers: 11
