Spark / SPARK-17356

A large Metadata field in Alias can cause OOM when calling TreeNode.toJSON


    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 1.6.3, 2.0.1, 2.1.0
    • Component/s: SQL
    • Labels:
      None

      Description

      When using MLlib, calling toJSON on a plan with many levels of sub-queries may cause an out-of-memory error with a stack trace like this:

      java.lang.OutOfMemoryError: GC overhead limit exceeded
      	at scala.collection.mutable.AbstractSeq.<init>(Seq.scala:47)
      	at scala.collection.mutable.AbstractBuffer.<init>(Buffer.scala:48)
      	at scala.collection.mutable.ListBuffer.<init>(ListBuffer.scala:46)
      	at scala.collection.immutable.List$.newBuilder(List.scala:396)
      	at scala.collection.generic.GenericTraversableTemplate$class.newBuilder(GenericTraversableTemplate.scala:64)
      	at scala.collection.AbstractTraversable.newBuilder(Traversable.scala:105)
      	at scala.collection.TraversableLike$class.filter(TraversableLike.scala:262)
      	at scala.collection.AbstractTraversable.filter(Traversable.scala:105)
      	at scala.collection.TraversableLike$class.filterNot(TraversableLike.scala:274)
      	at scala.collection.AbstractTraversable.filterNot(Traversable.scala:105)
      	at org.json4s.jackson.JValueSerializer.serialize(JValueSerializer.scala:25)
      	at org.json4s.jackson.JValueSerializer.serialize(JValueSerializer.scala:20)
      	at org.json4s.jackson.JValueSerializer.serialize(JValueSerializer.scala:25)
      	at org.json4s.jackson.JValueSerializer.serialize(JValueSerializer.scala:25)
      	at org.json4s.jackson.JValueSerializer.serialize(JValueSerializer.scala:25)
      	at org.json4s.jackson.JValueSerializer.serialize(JValueSerializer.scala:25)
      	at org.json4s.jackson.JValueSerializer.serialize(JValueSerializer.scala:20)
      	at org.json4s.jackson.JValueSerializer.serialize(JValueSerializer.scala:20)
      	at org.json4s.jackson.JValueSerializer.serialize(JValueSerializer.scala:25)
      	at org.json4s.jackson.JValueSerializer.serialize(JValueSerializer.scala:20)
      	at org.json4s.jackson.JValueSerializer.serialize(JValueSerializer.scala:7)
      	at com.fasterxml.jackson.databind.ser.DefaultSerializerProvider.serializeValue(DefaultSerializerProvider.java:128)
      	at com.fasterxml.jackson.databind.ObjectMapper._configAndWriteValue(ObjectMapper.java:2881)
      	at com.fasterxml.jackson.databind.ObjectMapper.writeValueAsString(ObjectMapper.java:2338)
      	at org.json4s.jackson.JsonMethods$class.compact(JsonMethods.scala:34)
      	at org.json4s.jackson.JsonMethods$.compact(JsonMethods.scala:50)
      	at org.apache.spark.sql.catalyst.trees.TreeNode.toJSON(TreeNode.scala:566)
      

      The query plan, stack trace, and jmap distribution are attached.
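To show why a large Metadata value becomes a problem mainly when the plan is deep, here is a toy Python model. It is not Spark code and does not reproduce Catalyst's real JSON format. It assumes that each sub-query level contains an Alias-like node carrying the same large metadata, and that serialization writes that metadata out in full at every level. The names `build_plan`, `json_size`, and `big_metadata`, and the node layout, are all hypothetical.

```python
import json


def build_plan(depth, metadata):
    """Build a toy nested plan (hypothetical layout, not Catalyst's).

    Each level is an Alias-like node that references `metadata`
    and wraps the next level as its child.
    """
    node = {"class": "LocalRelation"}
    for _ in range(depth):
        node = {"class": "Alias", "metadata": metadata, "child": node}
    return node


def json_size(depth, metadata):
    """Length of the serialized toy plan for a given nesting depth."""
    return len(json.dumps(build_plan(depth, metadata)))


# Hypothetical large metadata (several kilobytes once serialized),
# standing in for something like per-feature attribute information.
big_metadata = {"attrs": {"names": ["f%d" % i for i in range(1000)]}}

for depth in (1, 10, 100):
    print(depth, json_size(depth, big_metadata))
```

In this model every level points at the same dict, so the metadata is stored only once in memory. The serialized string, however, contains one full copy per level, so its size grows roughly as depth × metadata size.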

        Attachments

        1. queryplan.txt
          16 kB
          Sean Zhong
        2. jmap.txt
          487 kB
          Sean Zhong
        3. jstack.txt
          121 kB
          Sean Zhong


            People

            • Assignee: Sean Zhong (clockfly)
            • Reporter: Sean Zhong (clockfly)
            • Votes: 0
            • Watchers: 4
