SPARK-21987

Spark 2.3 cannot read 2.2 event logs

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Blocker
    • Resolution: Fixed
    • Affects Version/s: 2.3.0
    • Fix Version/s: 2.3.0
    • Component/s: SQL
    • Labels:
      None
    • Target Version/s:

      Description

      Reported by jincheng in a comment in SPARK-18085:

      com.fasterxml.jackson.databind.exc.UnrecognizedPropertyException: Unrecognized field "metadata" (class org.apache.spark.sql.execution.SparkPlanInfo), not marked as ignorable (4 known properties: "simpleString", "nodeName", "children", "metrics"])
       at [Source: {"Event":"org.apache.spark.sql.execution.ui.SparkListenerSQLExecutionStart","executionId":0,"description":"json at NativeMethodAccessorImpl.java:0","details":"org.apache.spark.sql.DataFrameWriter.json(DataFrameWriter.scala:487)\nsun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)\nsun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)\nsun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)\njava.lang.reflect.Method.invoke(Method.java:498)\npy4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)\npy4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)\npy4j.Gateway.invoke(Gateway.java:280)\npy4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)\npy4j.commands.CallCommand.execute(CallCommand.java:79)\npy4j.GatewayConnection.run(GatewayConnection.java:214)\njava.lang.Thread.run(Thread.java:748)","physicalPlanDescription":"== Parsed Logical Plan ==\nRepartition 200, true\n+- LogicalRDD [uid#327L, gids#328]\n\n== Analyzed Logical Plan ==\nuid: bigint, gids: array<bigint>\nRepartition 200, true\n+- LogicalRDD [uid#327L, gids#328]\n\n== Optimized Logical Plan ==\nRepartition 200, true\n+- LogicalRDD [uid#327L, gids#328]\n\n== Physical Plan ==\nExchange RoundRobinPartitioning(200)\n+- Scan ExistingRDD[uid#327L,gids#328]","sparkPlanInfo":{"nodeName":"Exchange","simpleString":"Exchange RoundRobinPartitioning(200)","children":[{"nodeName":"ExistingRDD","simpleString":"Scan ExistingRDD[uid#327L,gids#328]","children":[],"metadata":{},"metrics":[{"name":"number of output rows","accumulatorId":140,"metricType":"sum"}]}],"metadata":{},"metrics":[{"name":"data size total (min, med, max)","accumulatorId":139,"metricType":"size"}]},"time":1504837052948}; line: 1, column: 1622] (through reference chain: 
org.apache.spark.sql.execution.ui.SparkListenerSQLExecutionStart["sparkPlanInfo"]->org.apache.spark.sql.execution.SparkPlanInfo["children"]->com.fasterxml.jackson.module.scala.deser.BuilderWrapper[0]->org.apache.spark.sql.execution.SparkPlanInfo["metadata"])
      	at com.fasterxml.jackson.databind.exc.UnrecognizedPropertyException.from(UnrecognizedPropertyException.java:51)
      

      This was caused by SPARK-17701 (which at this moment is still open even though the patch has been committed).
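
      Spark serializes listener events to the event log as JSON via Jackson, whose default behavior is to fail on properties that the target class does not declare (the `FAIL_ON_UNKNOWN_PROPERTIES` feature). Because SPARK-17701 removed the `metadata` field from `SparkPlanInfo`, a 2.2 log line that still carries that field can no longer be deserialized by 2.3. The sketch below is a Python analogy of this failure mode, not the actual Spark or Jackson code; the class shape and decoder are illustrative only.

      ```python
      import json
      from dataclasses import dataclass, fields

      @dataclass
      class SparkPlanInfo:
          # The 2.3 shape of the class: the `metadata` field has been removed.
          nodeName: str
          simpleString: str
          children: list
          metrics: list

      def decode_strict(cls, payload: dict):
          """Mimics Jackson's default FAIL_ON_UNKNOWN_PROPERTIES behavior."""
          known = {f.name for f in fields(cls)}
          unknown = set(payload) - known
          if unknown:
              # Analogous to Jackson's UnrecognizedPropertyException.
              raise ValueError(
                  f"Unrecognized field(s) {sorted(unknown)}, not marked as "
                  f"ignorable ({len(known)} known properties)")
          return cls(**payload)

      # A 2.2-era event-log line still carries the `metadata` field:
      old_log = json.loads(
          '{"nodeName": "Exchange", '
          '"simpleString": "Exchange RoundRobinPartitioning(200)", '
          '"children": [], "metadata": {}, "metrics": []}')

      try:
          decode_strict(SparkPlanInfo, old_log)
      except ValueError as e:
          print(e)
      ```

      The same payload decodes cleanly once the extra key is absent, which is why the breakage only shows up when replaying logs written by an older release.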

          Activity

          smilegator Xiao Li added a comment -

          Thanks for reporting this! We need to ensure Spark 2.3 can still process 2.2 event logs, and revert the changes in SparkPlanGraph.

          jincheng jincheng added a comment -

          In fact, it only occurs when using SQL.

          cloud_fan Wenchen Fan added a comment -

          Adding back the `SparkPlanGraph.metadata` is one solution, and we can also annotate `SparkPlanGraph` with `@JsonIgnoreProperties(ignoreUnknown = true)`. IMO keeping the metadata field is a little ugly, since `SparkPlanGraph` is a developer API, and we should be able to remove some fields if necessary.
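
          The second option makes the deserializer tolerant: Jackson's `@JsonIgnoreProperties(ignoreUnknown = true)` tells it to silently drop JSON keys the class no longer declares. Below is a Python analogy of that lenient decoding, not the actual Jackson mechanism; class and function names are illustrative.

          ```python
          import json
          from dataclasses import dataclass, fields

          @dataclass
          class SparkPlanInfo:
              nodeName: str
              simpleString: str
              children: list
              metrics: list

          def decode_ignore_unknown(cls, payload: dict):
              """Analogous to @JsonIgnoreProperties(ignoreUnknown = true):
              keep only the keys the class declares, drop the rest."""
              known = {f.name for f in fields(cls)}
              return cls(**{k: v for k, v in payload.items() if k in known})

          # A 2.2-era log line with the removed `metadata` field now decodes fine:
          old_log = json.loads(
              '{"nodeName": "ExistingRDD", "simpleString": "Scan ExistingRDD", '
              '"children": [], "metadata": {}, "metrics": []}')
          info = decode_ignore_unknown(SparkPlanInfo, old_log)
          print(info.nodeName)  # ExistingRDD
          ```

          The trade-off discussed above is that ignoring unknowns keeps the class free to drop fields in the future, at the cost of silently discarding data instead of failing fast on a genuinely malformed log.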

          apachespark Apache Spark added a comment -

          User 'cloud-fan' has created a pull request for this issue:
          https://github.com/apache/spark/pull/19237


            People

            • Assignee: cloud_fan Wenchen Fan
            • Reporter: vanzin Marcelo Vanzin
            • Votes: 0
            • Watchers: 5
