Spark / SPARK-3649

ClassCastException in GraphX custom serializers when sort-based shuffle spills

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 1.2.0
    • Fix Version/s: 1.2.0
    • Component/s: GraphX
    • Labels:
      None

      Description

      As reported on the mailing list, GraphX throws

      java.lang.ClassCastException: java.lang.Long cannot be cast to scala.Tuple2
              at org.apache.spark.graphx.impl.RoutingTableMessageSerializer$$anon$1$$anon$2.writeObject(Serializers.scala:39) 
              at org.apache.spark.storage.DiskBlockObjectWriter.write(BlockObjectWriter.scala:195) 
              at org.apache.spark.util.collection.ExternalSorter.spillToMergeableFile(ExternalSorter.scala:329)
      

      when sort-based shuffle attempts to spill to disk. GraphX defines custom serializers for shuffling pair RDDs, and these serializers assume that Spark will always serialize the entire pair object rather than breaking it up into its components. The spill code path in sort-based shuffle violates this assumption, which is why a bare java.lang.Long reaches a writer that expects a scala.Tuple2.
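
      The failure mode can be reproduced outside Spark. The following is a minimal sketch, not GraphX code: PairSerializerSketch, writeObject, and failsWithClassCast are hypothetical names, and java.util.Map.Entry stands in for scala.Tuple2. It assumes only that the serializer casts every record to a pair, as the stack trace above suggests.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.Map;

public class PairSerializerSketch {

    /** Hypothetical writer that, like the custom serializers, assumes a whole pair. */
    public static String writeObject(Object obj) {
        // The cast encodes the assumption that every record is a (key, value) pair.
        @SuppressWarnings("unchecked")
        Map.Entry<Long, Integer> pair = (Map.Entry<Long, Integer>) obj;
        return pair.getKey() + ":" + pair.getValue();
    }

    /** Returns true if writing the object fails with a ClassCastException. */
    public static boolean failsWithClassCast(Object obj) {
        try {
            writeObject(obj);
            return false;
        } catch (ClassCastException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        // Whole pair: the assumption holds and the write succeeds.
        System.out.println(writeObject(new SimpleEntry<>(7L, 1)));

        // Key alone, as a caller that splits the pair would pass it: the cast fails.
        System.out.println(failsWithClassCast(Long.valueOf(7L)));
    }
}
```

      Handing the writer the key by itself triggers the same kind of ClassCastException as in the trace, because nothing in the writer's interface tells it whether it is receiving a pair or one of its components.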

      GraphX uses the custom serializers to compress vertex ID keys using variable-length integer encoding. Because the serializer can no longer rely on the key and value being serialized and deserialized together, keeping this encoding would require writing an extra tag byte. It may therefore be better to simply remove the custom serializers.
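
      For readers unfamiliar with variable-length integer encoding, here is a minimal sketch of one common scheme (7 payload bits per byte, with the high bit marking that more bytes follow). The class VarLongSketch and its methods are hypothetical, and the exact byte layout GraphX used may differ.

```java
import java.io.ByteArrayOutputStream;

public class VarLongSketch {

    /** Encodes a long in 1 to 10 bytes, low-order 7-bit group first. */
    public static byte[] encode(long value) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        // While bits remain above the low 7, emit a group with the continuation bit set.
        while ((value & ~0x7FL) != 0) {
            out.write((int) ((value & 0x7F) | 0x80));
            value >>>= 7; // unsigned shift, so negative values terminate
        }
        out.write((int) value); // final group, continuation bit clear
        return out.toByteArray();
    }

    /** Decodes a value produced by encode. */
    public static long decode(byte[] bytes) {
        long result = 0;
        int shift = 0;
        for (byte b : bytes) {
            result |= (long) (b & 0x7F) << shift;
            if ((b & 0x80) == 0) {
                break; // continuation bit clear: last group
            }
            shift += 7;
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(encode(1L).length);    // small ID: 1 byte
        System.out.println(encode(300L).length);  // 2 bytes
        System.out.println(decode(encode(300L))); // round-trips to 300
    }
}
```

      A small vertex ID fits in one or two bytes instead of the fixed eight of a plain long. A tag byte added to every object would give back part of that saving, which is the trade-off weighed above.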

            People

            • Assignee:
              ankurd Ankur Dave
              Reporter:
              ankurd Ankur Dave
            • Votes:
              1
              Watchers:
              4