Spark / SPARK-2104

RangePartitioner should use user specified serializer to serialize range bounds


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 1.1.0
    • Component/s: None
    • Labels: None

    Description

      Otherwise, it is pretty annoying to sort on types that are not Java-serializable.

      To reproduce, just set the serializer to Kryo, and run the following job:

      class JavaNonSerializableClass extends Comparable[JavaNonSerializableClass] { override def compareTo(o: JavaNonSerializableClass) = 0 }
      
      sc.parallelize(Seq(new JavaNonSerializableClass, new JavaNonSerializableClass), 2).map(x => (x,x)).sortByKey()
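
For completeness, the "set the serializer to Kryo" step might look like the sketch below. The application name and the `local[2]` master are placeholders chosen for illustration and are not part of the original report. `spark.serializer` and `org.apache.spark.serializer.KryoSerializer` are the standard Spark setting and class.

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Placeholder app name and master, for a local reproduction only.
val conf = new SparkConf()
  .setAppName("SPARK-2104-repro")
  .setMaster("local[2]")
  // Use Kryo for data serialization instead of the default Java serializer.
  .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")

val sc = new SparkContext(conf)
```

With `sc` created this way, the job above can be run as written.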
      

      The partitioner is always serialized with Java serialization, because it is shipped by the task closure serializer. The rangeBounds variable in RangePartitioner, however, should be serialized with the user-specified serializer.
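
One way to get this behaviour is to keep the partitioner itself Java-serializable but mark the bounds as transient and write them by hand in `writeObject`/`readObject`, delegating to whichever serializer the user configured. The sketch below illustrates that idea using only the JDK. `Key`, `UserSerializer`, `SketchPartitioner`, and `SerDemo` are hypothetical names, and `UserSerializer` is a hand-written stand-in for something like Kryo. It is not Spark's actual implementation.

```scala
import java.io.{ByteArrayInputStream, ByteArrayOutputStream, DataInputStream,
  DataOutputStream, IOException, ObjectInputStream, ObjectOutputStream}

// Deliberately NOT java.io.Serializable, like the class in the report.
class Key(val id: Int)

// Hypothetical stand-in for the user-specified serializer (e.g. Kryo).
object UserSerializer {
  def toBytes(keys: Array[Key]): Array[Byte] = {
    val bos = new ByteArrayOutputStream()
    val out = new DataOutputStream(bos)
    out.writeInt(keys.length)
    keys.foreach(k => out.writeInt(k.id))
    out.flush()
    bos.toByteArray
  }

  def fromBytes(bytes: Array[Byte]): Array[Key] = {
    val in = new DataInputStream(new ByteArrayInputStream(bytes))
    Array.fill(in.readInt())(new Key(in.readInt()))
  }
}

class SketchPartitioner(initialBounds: Array[Key], val ascending: Boolean)
    extends Serializable {

  // Transient: Java serialization skips this field, so Key need not be Serializable.
  @transient private var bounds: Array[Key] = initialBounds

  def boundIds: Seq[Int] = bounds.map(_.id).toSeq

  @throws(classOf[IOException])
  private def writeObject(out: ObjectOutputStream): Unit = {
    out.defaultWriteObject() // writes the non-transient fields (ascending)
    val bytes = UserSerializer.toBytes(bounds)
    out.writeInt(bytes.length)
    out.write(bytes)
  }

  @throws(classOf[IOException])
  private def readObject(in: ObjectInputStream): Unit = {
    in.defaultReadObject()
    val bytes = new Array[Byte](in.readInt())
    in.readFully(bytes)
    bounds = UserSerializer.fromBytes(bytes)
  }
}

object SerDemo {
  // Java-serialize and deserialize an object, as a closure serializer would.
  def roundTrip[T](obj: T): T = {
    val bos = new ByteArrayOutputStream()
    val oos = new ObjectOutputStream(bos)
    oos.writeObject(obj)
    oos.close()
    val ois = new ObjectInputStream(new ByteArrayInputStream(bos.toByteArray))
    ois.readObject().asInstanceOf[T]
  }
}
```

Passing a `SketchPartitioner` through `SerDemo.roundTrip` succeeds even though `Key` is not Java-serializable, because the bounds never go through Java's default field serialization.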

          People

            Assignee: Saisai Shao (jerryshao)
            Reporter: Reynold Xin (rxin)