Spark / SPARK-2620

case class cannot be used as key for reduce


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Critical
    • Resolution: Incomplete
    • Affects Version/s: 1.0.0, 1.1.0, 1.3.0, 1.4.0, 1.5.0, 1.6.0, 2.0.0, 2.1.0, 2.2.0, 2.3.0
    • Fix Version/s: None
    • Component/s: Spark Shell
    • Environment: reproduced on spark-shell local[4]

    Description

      Using a case class as a key doesn't work properly on Spark 1.0.0.

      A minimal example:

      case class P(name: String)
      val ps = Array(P("alice"), P("bob"), P("charly"), P("bob"))
      sc.parallelize(ps).map(x => (x, 1)).reduceByKey((x, y) => x + y).collect
      [Spark shell local mode] res: Array[(P, Int)] = Array((P(bob),1), (P(bob),1), (P(alice),1), (P(charly),1))

      Note that the two identical P(bob) keys are not merged. The expected behavior should be equivalent to keying by the name field:
      sc.parallelize(ps).map(x => (x.name, 1)).reduceByKey((x, y) => x + y).collect
      Array[(String, Int)] = Array((charly,1), (alice,1), (bob,2))

      groupByKey and distinct exhibit the same behavior.
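      The operations above all rely on the key type honoring the equals/hashCode contract. A compiled case class satisfies it automatically, which is why the grouping is expected to merge the two P("bob") keys. A minimal plain-Scala sketch (no Spark; the helper object name is illustrative) of the equivalence the report assumes:

      ```scala
      // Sketch: grouping by a case class key should match grouping by its field,
      // because case classes derive structural equals and hashCode.
      case class P(name: String)

      object CaseClassKeyDemo {
        def main(args: Array[String]): Unit = {
          val ps = Seq(P("alice"), P("bob"), P("charly"), P("bob"))

          // Local analogue of reduceByKey((x, y) => x + y) on (P, 1) pairs.
          val byCaseClass: Map[P, Int] =
            ps.map(p => (p, 1)).groupBy(_._1).map { case (k, vs) => (k, vs.map(_._2).sum) }

          // The same aggregation keyed by the name field.
          val byName: Map[String, Int] =
            ps.map(p => (p.name, 1)).groupBy(_._1).map { case (k, vs) => (k, vs.map(_._2).sum) }

          // Both groupings merge the duplicate key: bob appears twice.
          assert(byCaseClass(P("bob")) == 2)
          assert(byName("bob") == 2)
          println(byCaseClass)
        }
      }
      ```

      In spark-shell, case classes are compiled per REPL line by the interpreter's class loader, which is one way this contract can break across closures; outside the shell the same code behaves as sketched above.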

      People

              Assignee: tobias.schlatter Tobias Schlatter
              Reporter: gmaas Gerard Maas
              Votes: 8
              Watchers: 32

              Dates

              Created:
              Updated:
              Resolved: