Details
- Type: Bug
- Status: Resolved
- Priority: Critical
- Resolution: Incomplete
- Affects Version/s: 1.0.0, 1.1.0, 1.3.0, 1.4.0, 1.5.0, 1.6.0, 2.0.0, 2.1.0, 2.2.0, 2.3.0
- Fix Version/s: None
- Environment: reproduced on spark-shell local[4]
Description
Using a case class as a key doesn't seem to work properly in Spark 1.0.0.
A minimal example:
case class P(name: String)
val ps = Array(P("alice"), P("bob"), P("charly"), P("bob"))
sc.parallelize(ps).map(x => (x, 1)).reduceByKey((x, y) => x + y).collect
In the Spark shell (local mode) this returns: res: Array[(P, Int)] = Array((P(bob),1), (P(bob),1), (P(alice),1), (P(charly),1))
This contradicts the expected behavior, which should be equivalent to keying by the name field directly:
sc.parallelize(ps).map(x => (x.name, 1)).reduceByKey((x, y) => x + y).collect
res: Array[(String, Int)] = Array((charly,1), (alice,1), (bob,2))
groupByKey and distinct exhibit the same incorrect behavior.
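A plausible explanation, not confirmed in this ticket, is that the shell wraps each case class defined at the prompt in a synthetic outer class, so instances deserialized on the executors no longer satisfy equals/hashCode against locally constructed ones, and the shuffle never merges matching keys. Under that assumption, a minimal workaround sketch (the val name counts is illustrative) is to shuffle on a stable, compiled key type such as String and rebuild the case-class keys afterwards:

// Workaround sketch, assuming the REPL wrapper breaks equals/hashCode across
// serialization: shuffle on a plain String key, then restore case-class keys.
// Enter via :paste so the whole chain parses as one expression.
case class P(name: String)
val ps = Array(P("alice"), P("bob"), P("charly"), P("bob"))
val counts = sc.parallelize(ps)
  .map(p => (p.name, 1))                   // key by the underlying field
  .reduceByKey(_ + _)                      // String equality merges reliably
  .map { case (name, n) => (P(name), n) }  // rebuild case-class keys after the merge
counts.collect  // e.g. Array((P(alice),1), (P(charly),1), (P(bob),2)), order may vary

Defining the case class in compiled code on the classpath, rather than at the shell prompt, is also reported to avoid the problem.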
Issue Links
- is duplicated by
  - SPARK-5149 Type mismatch when defining classes in Spark REPL (Resolved)
  - SPARK-9621 Closure inside RDD doesn't properly close over environment (Resolved)
- relates to
  - SPARK-7061 Case Classes Cannot be Repartitioned/Shuffled in Spark REPL (Resolved)