Details
- Type: Bug
- Status: Resolved
- Priority: Critical
- Resolution: Incomplete
- Affects Version/s: 1.0.0, 1.1.0, 1.3.0, 1.4.0, 1.5.0, 1.6.0, 2.0.0, 2.1.0, 2.2.0, 2.3.0
- Fix Version/s: None
- Environment: reproduced on spark-shell local[4]
Description
Using a case class as a key doesn't seem to work properly in Spark 1.0.0.
A minimal example:
case class P(name: String)
val ps = Array(P("alice"), P("bob"), P("charly"), P("bob"))
sc.parallelize(ps).map(x => (x, 1)).reduceByKey((x, y) => x + y).collect
In the Spark shell (local mode) this returns: res: Array[(P, Int)] = Array((P(bob),1), (P(bob),1), (P(alice),1), (P(charly),1))
This contradicts the expected behavior, which should be equivalent to keying by the name field directly:
sc.parallelize(ps).map(x => (x.name, 1)).reduceByKey((x, y) => x + y).collect
res: Array[(String, Int)] = Array((charly,1), (alice,1), (bob,2))
groupByKey and distinct exhibit the same incorrect behavior.
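A plausible explanation, not confirmed in this ticket, is that the shell wraps each case class defined at the prompt in a synthetic outer class, so instances deserialized on the executors no longer satisfy equals/hashCode against locally constructed ones, and the shuffle never merges matching keys. Under that assumption, a minimal workaround sketch (the val name counts is illustrative) is to shuffle on a stable, compiled key type such as String and rebuild the case-class keys afterwards:

// Workaround sketch, assuming the REPL wrapper breaks equals/hashCode across
// serialization: shuffle on a plain String key, then restore case-class keys.
// Enter via :paste so the whole chain parses as one expression.
case class P(name: String)
val ps = Array(P("alice"), P("bob"), P("charly"), P("bob"))
val counts = sc.parallelize(ps)
  .map(p => (p.name, 1))                   // key by the underlying field
  .reduceByKey(_ + _)                      // String equality merges reliably
  .map { case (name, n) => (P(name), n) }  // rebuild case-class keys after the merge
counts.collect  // e.g. Array((P(alice),1), (P(charly),1), (P(bob),2)), order may vary

Defining the case class in compiled code on the classpath, rather than at the shell prompt, is also reported to avoid the problem.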
Issue Links
- is duplicated by
  - SPARK-5149 Type mismatch when defining classes in Spark REPL (Resolved)
  - SPARK-9621 Closure inside RDD doesn't properly close over environment (Resolved)
- relates to
  - SPARK-7061 Case Classes Cannot be Repartitioned/Shuffled in Spark REPL (Resolved)