Details
- Type: Bug
- Status: Resolved
- Priority: Major
- Resolution: Auto Closed
- Affects Version/s: 1.1.0
- Fix Version/s: None
- Environment: Oracle JDK 7u51 64bit on Ubuntu 12.04
Description
When Java enums are used as keys in some operations, the results can be very unexpected. The issue is that Java's Enum.hashCode returns the object's identity hash (effectively its memory position), which is different on each JVM.
messages.filter(_.getHeader.getKind == Kind.EVENT).count
>> 503650

val tmp = messages.filter(_.getHeader.getKind == Kind.EVENT)
tmp.map(_.getHeader.getKind).countByValue
>> Map(EVENT -> 1389)
Because this is actually a JVM issue, we should either reject enums as keys with an error or implement a workaround.
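To illustrate the kind of workaround meant here, below is a minimal sketch in plain Java, not Spark code. It assumes that keying by the enum's name instead of the enum itself is acceptable. The class EnumKeyWorkaround, the two-value Kind enum, and the helpers partitionFor and stableKey are hypothetical names made up for this example; partitionFor only imitates a hash partitioner (non-negative modulo of hashCode) and is not Spark's implementation.

```java
public class EnumKeyWorkaround {
    // Hypothetical stand-in for the Kind enum used in the report.
    enum Kind { EVENT, COMMAND }

    // Imitates hash partitioning: non-negative modulo of the key's hash code.
    static int partitionFor(Object key, int numPartitions) {
        int mod = key.hashCode() % numPartitions;
        return mod < 0 ? mod + numPartitions : mod;
    }

    // Workaround: use the constant's name as the key. String.hashCode is
    // computed from the characters, so it is the same on every JVM.
    static String stableKey(Kind kind) {
        return kind.name();
    }

    public static void main(String[] args) {
        int numPartitions = 8;

        // Enum.hashCode is the identity hash, so this value typically
        // changes from one JVM run to the next.
        System.out.println("enum key:   " + partitionFor(Kind.EVENT, numPartitions));

        // The name-based key lands in the same partition on every JVM.
        System.out.println("string key: " + partitionFor(stableKey(Kind.EVENT), numPartitions));
    }
}
```

Running this a few times in separate JVMs should show the first number changing between runs while the second stays fixed. Using kind.ordinal() as the key would be stable as well, but it depends on the declaration order of the constants.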
A good writeup of the issue (and a workaround) can be found here:
http://dev.bizo.com/2014/02/beware-enums-in-spark.html
Somewhat more on hash codes and enums:
https://stackoverflow.com/questions/4885095/what-is-the-reason-behind-enum-hashcode
And some issues (most of them rejected) in the Oracle Java Bug Database:
Issue Links
- is related to SPARK-597 HashPartitioner incorrectly partitions RDD[Array[_]] (Resolved)