Enum.hashCode is only consistent within the same JVM

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Auto Closed
    • Affects Version/s: 1.1.0
    • Fix Version/s: None
    • Component/s: Spark Core
    • Environment: Oracle JDK 7u51 64bit on Ubuntu 12.04

    Description

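      When Java enums are used as keys in some operations, the results can be very unexpected. The issue is that Enum.hashCode in Java returns the identity hash code inherited from Object (typically derived from the memory position), which is different in each JVM. Hash-based operations that span several JVMs can therefore treat the same enum constant as different keys.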

      messages.filter(_.getHeader.getKind == Kind.EVENT).count
      >> 503650
      
      val tmp = messages.filter(_.getHeader.getKind == Kind.EVENT)
      tmp.map(_.getHeader.getKind).countByValue
      >> Map(EVENT -> 1389)
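
The mismatch between the two counts is consistent with the enum hash being a per-JVM value. The following is a minimal sketch for checking that locally, not part of the original report. It assumes `java.util.concurrent.TimeUnit` as a stand-in for the `Kind` enum, and `partitionFor` is a hypothetical helper that only mimics hash partitioning (non-negative modulo of the key's `hashCode`).

```scala
import java.util.concurrent.TimeUnit

object EnumHashCheck {
  // True when the constant's hashCode is just the JVM identity hash,
  // i.e. a per-process value not derived from the constant's name or ordinal.
  def usesIdentityHash(e: Enum[_]): Boolean =
    e.hashCode == System.identityHashCode(e)

  // Hypothetical stand-in for hash partitioning:
  // non-negative modulo of the key's hashCode.
  def partitionFor(key: Any, numPartitions: Int): Int = {
    val raw = key.hashCode % numPartitions
    if (raw < 0) raw + numPartitions else raw
  }

  def main(args: Array[String]): Unit = {
    val kind = TimeUnit.SECONDS // stand-in for Kind.EVENT
    println(s"identity hash used: ${usesIdentityHash(kind)}")
    // This value can change from one JVM launch to the next.
    println(s"partition by enum:  ${partitionFor(kind, 8)}")
    // String.hashCode is specified by the JDK, so this is the same in every JVM.
    println(s"partition by name:  ${partitionFor(kind.name, 8)}")
  }
}
```

Running `main` in several separate JVM processes should show a stable value for the name-based partition, while the enum-based one may vary between runs.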
      

      Because this is actually a JVM issue, we should either reject enums as keys with an error or implement a workaround.
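
One possible workaround, sketched here under assumptions and not necessarily the one described in the linked writeup, is to key by the constant's name (a String, whose hash is the same in every JVM) and convert back to enum constants only after the results have been collected into a single JVM. `StableEnumKeys`, `countByName` and `toEnumKeys` are hypothetical names, and plain Scala collections stand in for an RDD.

```scala
import java.util.concurrent.TimeUnit

object StableEnumKeys {
  // Count occurrences keyed by the constant's name instead of the enum itself.
  def countByName[E <: Enum[E]](values: Seq[E]): Map[String, Long] =
    values.groupBy(_.name).map { case (name, group) => (name, group.size.toLong) }

  // Convert name-keyed results back to enum constants once they
  // live in a single JVM, where enum keys behave normally.
  def toEnumKeys[E <: Enum[E]](counts: Map[String, Long], cls: Class[E]): Map[E, Long] =
    counts.map { case (name, n) => (java.lang.Enum.valueOf(cls, name), n) }

  def main(args: Array[String]): Unit = {
    // TimeUnit is a stand-in for the Kind enum from the report.
    val kinds = Seq(TimeUnit.SECONDS, TimeUnit.SECONDS, TimeUnit.HOURS)
    val byName = countByName(kinds)
    println(byName)
    println(toEnumKeys(byName, classOf[TimeUnit]))
  }
}
```

Applied to the snippet in the description, the same idea would presumably look like `tmp.map(_.getHeader.getKind.name).countByValue`, with the String keys mapped back to `Kind` on the driver if needed.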

      A good writeup of the issue can be found here (and a workaround):
      http://dev.bizo.com/2014/02/beware-enums-in-spark.html

      Some more background on hash codes and enums:
      https://stackoverflow.com/questions/4885095/what-is-the-reason-behind-enum-hashcode

      And some issues (most of them rejected) in the Oracle Java Bug Database:

            People

              Assignee: Unassigned
              Reporter: Nathan Bijnens (nathan_gs)
              Votes: 2
              Watchers: 8