Spark / SPARK-2494

Hash of None differs across machines in CPython

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.9.0, 0.9.1, 0.9.2, 1.0.0, 1.0.1
    • Fix Version/s: 0.9.3, 1.0.2, 1.1.0
    • Component/s: PySpark
    • Environment: CPython 2.x

    Description

      The hash of None, and of any tuple containing None, differs across machines, so the result will be wrong if None appears in a key passed to partitionBy().

      The default partition function should be a portable hash function that generates the same hash for all the built-in immutable types, especially tuples.
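
      A minimal sketch of what such a portable hash could look like, for illustration only. The names portable_hash_sketch and partition_index are hypothetical and are not taken from this issue or from the PySpark source. The sketch assumes that the built-in hash() of the remaining key types (for example ints) is already the same on every worker. That assumption does not hold for str or bytes under Python 3 unless PYTHONHASHSEED is set to the same value everywhere.

```python
import sys


def portable_hash_sketch(key):
    """Hypothetical portable hash: None and tuples avoid the built-in hash()."""
    if key is None:
        # Fixed value, so every machine agrees on the hash of None.
        return 0
    if isinstance(key, tuple):
        # Combine element hashes recursively, so a None nested inside a
        # tuple also goes through the fixed-value branch above.
        h = 0x345678
        for item in key:
            h ^= portable_hash_sketch(item)
            h *= 1000003
            h &= sys.maxsize  # keep the running value bounded and non-negative
        h ^= len(key)
        return h
    # Assumption: the built-in hash of other key types is stable across workers.
    return hash(key)


def partition_index(key, num_partitions):
    """Hypothetical helper: map a key to a partition number."""
    return portable_hash_sketch(key) % num_partitions
```

      With a function like this, partition_index((None, 1), 8) yields the same partition on every machine, because no branch depends on how the interpreter hashes the None object itself.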

          People

            Assignee: Davies Liu (davies)
            Reporter: Davies Liu (davies)
            Votes: 0
            Watchers: 4

            Dates

              Created:
              Updated:
              Resolved:

              Time Tracking

                Estimated: 24h (Original Estimate)
                Remaining: 24h (Remaining Estimate)
                Logged: Not Specified (Time Spent)