Spark / SPARK-2494

Hash of None differs across machines in CPython


    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.9.0, 0.9.1, 0.9.2, 1.0.0, 1.0.1
    • Fix Version/s: 0.9.3, 1.0.2, 1.1.0
    • Component/s: PySpark
    • Environment:

      CPython 2.x

      Description

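      In CPython 2.x, the hash of None is derived from its memory address, so the hash of None, and of any tuple containing None, differs across machines. Because partitionBy() assigns each record to a partition based on the hash of its key, workers that compute different hashes for the same key send its records to different partitions, and the result is wrong whenever None appears in a key.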

      PySpark should use a portable hash function as the default partition function, one that produces the same hash on every machine for all built-in immutable types, especially tuples.
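      A minimal sketch of such a portable hash, written for Python 3. The names `portable_hash` and `partition_for` are hypothetical, and this is not necessarily the code that fixed the issue. None gets a fixed hash, tuples are combined element by element with a multiply-and-xor loop (the seed 0x345678 and multiplier 1000003 are the constants CPython uses for tuple hashing, but the loop here is simplified), and other types fall back to hash(). A 63-bit mask, an assumption of this sketch, keeps the result independent of the platform's word size.

```python
# Assumption: a fixed 63-bit mask so results do not depend on word size.
_MASK = (1 << 63) - 1


def portable_hash(x):
    """Hypothetical machine-independent hash for None, tuples, and other keys."""
    if x is None:
        return 0  # fixed value instead of an address-derived hash
    if isinstance(x, tuple):
        h = 0x345678  # seed constant borrowed from CPython's tuple hash
        for item in x:
            h ^= portable_hash(item)  # recurse so nested None/tuples are portable
            h *= 1000003              # multiplier borrowed from CPython's tuple hash
            h &= _MASK                # keep the value non-negative and bounded
        h ^= len(x)  # mix in the length so (None,) and (None, None) differ
        return h
    return hash(x)  # fallback: only portable if hash() is deterministic for x


def partition_for(key, num_partitions):
    """Hypothetical helper: map a key to a partition index."""
    return portable_hash(key) % num_partitions


print(portable_hash(None))          # 0
print(partition_for((None, 1), 4))  # 1
```

      The fallback to hash() is only portable for types whose built-in hash is deterministic, such as int. On Python 3, str and bytes hashes are randomized per process unless PYTHONHASHSEED is fixed, so a complete solution would need to account for those types as well.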


            People

            • Assignee: Davies Liu
              Reporter: Davies Liu
            • Votes: 0
              Watchers: 4


                Time Tracking

                Estimated: 24h
                Remaining: 24h
                Logged: Not Specified