Description
Spark internally uses Murmur3Hash for partitioning, whereas Hive uses a different hash function. For queries that use bucketing, the same query can therefore produce different results on the two engines. We want to give users backward compatibility, so that parts of an application can be moved between the engines without observing regressions.
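To illustrate the mismatch, here is a minimal Python sketch that assigns integer keys to buckets in two ways. It rests on assumptions not stated in this issue: that Spark hashes an int with Murmur3 (x86, 32-bit) using seed 42 and takes a non-negative modulo, and that Hive's legacy bucketing hashes an int to itself and masks the sign bit before the modulo. The function names are hypothetical and are not part of either engine's API.

```python
def _rotl32(x, r):
    """Rotate a 32-bit unsigned value left by r bits."""
    return ((x << r) | (x >> (32 - r))) & 0xFFFFFFFF


def _to_int32(x):
    """Reinterpret a 32-bit unsigned value as a signed Java-style int."""
    x &= 0xFFFFFFFF
    return x - 0x100000000 if x >= 0x80000000 else x


def murmur3_int(value, seed=42):
    """Murmur3 x86_32 of a single 4-byte int (seed 42 is an assumption)."""
    k1 = ((value & 0xFFFFFFFF) * 0xCC9E2D51) & 0xFFFFFFFF
    k1 = _rotl32(k1, 15)
    k1 = (k1 * 0x1B873593) & 0xFFFFFFFF

    h1 = (seed & 0xFFFFFFFF) ^ k1
    h1 = _rotl32(h1, 13)
    h1 = (h1 * 5 + 0xE6546B64) & 0xFFFFFFFF

    # Finalization mix; 4 is the input length in bytes.
    h1 ^= 4
    h1 ^= h1 >> 16
    h1 = (h1 * 0x85EBCA6B) & 0xFFFFFFFF
    h1 ^= h1 >> 13
    h1 = (h1 * 0xC2B2AE35) & 0xFFFFFFFF
    h1 ^= h1 >> 16
    return _to_int32(h1)


def spark_style_bucket(key, num_buckets):
    """Assumed Spark behavior: non-negative modulo of the Murmur3 hash."""
    # Python's % already returns a non-negative result for a positive divisor.
    return murmur3_int(key) % num_buckets


def hive_style_bucket(key, num_buckets):
    """Assumed legacy Hive behavior: an int hashes to itself, sign bit masked."""
    return (key & 0x7FFFFFFF) % num_buckets


if __name__ == "__main__":
    for key in range(8):
        print(key, spark_style_bucket(key, 4), hive_style_bucket(key, 4))
```

If the two columns printed for a key differ, a table bucketed by one engine would place that row in a file the other engine does not expect, which is the kind of incompatibility this issue describes.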
Issue Links
- is related to
  - SPARK-31162 Provide Configuration Parameter to select/enforce the Hive Hash for Bucketing (Open)
  - SPARK-16904 Removal of Hive Built-in Hash Functions and TestHiveFunctionRegistry (Resolved)
  - HIVE-18910 Migrate to Murmur hash for shuffle and bucketing (Resolved)
- links to