[SPARK-17495] Hive hash implementation - ASF JIRA

Details

Type: Sub-task
Status: Resolved
Priority: Minor
Resolution: Fixed
Affects Version/s: 2.2.0
Fix Version/s: 2.2.0
Component/s: SQL
Labels: None

Description

Spark internally uses Murmur3Hash for partitioning, whereas Hive uses a different hash function. For queries that use bucketing, this means the same query can produce different results when run on the two engines. We want users to have backward compatibility, so that they can move parts of an application from one engine to the other without observing regressions.
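To make the mismatch concrete, here is a minimal Python sketch of how each engine might assign an INT key to a bucket. It is an illustration under stated assumptions, not Spark's or Hive's actual code. It assumes a single INT bucketing column; that legacy Hive hashes an int to its own value and picks the bucket as `(hash & Integer.MAX_VALUE) % numBuckets` (the linked HIVE-18910 later moved Hive itself to Murmur hash); and that Spark applies Murmur3 x86_32 with seed 42 followed by a non-negative modulo. The names `murmur3_hash_int`, `hive_bucket` and `spark_bucket` are hypothetical.

```python
# Sketch only. Assumptions (not stated in the issue): a single INT bucket column,
# legacy Hive hashes an int to its own value, Spark uses Murmur3 x86_32 with seed 42.

MASK32 = 0xFFFFFFFF


def _to_int32(x: int) -> int:
    """Reinterpret the low 32 bits of x as a signed Java int."""
    x &= MASK32
    return x - (1 << 32) if x & 0x80000000 else x


def _rotl32(x: int, r: int) -> int:
    """32-bit rotate left."""
    x &= MASK32
    return ((x << r) | (x >> (32 - r))) & MASK32


def murmur3_hash_int(value: int, seed: int = 42) -> int:
    """Murmur3 x86_32 over one 4-byte int, returned as a signed 32-bit value."""
    c1, c2 = 0xCC9E2D51, 0x1B873593
    k1 = (value * c1) & MASK32
    k1 = _rotl32(k1, 15)
    k1 = (k1 * c2) & MASK32
    h1 = (seed & MASK32) ^ k1
    h1 = _rotl32(h1, 13)
    h1 = (h1 * 5 + 0xE6546B64) & MASK32
    # Finalization mix; the input length is 4 bytes.
    h1 ^= 4
    h1 ^= h1 >> 16
    h1 = (h1 * 0x85EBCA6B) & MASK32
    h1 ^= h1 >> 13
    h1 = (h1 * 0xC2B2AE35) & MASK32
    h1 ^= h1 >> 16
    return _to_int32(h1)


def hive_hash_int(value: int) -> int:
    """Assumed legacy Hive hash for an INT: the value itself."""
    return _to_int32(value)


def hive_bucket(value: int, num_buckets: int) -> int:
    """Clear the sign bit, then take the remainder (assumed Hive scheme)."""
    return (hive_hash_int(value) & 0x7FFFFFFF) % num_buckets


def spark_bucket(value: int, num_buckets: int) -> int:
    """Non-negative modulo of the Murmur3 hash (Python % is already non-negative here)."""
    return murmur3_hash_int(value) % num_buckets


if __name__ == "__main__":
    # Print the bucket each engine would pick for a few keys with 4 buckets.
    for key in range(5):
        print(key, hive_bucket(key, 4), spark_bucket(key, 4))
```

Because the two hash functions disagree, many keys map to different bucket numbers, which is the layout mismatch the description refers to.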

Issue Links

is related to

SPARK-31162 Provide Configuration Parameter to select/enforce the Hive Hash for Bucketing (Open)

SPARK-16904 Removal of Hive Built-in Hash Functions and TestHiveFunctionRegistry (Resolved)

HIVE-18910 Migrate to Murmur hash for shuffle and bucketing (Resolved)

links to

[Github] Pull Request #15047 (tejasapatil)

[Github] Pull Request #17049 (tejasapatil)

[Github] Pull Request #17056 (tejasapatil)

[Github] Pull Request #17062 (tejasapatil)

(2 links to)

Activity

People

Assignee: Tejas Patil

Reporter: Tejas Patil

Votes: 2

Watchers: 20

Dates

Created: 10/Sep/16 19:33

Updated: 12/Dec/22 18:10

Resolved: 16/Mar/20 01:09