[HIVE-12348] Byte array comparison optimization for SIMD - ASF JIRA

Details

Type: Sub-task
Status: Resolved
Priority: Minor
Resolution: Duplicate
Affects Version/s: None
Fix Version/s: None
Component/s: None
Labels: None

Description

The current byte comparison implementation in Hive is basic: it compares one byte at a time, so it is slow.
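For reference, a byte-at-a-time comparison looks roughly like the sketch below. This is not Hive's actual code; the class name `ByteAtATimeComparer` and the method signature are hypothetical, and unsigned lexicographic ordering is an assumption.

```java
// Hypothetical baseline, not Hive's implementation: compares one byte per iteration.
public final class ByteAtATimeComparer {
    private ByteAtATimeComparer() {}

    /** Lexicographically compares two byte ranges, treating each byte as unsigned (assumed ordering). */
    public static int compare(byte[] a, int aOff, int aLen,
                              byte[] b, int bOff, int bLen) {
        int n = Math.min(aLen, bLen);
        for (int i = 0; i < n; i++) {
            int x = a[aOff + i] & 0xff; // mask to get the unsigned value
            int y = b[bOff + i] & 0xff;
            if (x != y) {
                return x - y;
            }
        }
        // All shared bytes are equal: the shorter range sorts first.
        return aLen - bLen;
    }
}
```

Each loop iteration performs one load and one compare per byte, which is the cost the rest of this issue tries to reduce.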

Hadoop has a FastByteComparisons class built on sun.misc.Unsafe. (https://github.com/hanborq/hadoop/blob/master/src/core/org/apache/hadoop/io/FastByteComparisons.java) It compares a long (8 bytes) at a time, so it is faster.
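The 8-bytes-at-a-time idea can be sketched without Unsafe. The version below swaps in `java.nio.ByteBuffer` for `sun.misc.Unsafe` so that it runs on any JDK; it is not the Hadoop code, and the class name `LongAtATimeComparer` is hypothetical. A big-endian long read keeps the first byte most significant, so an unsigned long comparison agrees with unsigned lexicographic byte order.

```java
import java.nio.ByteBuffer;

// Hypothetical sketch of word-wise comparison; uses ByteBuffer instead of sun.misc.Unsafe.
public final class LongAtATimeComparer {
    private LongAtATimeComparer() {}

    /** Lexicographically compares two byte ranges as unsigned bytes, 8 bytes per step. */
    public static int compare(byte[] a, int aOff, int aLen,
                              byte[] b, int bOff, int bLen) {
        // ByteBuffer.wrap does not copy; the default byte order is big-endian.
        ByteBuffer ba = ByteBuffer.wrap(a);
        ByteBuffer bb = ByteBuffer.wrap(b);
        int n = Math.min(aLen, bLen);
        int i = 0;
        // Compare full 8-byte words while both ranges have at least 8 bytes left.
        for (; i + 8 <= n; i += 8) {
            long x = ba.getLong(aOff + i);
            long y = bb.getLong(bOff + i);
            if (x != y) {
                return Long.compareUnsigned(x, y);
            }
        }
        // Compare the remaining tail (fewer than 8 bytes) one byte at a time.
        for (; i < n; i++) {
            int x = a[aOff + i] & 0xff;
            int y = b[bOff + i] & 0xff;
            if (x != y) {
                return x - y;
            }
        }
        return aLen - bLen;
    }
}
```

Only the sign of the result is meaningful. An Unsafe-based version would also have to account for native byte order, which this sketch avoids by relying on the buffer's big-endian default.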

Java 8 has intrinsics for String.compareTo and String.equals that use AVX2 and SSE4.2. (http://hg.openjdk.java.net/jdk8/jdk8/hotspot/rev/038dd2875b94) They compare 128 to 256 bits (16 to 32 bytes) at a time, so they are much faster.

However, both Unsafe.getLong and the String.compareTo intrinsic need additional data copies, so the actual speedup is smaller than the raw 1-byte versus 32-byte ratio suggests.
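To make the copy cost concrete for the String route, here is one possible way to reach String.compareTo from byte arrays. This is an illustrative assumption, not necessarily what the attached patch does, and the class name `StringIntrinsicComparer` is hypothetical. Decoding with ISO-8859-1 maps each byte to the char with the same unsigned value, so the result follows unsigned byte order, but each `new String(...)` copies the compared range.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical illustration of the extra copy needed to use String.compareTo on byte arrays.
public final class StringIntrinsicComparer {
    private StringIntrinsicComparer() {}

    /** Compares two byte ranges by decoding them to Strings first (one copy per side). */
    public static int compare(byte[] a, int aOff, int aLen,
                              byte[] b, int bOff, int bLen) {
        // ISO-8859-1 decodes byte value v (0..255) to char v, preserving unsigned order.
        String sa = new String(a, aOff, aLen, StandardCharsets.ISO_8859_1);
        String sb = new String(b, bOff, bLen, StandardCharsets.ISO_8859_1);
        // compareTo is the call a JIT may replace with a vectorized intrinsic.
        return sa.compareTo(sb);
    }
}
```

Whether the vectorized comparison pays for the two allocations and copies depends on input length and JVM, so it would need to be measured rather than assumed.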

Attachments

HIVE-12348.patch
12/Nov/15 16:04
19 kB
Teddy Choi

Issue Links

is duplicated by

HIVE-15741 Faster unsafe byte array comparisons

Patch Available

links to

RB Entry

People

Assignee: Teddy Choi

Reporter: Teddy Choi

Votes: 0

Watchers: 2

Dates

Created: 05/Nov/15 20:13

Updated: 05/Feb/17 13:43

Resolved: 05/Feb/17 13:43