[HIVE-17433] Vectorization: Support Decimal64 in Hive Query Engine - ASF JIRA


Details

Type: Bug
Status: Resolved
Priority: Critical
Resolution: Fixed
Affects Version/s: None
Fix Version/s: None
Component/s: Hive
Labels:
- TODOC3.0

Description

Provide partial support for Decimal64 within Hive. "Partial" because the current decimal implementation has a large surface area of features (rounding, multiply, divide, remainder, power, big precision, and many more), but only a small number of them have been identified as performance hotspots.

The hotspots are small decimals with precision <= 18, which fit within a 64-bit long; we are calling these Decimal64. Just as we optimize row-mode execution engine hotspots by selectively adding new vectorization code, we can treat the current decimal as the full-featured one and add Decimal64 optimizations where query benchmarks really show they help.
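As an illustration of why precision <= 18 is the cutoff, here is a minimal sketch of the scaled-long idea. `Decimal64Sketch` and its methods are hypothetical names made up for this example and are not Hive classes. Every 18-digit unscaled value is below 10^18, which is smaller than Long.MAX_VALUE (about 9.22 * 10^18), while some 19-digit values are not.

```java
import java.math.BigDecimal;
import java.math.RoundingMode;

/** Hypothetical helper (not a Hive class): a decimal stored as an unscaled long plus a fixed scale. */
final class Decimal64Sketch {
    /** Largest precision whose unscaled value always fits in a signed 64-bit long. */
    static final int MAX_DECIMAL64_PRECISION = 18;

    static boolean fitsDecimal64(int precision) {
        return precision >= 1 && precision <= MAX_DECIMAL64_PRECISION;
    }

    /** Encodes value as an unscaled long at the given scale; refuses to round or overflow. */
    static long toDecimal64(BigDecimal value, int precision, int scale) {
        if (!fitsDecimal64(precision)) {
            throw new IllegalArgumentException("precision " + precision + " does not fit in a long");
        }
        // RoundingMode.UNNECESSARY throws ArithmeticException if digits would be lost.
        BigDecimal scaled = value.setScale(scale, RoundingMode.UNNECESSARY);
        if (scaled.precision() > precision) {
            throw new ArithmeticException("value exceeds decimal(" + precision + "," + scale + ")");
        }
        return scaled.unscaledValue().longValueExact();
    }

    /** Decodes an unscaled long back into a BigDecimal with the given scale. */
    static BigDecimal fromDecimal64(long unscaled, int scale) {
        return BigDecimal.valueOf(unscaled, scale);
    }

    public static void main(String[] args) {
        long encoded = toDecimal64(new BigDecimal("123.45"), 11, 5);
        System.out.println(encoded + " <-> " + fromDecimal64(encoded, 5));
    }
}
```

Once both operands share a scale, comparison and addition become plain long operations, which is the property the optimization relies on.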

This change creates a Decimal64ColumnVector.

This change currently detects small decimals in Hive for the vectorized text input format and uses some new Decimal64 vectorized classes for comparison and addition, and later perhaps for a few GroupBy aggregations like sum, avg, min, max.
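To make the kind of operation described here concrete, the following is a rough sketch of column-minus-scalar and column-less-than-scalar loops over raw long arrays. `Decimal64VectorOps` and its method names are invented for illustration; the actual Hive vectorized expression classes are generated and structured differently. The sketch assumes all values already share the same scale and ignores nulls and overflow checks.

```java
/** Hypothetical sketch (not Hive code): vectorized ops on decimals stored as same-scale longs. */
final class Decimal64VectorOps {

    /** out[i] = col[i] - scalar for the first n rows. */
    static void subtractScalar(long[] col, long scalar, long[] out, int n) {
        for (int i = 0; i < n; i++) {
            out[i] = col[i] - scalar;
        }
    }

    /** Records the indices of rows where col[i] < scalar; returns how many rows passed. */
    static int filterLessThanScalar(long[] col, long scalar, int[] selected, int n) {
        int count = 0;
        for (int i = 0; i < n; i++) {
            if (col[i] < scalar) {
                selected[count++] = i;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        // 150.00000, 350.00000 and -5.00000 stored at scale 5.
        long[] key = {15000000L, 35000000L, -500000L};
        long[] diff = new long[key.length];
        int[] selected = new int[key.length];

        subtractScalar(key, 10000000L, diff, key.length);                      // key - 100
        int passed = filterLessThanScalar(diff, 20000000L, selected, key.length); // ... < 200
        System.out.println(passed + " rows pass the filter");
    }
}
```

The inner loops touch only primitive longs, with no per-row object allocation, which is where a benefit over an object-based decimal representation would be expected to come from.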

The patch also supports a new annotation that marks a VectorizedInputFormat as supporting Decimal64 (the feature is called DECIMAL_64). Other formats such as ORC, PARQUET, etc. can then be done in later JIRAs so that they participate in the Decimal64 performance optimization.

The idea is when you annotate your input format with:

@VectorizedInputFormatSupports(supports = {DECIMAL_64})

the Vectorizer in Hive will plan to use Decimal64ColumnVector instead of DecimalColumnVector. When an input format sees that a Decimal64ColumnVector is being used, it can fill that column vector with decimal64 longs instead of the HiveDecimalWritable objects of DecimalColumnVector.
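The annotation-driven choice described above might look roughly like the sketch below. Everything prefixed with `Sketch` is a stand-in defined locally so the example compiles on its own; these are not the types from the patch, and the real Vectorizer logic is considerably more involved. The precision check is also an assumption based on the Decimal64 definition given earlier.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.util.Arrays;
import java.util.EnumSet;
import java.util.Set;

/** Stand-in for the feature enum (hypothetical). */
enum SketchSupport { DECIMAL_64 }

/** Stand-in for the annotation described in this issue (hypothetical). */
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.TYPE)
@interface SketchVectorizedInputFormatSupports {
    SketchSupport[] supports() default {};
}

/** An input format that declares Decimal64 support. */
@SketchVectorizedInputFormatSupports(supports = {SketchSupport.DECIMAL_64})
final class SketchTextInputFormat { }

/** An input format with no declared features. */
final class SketchPlainInputFormat { }

final class SketchVectorizerPlanner {
    /** Reads the declared features from the class-level annotation, if any. */
    static Set<SketchSupport> supportedFeatures(Class<?> inputFormat) {
        EnumSet<SketchSupport> features = EnumSet.noneOf(SketchSupport.class);
        SketchVectorizedInputFormatSupports ann =
            inputFormat.getAnnotation(SketchVectorizedInputFormatSupports.class);
        if (ann != null) {
            features.addAll(Arrays.asList(ann.supports()));
        }
        return features;
    }

    /** Picks the column vector type name for a decimal column of the given precision. */
    static String chooseColumnVector(Class<?> inputFormat, int precision) {
        boolean decimal64 = precision <= 18
            && supportedFeatures(inputFormat).contains(SketchSupport.DECIMAL_64);
        return decimal64 ? "Decimal64ColumnVector" : "DecimalColumnVector";
    }

    public static void main(String[] args) {
        System.out.println(chooseColumnVector(SketchTextInputFormat.class, 11));
        System.out.println(chooseColumnVector(SketchPlainInputFormat.class, 11));
    }
}
```

Keeping the declaration on the input format class means a format opts in explicitly, and formats that have not been updated keep receiving DecimalColumnVector.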

There will be a Hive configuration variable, hive.vectorized.input.format.supports.enabled, that holds a string list of supported features. The default will start as "decimal_64". It can be turned off to allow for performance comparisons and testing.
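A feature-list setting of this kind could be interpreted along the following lines. This is only a sketch: `SupportsEnabledParser` is a hypothetical helper, and the comma separator, whitespace trimming, and case-insensitive matching are assumptions, since the description says only that the value is a string list.

```java
import java.util.LinkedHashSet;
import java.util.Locale;
import java.util.Set;

/** Hypothetical parser (not Hive code) for a feature-list setting such as "decimal_64". */
final class SupportsEnabledParser {

    /** Splits on commas (assumed separator), trims, lower-cases, and drops empty entries. */
    static Set<String> parse(String value) {
        Set<String> features = new LinkedHashSet<>();
        if (value == null) {
            return features;
        }
        for (String part : value.split(",")) {
            String name = part.trim().toLowerCase(Locale.ROOT);
            if (!name.isEmpty()) {
                features.add(name);
            }
        }
        return features;
    }

    static boolean isDecimal64Enabled(String value) {
        return parse(value).contains("decimal_64");
    }

    public static void main(String[] args) {
        System.out.println(isDecimal64Enabled("decimal_64")); // feature listed
        System.out.println(isDecimal64Enabled(""));           // empty list, feature off
    }
}
```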

The query

SELECT * FROM DECIMAL_6_1_txt WHERE key - 100BD < 200BD ORDER BY key, value

will have a vectorized explain plan looking like:

...
Filter Operator
  Filter Vectorization:
      className: VectorFilterOperator
      native: true
      predicateExpression: FilterDecimal64ColLessDecimal64Scalar(col 2, val 20000000)(children: Decimal64ColSubtractDecimal64Scalar(col 0, val 10000000, outputDecimal64AbsMax 99999999999) -> 2:decimal(11,5)/DECIMAL_64) -> boolean
  predicate: ((key - 100) < 200) (type: boolean)
...
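The constants in this plan can be checked by hand. With the output type decimal(11,5), the literals 100BD and 200BD at scale 5 become 10000000 and 20000000, and 99999999999 equals 10^11 - 1, the largest 11-digit unscaled value. Reading outputDecimal64AbsMax as 10^precision - 1 is an interpretation that matches the numbers shown, not something the description states. The sketch below reproduces the arithmetic; `Decimal64PlanArithmetic` is a hypothetical name.

```java
/** Hypothetical sketch (not Hive code) reproducing the constants seen in the explain plan. */
final class Decimal64PlanArithmetic {

    /** 10^n for 0 <= n <= 18, the range that fits in a signed long. */
    static long pow10(int n) {
        if (n < 0 || n > 18) {
            throw new IllegalArgumentException("exponent must be in 0..18: " + n);
        }
        long v = 1L;
        for (int i = 0; i < n; i++) {
            v *= 10L;
        }
        return v;
    }

    /** Unscaled long for a whole-number literal at the given scale, e.g. 100 at scale 5. */
    static long scaleLiteral(long wholeValue, int scale) {
        return Math.multiplyExact(wholeValue, pow10(scale));
    }

    /** Largest absolute unscaled value for the given precision: 10^precision - 1. */
    static long absMax(int precision) {
        return pow10(precision) - 1;
    }

    /** True if the unscaled result still fits the output precision. */
    static boolean inRange(long unscaled, int precision) {
        long max = absMax(precision);
        return unscaled >= -max && unscaled <= max;
    }

    public static void main(String[] args) {
        System.out.println(scaleLiteral(100, 5)); // 10000000
        System.out.println(scaleLiteral(200, 5)); // 20000000
        System.out.println(absMax(11));           // 99999999999
    }
}
```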

Attachments


HIVE-17433.03.patch    17/Oct/17 02:26    6.14 MB     Matt McCline
HIVE-17433.04.patch    17/Oct/17 10:10    6.15 MB     Matt McCline
HIVE-17433.05.patch    24/Oct/17 11:33    10.32 MB    Matt McCline
HIVE-17433.06.patch    25/Oct/17 06:21    10.48 MB    Matt McCline
HIVE-17433.07.patch    26/Oct/17 05:11    10.54 MB    Matt McCline
HIVE-17433.08.patch    27/Oct/17 01:48    10.68 MB    Matt McCline
HIVE-17433.09.patch    27/Oct/17 15:32    11.26 MB    Matt McCline
HIVE-17433.091.patch   27/Oct/17 20:27    11.33 MB    Matt McCline
HIVE-17433.092.patch   28/Oct/17 03:35    11.33 MB    Matt McCline
HIVE-17433.093.patch   29/Oct/17 08:41    11.36 MB    Matt McCline
HIVE-17433.094.patch   29/Oct/17 17:01    11.35 MB    Matt McCline

Issue Links

causes: HIVE-22540 Vectorization: Decimal64 columns don't work with VectorizedBatchUtil.makeLikeColumnVector(ColumnVector) (Closed)

relates to: HIVE-19069 Hive can't read int32 and int64 Parquet decimal values (Resolved)

links to: RB#63089


People

Assignee: Matt McCline
Reporter: Matt McCline
Votes: 0
Watchers: 5

Dates

Created: 02/Sep/17 07:27
Updated: 11/Nov/20 15:46
Resolved: 29/Oct/17 20:41