[HIVE-22731] MJ probe decode with row-level filtering - ASF JIRA

Details

Type: Improvement
Status: Patch Available
Priority: Major
Resolution: Unresolved
Affects Version/s: None
Fix Version/s: None
Component/s: Hive, llap
Labels:
- pull-request-available

Description

Currently, RecordReaders such as ORC support filtering only at coarse-grained levels, namely: File, Stripe (64 to 256 MB), and Row group (10k rows). They skip a set of rows only if they can guarantee that none of the rows in it can pass a filter (usually given as a searchable argument).

However, a significant amount of time can be spent decoding rows with multiple columns that are not even used in the final result. See the figure, where "original" is what happens today, while in "LazyDecode" we skip decoding rows that do not match the key.

To enable more fine-grained filtering in the particular case of a MapJoin, we could use the key HashTable built from the smaller table to skip deserializing the columns of larger-table rows that do not match any key, and thus save CPU time.
This Jira investigates this direction.
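
The idea can be illustrated with a minimal, self-contained sketch. It is not the patch attached to this issue and uses no Hive, LLAP, or ORC classes: `EncodedBatch`, `selectMatchingRows`, and `decodeSelected` are hypothetical names chosen for illustration, and a plain `Set<Long>` stands in for the MapJoin key HashTable. It assumes a single long join key that is already available before the remaining columns are decoded.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Illustrative sketch only: none of these types are Hive, LLAP, or ORC classes.
final class ProbeDecodeSketch {

    // Hypothetical encoded batch of the larger (probe-side) table.
    static final class EncodedBatch {
        final long[] keys;       // join-key column, assumed already decoded
        final String[] payload;  // stand-in for the remaining, still-encoded columns
        int decodeCalls = 0;     // counts how many rows were actually decoded

        EncodedBatch(long[] keys, String[] payload) {
            this.keys = keys;
            this.payload = payload;
        }

        // Stand-in for the expensive per-row column decoding.
        String decodeRow(int row) {
            decodeCalls++;
            return keys[row] + ":" + payload[row];
        }
    }

    // Probe the build-side key set and return the indices of rows whose key matches.
    static int[] selectMatchingRows(long[] keys, Set<Long> buildSideKeys) {
        int[] selected = new int[keys.length];
        int size = 0;
        for (int row = 0; row < keys.length; row++) {
            if (buildSideKeys.contains(keys[row])) {
                selected[size++] = row;
            }
        }
        return Arrays.copyOf(selected, size);
    }

    // Decode only the selected rows; all other rows are never deserialized.
    static List<String> decodeSelected(EncodedBatch batch, int[] selected) {
        List<String> rows = new ArrayList<>(selected.length);
        for (int row : selected) {
            rows.add(batch.decodeRow(row));
        }
        return rows;
    }

    public static void main(String[] args) {
        // Keys of the smaller (build-side) table.
        Set<Long> buildSideKeys = new HashSet<>(Arrays.asList(2L, 5L));
        EncodedBatch batch = new EncodedBatch(
            new long[] {1L, 2L, 3L, 5L},
            new String[] {"a", "b", "c", "d"});

        int[] selected = selectMatchingRows(batch.keys, buildSideKeys);
        List<String> rows = decodeSelected(batch, selected);
        System.out.println(rows + " decoded with " + batch.decodeCalls + " decode calls");
    }
}
```

In this example only two of the four probe-side rows match a build-side key, so only those two rows go through `decodeRow`; the other two are skipped before any decoding cost is paid.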

Attachments

- decode_time_bars.pdf (15/Jan/20 16:52, 13 kB, Panagiotis Garefalakis)
- HIVE-22731.WIP.patch (17/Jan/20 18:00, 32 kB, Panagiotis Garefalakis)

Issue Links

depends upon

- HIVE-23215 Make FilterContext and MutableFilterContext interfaces (Closed)
- ORC-577 Allow row-level filtering (Closed)

is blocked by

- HIVE-23553 Upgrade ORC version to 1.6.7 (Closed)

relates to

- HIVE-23167 Expression probe decode with row-level filtering (In Progress)

links to

- GitHub Pull Request #926

Sub-Tasks

1.

Extend storage-api to expose FilterContext

Closed

Panagiotis Garefalakis

100%

Original Estimate - Not Specified

Time Spent - 1.5h

2.

Basic compiler support for Probe MapJoin

Closed

Panagiotis Garefalakis

100%

Original Estimate - Not Specified

Time Spent - 3h 10m

3.

Compiler support tracking TS keyColName for Probe MapJoin

Closed

Panagiotis Garefalakis

100%

Original Estimate - Not Specified

Time Spent - 1h 20m

4.

Implement MJ HashTable contains key functionality

Closed

Panagiotis Garefalakis

100%

Original Estimate - Not Specified

Time Spent - 20m

5.

Compiler extensions for MJ probe optimization

Closed

Panagiotis Garefalakis

100%

Original Estimate - Not Specified

Time Spent - 3h 20m

6.

Compiler probe MJ selection candidate fallback

Open

Panagiotis Garefalakis

7.

[LLAP] propagate ProbeContex to LlapRecordReader

Closed

Panagiotis Garefalakis

100%

Original Estimate - Not Specified

Time Spent - 20m

8.

[LLAP] support ColumnVectorBatch with FilterContext as part of ReadPipeline

Closed

Panagiotis Garefalakis

100%

Original Estimate - Not Specified

Time Spent - 0.5h

9.

[LLAP] Probe-filter support for OrcEncodedDataConsumer

In Progress

Panagiotis Garefalakis

10.

[LLAP] Extend InputFormat to genIncludedColNames

Closed

Panagiotis Garefalakis

100%

Original Estimate - Not Specified

Time Spent - 0.5h

11.

Support multi-key probe MapJoins

Open

Panagiotis Garefalakis

People

Assignee: Panagiotis Garefalakis

Reporter: Panagiotis Garefalakis

Votes: 0

Watchers: 6

Dates

Created: 15/Jan/20 16:56

Updated: 25/Nov/20 15:58

Time Tracking

Estimated:

Not Specified

Remaining:

0h

Logged:

11h 50m
