[ORC-577] Allow row-level filtering - ASF JIRA

Details

Type: New Feature
Status: Closed
Priority: Major
Resolution: Fixed
Affects Version/s: 1.7.0
Fix Version/s: 1.7.0
Component/s: None
Labels:
- releasenotes

Description

Currently, ORC filters at three levels:

File level
Stripe level (64 to 256 MB)
Row group level (10,000 rows)

The filters are specified as SArgs (SearchArguments), which have a relatively small vocabulary of predicates. Furthermore, they can skip a set of rows only when the statistics guarantee that none of those rows can pass the filter; every row in a set that is not skipped is still returned.
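To make this limitation concrete, here is a minimal, hypothetical Java sketch of statistics-based skipping in the spirit of SArg evaluation. The names (`RowGroupStats`, `mightMatchEquals`, `rowGroupsToRead`) are illustrative only and are not ORC's API. A row group is dropped only when its min/max range proves that no row can match; otherwise all of its rows are still read.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class RowGroupSkipping {

    /** Hypothetical min/max statistics for one long column in one row group. */
    static final class RowGroupStats {
        final long min;
        final long max;

        RowGroupStats(long min, long max) {
            this.min = min;
            this.max = max;
        }
    }

    /** "col < literal": false only if every value in [min, max] is >= literal. */
    static boolean mightMatchLessThan(RowGroupStats s, long literal) {
        return s.min < literal;
    }

    /** "col = literal": false only if literal lies outside [min, max]. */
    static boolean mightMatchEquals(RowGroupStats s, long literal) {
        return s.min <= literal && literal <= s.max;
    }

    /** Indices of the row groups that still have to be read for "col = literal". */
    static List<Integer> rowGroupsToRead(List<RowGroupStats> groups, long literal) {
        List<Integer> toRead = new ArrayList<>();
        for (int i = 0; i < groups.size(); i++) {
            if (mightMatchEquals(groups.get(i), literal)) {
                toRead.add(i);
            }
        }
        return toRead;
    }

    public static void main(String[] args) {
        List<RowGroupStats> groups = Arrays.asList(
            new RowGroupStats(0, 99),
            new RowGroupStats(100, 199),
            new RowGroupStats(150, 300));
        // Group 0 cannot contain 160, so it is skipped; groups 1 and 2 are read
        // in full, even if only a few of their rows actually equal 160.
        System.out.println(rowGroupsToRead(groups, 160)); // [1, 2]
    }
}
```

Because the decision is made per group, a selective predicate on a column with wide value ranges may skip very little; that gap is what a row-level filter is meant to close.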

There are some use cases where the user needs to read a subset of the columns and apply more detailed row-level filters. I'd suggest that we add a new method in Reader.Options:

setRowFilter(String[] filterColumnNames, Consumer<VectorizedRowBatch> filterCallback)

Here, the columns named in filterColumnNames are read and fully decoded first; the filter callback then runs, and the remaining columns are read only for the rows that pass the filter.
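The proposed read order can be sketched as follows. This is a hypothetical, self-contained Java illustration, not ORC's reader: `Batch` is a stand-in class that borrows the `size`, `selected`, and `selectedInUse` field names from Hive's `VectorizedRowBatch`, and `readBatch` and `EVEN_FILTER` are invented for the example. The callback marks surviving rows in the selection vector, and only those rows have their non-filter column populated.

```java
import java.util.function.Consumer;

public class RowFilterSketch {

    /** Stand-in batch: one filter column read first, one payload column read lazily. */
    static final class Batch {
        int size;
        boolean selectedInUse;
        final int[] selected;
        final long[] filterColumn;
        final String[] payloadColumn;

        Batch(int capacity) {
            selected = new int[capacity];
            filterColumn = new long[capacity];
            payloadColumn = new String[capacity];
        }
    }

    /**
     * Hypothetical two-phase read of one batch. Returns how many payload
     * values were materialized, to show the work the filter avoided.
     */
    static int readBatch(long[] filterData, String[] payloadData, Batch batch,
                         Consumer<Batch> filterCallback) {
        // Phase 1: read only the filter column.
        int n = filterData.length;
        System.arraycopy(filterData, 0, batch.filterColumn, 0, n);
        batch.size = n;
        batch.selectedInUse = false;

        // Phase 2: the caller's filter narrows the batch via the selection vector.
        filterCallback.accept(batch);

        // Phase 3: read the remaining column only for rows that survived.
        int read = 0;
        for (int i = 0; i < batch.size; i++) {
            int row = batch.selectedInUse ? batch.selected[i] : i;
            batch.payloadColumn[row] = payloadData[row];
            read++;
        }
        return read;
    }

    /** Example filter callback: keep rows whose filter value is even. */
    static final Consumer<Batch> EVEN_FILTER = b -> {
        int out = 0;
        for (int row = 0; row < b.size; row++) {
            if (b.filterColumn[row] % 2 == 0) {
                b.selected[out++] = row;
            }
        }
        b.selectedInUse = true;
        b.size = out;
    };

    public static void main(String[] args) {
        long[] ids = {1, 2, 3, 4, 5, 6};
        String[] names = {"a", "b", "c", "d", "e", "f"};
        Batch batch = new Batch(ids.length);
        int read = readBatch(ids, names, batch, EVEN_FILTER);
        for (int i = 0; i < batch.size; i++) {
            int row = batch.selected[i];
            System.out.println(batch.filterColumn[row] + " " + batch.payloadColumn[row]);
        }
        System.out.println("payload values read: " + read); // 3 of 6
    }
}
```

In this sketch only 3 of the 6 payload values are touched; the saving in a real reader grows with how expensive the skipped, non-filter columns are to decode.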

Attachments

- RowFilterBenchTimestamp.out (0.9 kB, 21/Feb/20 15:46, Panagiotis Garefalakis)
- RowFilterBenchString.out (0.8 kB, 21/Feb/20 15:46, Panagiotis Garefalakis)
- RowFilterBenchBoolean.out (0.9 kB, 21/Feb/20 15:46, Panagiotis Garefalakis)
- RowFilterBenchDecimal.out (2 kB, 21/Feb/20 15:46, Panagiotis Garefalakis)
- RowFilterBenchDouble.out (0.9 kB, 21/Feb/20 15:46, Panagiotis Garefalakis)

Issue Links

is a parent of
- ORC-597 Row-level filtering bench (Closed)
- ORC-619 Row-level filtering support for nested types (Open)

is depended upon by
- ORC-620 Modify the row filter API to use BiFunction (Open)
- HIVE-22731 MJ probe decode with row-level filtering (Patch Available)

is duplicated by
- ORC-593 Allow row-level Skipping (Closed)

relates to
- ORC-744 LazyIO of non-filter columns (Closed)

links to
- GitHub Pull Request #475

People

Assignee: Panagiotis Garefalakis
Reporter: Owen O'Malley
Votes: 0
Watchers: 12

Dates

Created: 11/Dec/19 21:08
Updated: 27/Feb/24 22:23
Resolved: 26/May/20 14:49

Time Tracking

Estimated: Not Specified
Remaining:
Logged: 20m