Details
- Type: New Feature
- Status: Closed
- Priority: Major
- Resolution: Fixed
- 1.7.0
- None
Description
Currently, ORC filters at three levels:
- File level
- Stripe (64 to 256 MB) level
- Row group (10k row) level
The filters are specified as Sargs (Search Arguments), which have a relatively small vocabulary. Furthermore, they can only skip whole sets of rows, and only when they can guarantee that no row in the set passes the filter.
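To illustrate why statistics-based filtering is coarse, here is a minimal sketch (not ORC's actual code) of a min/max check for a predicate of the form `x > threshold`. The class and method names (`RowGroupPruning`, `minMax`, `canSkip`, `countPassing`) are hypothetical and exist only for this example.

```java
// Hypothetical illustration of statistics-based pruning; not part of the ORC API.
final class RowGroupPruning {

    /** Returns {min, max} for a non-empty row group, standing in for stored statistics. */
    static long[] minMax(long[] values) {
        long min = values[0];
        long max = values[0];
        for (long v : values) {
            min = Math.min(min, v);
            max = Math.max(max, v);
        }
        return new long[] {min, max};
    }

    /** True only when no row in the group can satisfy "x > threshold". */
    static boolean canSkip(long[] stats, long threshold) {
        return stats[1] <= threshold;
    }

    /** Counts the rows that actually satisfy "x > threshold". */
    static int countPassing(long[] values, long threshold) {
        int n = 0;
        for (long v : values) {
            if (v > threshold) {
                n++;
            }
        }
        return n;
    }

    public static void main(String[] args) {
        long[] group = {3, 8, 15, 42};
        long[] stats = minMax(group);
        // The maximum exceeds 10, so the group cannot be skipped,
        // even though only some of its rows pass the predicate.
        System.out.println(canSkip(stats, 10) + " " + countPassing(group, 10));
        // The maximum does not exceed 50, so the whole group can be skipped.
        System.out.println(canSkip(stats, 50));
    }
}
```

In the first case the group has to be read in full although half of its rows fail the predicate, which is the gap a row-level filter is meant to close.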
Some use cases require reading a subset of the columns and applying more detailed row-level filters. I'd suggest that we add a new method to Reader.Options:
setRowFilter(String[] filterColumnNames, Consumer<VectorizedRowBatch> filterCallback)
Where the columns named in filterColumnNames are read and expanded first, then the filter is run, and the rest of the data is read only for the rows that pass the filter.
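The proposed read order can be sketched as follows. This is a self-contained model under stated assumptions, not the ORC implementation: `RowFilterSketch`, `Batch`, `read`, and `keepGreaterThan` are hypothetical stand-ins, and the sketch assumes the callback reports its result by rewriting a `selected` index array on the batch, since a `Consumer` has no return value.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Consumer;

// Hypothetical stand-ins; these are NOT the ORC or Hive classes.
final class RowFilterSketch {

    /** Simplified batch: column name -> values, plus the rows the filter kept. */
    static final class Batch {
        final Map<String, long[]> columns = new HashMap<>();
        int[] selected;      // indices of rows that are still selected
        int size;            // number of valid entries in selected
        int lazyValuesRead;  // values materialized for non-filter columns
    }

    /** Example filter: keeps rows whose value in the given column exceeds threshold. */
    static Consumer<Batch> keepGreaterThan(String column, long threshold) {
        return batch -> {
            long[] values = batch.columns.get(column);
            int kept = 0;
            for (int i = 0; i < batch.size; i++) {
                int row = batch.selected[i];
                if (values[row] > threshold) {
                    batch.selected[kept++] = row;
                }
            }
            batch.size = kept;
        };
    }

    /** Assumes at least one filter column and equally long columns in the file. */
    static Batch read(Map<String, long[]> file,
                      String[] filterColumnNames,
                      Consumer<Batch> filterCallback) {
        Batch batch = new Batch();
        int rows = file.get(filterColumnNames[0]).length;

        // Step 1: read only the filter columns, with every row selected.
        for (String name : filterColumnNames) {
            batch.columns.put(name, file.get(name).clone());
        }
        batch.selected = new int[rows];
        for (int i = 0; i < rows; i++) {
            batch.selected[i] = i;
        }
        batch.size = rows;

        // Step 2: let the callback narrow the selection.
        filterCallback.accept(batch);

        // Step 3: read the remaining columns only for the selected rows.
        for (Map.Entry<String, long[]> entry : file.entrySet()) {
            if (batch.columns.containsKey(entry.getKey())) {
                continue;
            }
            long[] out = new long[rows];
            for (int i = 0; i < batch.size; i++) {
                int row = batch.selected[i];
                out[row] = entry.getValue()[row];
                batch.lazyValuesRead++;
            }
            batch.columns.put(entry.getKey(), out);
        }
        return batch;
    }

    public static void main(String[] args) {
        Map<String, long[]> file = new HashMap<>();
        file.put("id", new long[] {3, 8, 15, 42});
        file.put("payload", new long[] {100, 200, 300, 400});
        Batch batch = read(file, new String[] {"id"}, keepGreaterThan("id", 10));
        System.out.println(batch.size + " rows selected, "
            + batch.lazyValuesRead + " payload values read");
    }
}
```

The design point the sketch tries to show is that the cost of decoding non-filter columns scales with the number of selected rows rather than with the batch size.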
Attachments
Issue Links
- is a parent of
  - ORC-597 Row-level filtering bench (Closed)
  - ORC-619 Row-level filtering support for nested types (Open)
- is depended upon by
  - ORC-620 Modify the row filter API to use BiFunction (Open)
  - HIVE-22731 MJ probe decode with row-level filtering (Patch Available)
- is duplicated by
  - ORC-593 Allow row-level Skipping (Closed)
- relates to
  - ORC-744 LazyIO of non-filter columns (Closed)
- links to