Details
Type: New Feature
Status: Open
Priority: Major
Resolution: Unresolved
None
None
None
Description
Queries like "select count(a) from tbl" only need to know whether each value of the column is non-NULL. ORC files already have a PRESENT stream for each column (though it is optional; when it is omitted, every value in the column is present). Such requests could therefore be served by reading only the PRESENT stream.
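To illustrate why the PRESENT stream alone is enough for count(a), here is a minimal sketch. It assumes (hypothetically) that the stream has already been byte-RLE decoded into raw bytes. ORC's boolean encoding packs one bit per row, most significant bit first. The function name countNonNull and its parameters are illustrative, not part of the ORC API.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical helper: count non-null values of one column using only its
// PRESENT stream. `presentBytes` is assumed to be already byte-RLE decoded;
// each byte holds 8 rows, most significant bit first.
// `hasPresentStream == false` models an omitted stream (no nulls at all).
uint64_t countNonNull(bool hasPresentStream,
                      const std::vector<uint8_t>& presentBytes,
                      uint64_t numRows) {
  if (!hasPresentStream) {
    return numRows;  // stream omitted: every row is non-null
  }
  assert(numRows <= presentBytes.size() * 8);
  uint64_t count = 0;
  for (uint64_t row = 0; row < numRows; ++row) {
    uint8_t byte = presentBytes[row / 8];
    // The bit for row r sits at position 7 - (r % 8) within its byte.
    if ((byte >> (7 - row % 8)) & 1) {
      ++count;
    }
  }
  return count;
}
```

No value streams (for example DATA or LENGTH) are touched. The cost is one pass over a bitmap of ceil(numRows / 8) bytes.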
Currently, ReadIntent has two items:
  enum ReadIntent {
    ReadIntent_ALL = 0,
    // Only read the offsets of selected type. Do not read the children types.
    ReadIntent_OFFSETS = 1
  };
We can extend it with a value such as ReadIntent_PRESENT. The corresponding ColumnVectorBatch would then contain only valid notNull results; the column values themselves would not be read.
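A rough sketch of the proposal follows. The numeric value of ReadIntent_PRESENT is a guess, not an existing ORC enumerator. PresentOnlyBatch and countNonNullInBatches are hypothetical, simplified stand-ins for orc::ColumnVectorBatch and an aggregation loop. The sketch keeps only the null-tracking fields (numElements, hasNulls, notNull). It shows how count(col) could be computed from batches that carry nothing but notNull information.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical extension of ReadIntent (ReadIntent_PRESENT is proposed only).
enum ReadIntent {
  ReadIntent_ALL = 0,
  // Only read the offsets of selected type. Do not read the children types.
  ReadIntent_OFFSETS = 1,
  // Proposed: only read the PRESENT stream; only notNull results are valid.
  ReadIntent_PRESENT = 2
};

// Hypothetical, simplified stand-in for orc::ColumnVectorBatch holding only
// the fields a ReadIntent_PRESENT read would need to populate.
struct PresentOnlyBatch {
  uint64_t numElements = 0;
  bool hasNulls = false;      // if false, every value is non-null
  std::vector<char> notNull;  // consulted only when hasNulls is true
};

// Sums non-null values across batches, as count(col) would.
uint64_t countNonNullInBatches(const std::vector<PresentOnlyBatch>& batches) {
  uint64_t total = 0;
  for (const auto& batch : batches) {
    if (!batch.hasNulls) {
      total += batch.numElements;  // fast path: no per-row check needed
      continue;
    }
    assert(batch.notNull.size() >= batch.numElements);
    for (uint64_t i = 0; i < batch.numElements; ++i) {
      if (batch.notNull[i]) {
        ++total;
      }
    }
  }
  return total;
}
```

The hasNulls fast path mirrors the case where a column chunk has no PRESENT stream. Such a batch contributes its full row count without a per-row scan.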
This would help most with string columns, whose values are relatively expensive to read. For example, counting how many customers have an email address:
select count(email_address) from tpcds.customer
Issue Links
relates to
ORC-450 [C++] Support selecting list indices without materializing list items (Closed)