ORC-1143

[C++] Support reading the PRESENT stream without reading the column data

Details

    • Type: New Feature
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: C++
    • Labels: None

    Description

      Queries like "select count(a) from tbl" only need to know whether each column value is NULL or not. ORC files already store this in the PRESENT stream of each column (the stream is optional and may be absent). Such queries can therefore be served by reading only the PRESENT stream, without reading the column data.
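
As a rough illustration of why the PRESENT stream alone is enough, the sketch below counts non-NULL values directly from a PRESENT stream's bytes. It assumes the layout described in the ORC specification (a byte-RLE stream whose decoded bytes hold one bit per value, most significant bit first) and that the bytes have already been decompressed. `countPresent` is a hypothetical helper written for this example, not part of the ORC C++ library.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper (not an ORC API): counts the set bits among the first
// numValues values of an already-decompressed PRESENT stream.
uint64_t countPresent(const std::vector<uint8_t>& stream, uint64_t numValues) {
  uint64_t remaining = numValues;
  uint64_t count = 0;
  size_t pos = 0;

  // Each decoded byte carries up to 8 values, most significant bit first.
  auto consume = [&](uint8_t byte) {
    for (int bit = 7; bit >= 0 && remaining > 0; --bit, --remaining) {
      count += (byte >> bit) & 1u;
    }
  };

  while (remaining > 0 && pos < stream.size()) {
    int8_t header = static_cast<int8_t>(stream[pos++]);
    if (header >= 0) {
      // Run: the next byte repeats (header + 3) times.
      size_t runLength = static_cast<size_t>(header) + 3;
      if (pos >= stream.size()) break;  // truncated input
      uint8_t value = stream[pos++];
      for (size_t i = 0; i < runLength && remaining > 0; ++i) consume(value);
    } else {
      // Literals: the next (-header) bytes are stored verbatim.
      size_t literals = static_cast<size_t>(-static_cast<int>(header));
      for (size_t i = 0; i < literals && pos < stream.size() && remaining > 0; ++i) {
        consume(stream[pos++]);
      }
    }
  }
  return count;
}
```

For the byte sequence [0xff, 0x80], which the ORC specification gives as one true value followed by seven false values, `countPresent(stream, 8)` yields 1. A real reader would also have to handle compression, row-group positions, and columns that have no PRESENT stream at all (every value is non-NULL).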

      Currently, ReadIntent has two items:

      enum ReadIntent {
        ReadIntent_ALL = 0,
      
        // Only read the offsets of selected type. Do not read the children types.
        ReadIntent_OFFSETS = 1
      };

      We can extend it with a new item such as ReadIntent_PRESENT. For a column read with this intent, only the notNull results of the corresponding ColumnVectorBatch would be valid; the column values themselves would not be read.
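
A minimal sketch of how the proposal might look from the caller's side. The value `ReadIntent_PRESENT = 2` is an assumption (the issue does not fix a value), and `PresentOnlyBatch` is a stand-in struct that mirrors only the `numElements`, `hasNulls`, and `notNull` members of `ColumnVectorBatch` so the example compiles on its own. `countNonNull` is a hypothetical helper.

```cpp
#include <cstdint>
#include <vector>

enum ReadIntent {
  ReadIntent_ALL = 0,

  // Only read the offsets of selected type. Do not read the children types.
  ReadIntent_OFFSETS = 1,

  // Proposed (hypothetical value): only read the PRESENT stream of the
  // selected type. Do not read the column data.
  ReadIntent_PRESENT = 2
};

// Stand-in for the fields of ColumnVectorBatch that would remain valid
// under the proposed intent. Not the real ORC class.
struct PresentOnlyBatch {
  uint64_t numElements = 0;
  bool hasNulls = false;
  std::vector<char> notNull;  // 1 = value present, 0 = NULL
};

// Computes count(column) for one batch using only the null information.
uint64_t countNonNull(const PresentOnlyBatch& batch) {
  // When hasNulls is false, notNull need not be filled in:
  // every value in the batch is non-NULL.
  if (!batch.hasNulls) return batch.numElements;

  uint64_t count = 0;
  for (uint64_t i = 0; i < batch.numElements; ++i) {
    count += batch.notNull[i] ? 1 : 0;
  }
  return count;
}
```

A query engine could sum `countNonNull` over all batches returned for the column to answer `count(a)` without ever touching the value buffers.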

      The savings would be largest for string columns, whose data is typically much larger than the PRESENT stream. For example, counting how many customers have an email address:

      select count(email_address) from tpcds.customer 

            People

              Assignee: Unassigned
              Reporter: Quanlong Huang (stigahuang)