Apache Arrow / ARROW-16410

[C++] Scanner -> ScanNode

Details

    • Type: Improvement
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: C++
    • Labels: None

    Description

      This is going to be a parent issue for a number of changes I'd like to make to the scanner.

      • I'd like to remove the concept of the scanner from the public API.
      • Related to that, ScanOptions will probably split into two classes. QueryOptions will be the public-facing half, while ScanOptions will hold the options sent to the ScanNode. Most users won't see ScanOptions, as it will be internal.
      • For example, QueryOptions will have batch_readahead, which represents how many "query engine batches" to read ahead. Since files and the query engine have different ideas of what constitutes a "batch", the related property in ScanOptions will be rows_to_readahead.
      • Another example is projection. In QueryOptions, projection is column selection plus any custom projection expressions that the user wants to run. In ScanOptions, "projection" is the desired list of columns and the output type for each column, which controls casting and inference.
      • Partially related (and partially unrelated) to the above two items, I would like to move the scanner away from AsyncGenerator and recast it as an execution engine node.
      • The Scanner class will be deprecated and eventually go away. Some methods, like Scanner::ToTable, may move into a new QueryBuilder or Query object.
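
      To make the proposed split concrete, here is a minimal sketch of how the two option classes might relate. It is an illustration only, not Arrow code: the struct layouts, the ScanColumn type, the MakeScanOptions helper, the default value, the string-typed output types, and the "batches times rows per batch" conversion are all assumptions. Only the names QueryOptions, ScanOptions, batch_readahead and rows_to_readahead come from the description above.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>
#include <vector>

// Hypothetical public-facing options, expressed in query-engine terms.
struct QueryOptions {
  // Number of "query engine batches" to read ahead (16 is a placeholder default).
  int32_t batch_readahead = 16;
  // Column selection.
  std::vector<std::string> columns;
  // Custom projection expressions; plain strings stand in for real expressions.
  std::vector<std::string> expressions;
};

// Hypothetical per-column entry of the internal projection.
struct ScanColumn {
  std::string name;
  // Desired output type; an empty string means "infer" in this sketch.
  std::string output_type;
};

// Hypothetical internal options handed to the ScanNode.
struct ScanOptions {
  // Readahead expressed in rows, since files have their own idea of a "batch".
  int64_t rows_to_readahead = 0;
  // Desired columns and the output type for each one.
  std::vector<ScanColumn> projection;
};

// Assumed translation step: converts batch-based readahead into rows and
// attaches an output type to each selected column. Custom expressions are not
// forwarded here; in this sketch they would run outside the scan.
inline ScanOptions MakeScanOptions(
    const QueryOptions& query, int64_t rows_per_engine_batch,
    const std::map<std::string, std::string>& output_types) {
  assert(rows_per_engine_batch > 0);
  ScanOptions scan;
  scan.rows_to_readahead =
      static_cast<int64_t>(query.batch_readahead) * rows_per_engine_batch;
  for (const std::string& name : query.columns) {
    auto it = output_types.find(name);
    scan.projection.push_back(
        ScanColumn{name, it == output_types.end() ? std::string() : it->second});
  }
  return scan;
}
```

      In this sketch a user would fill in QueryOptions only, and the engine would call something like MakeScanOptions internally, so ScanOptions never appears in user code. How readahead is actually converted and where output types come from are open design questions for the real implementation.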

            People

              Assignee: Unassigned
              Reporter: Weston Pace (westonpace)
              Votes: 0
              Watchers: 4

              Dates

                Created:
                Updated: