Apache Arrow / ARROW-11797

[C++][Dataset] Provide Scanner methods to yield/visit scanned batches

    Details

      Description

      From discussion in https://issues.apache.org/jira/browse/ARROW-11782

      It'd be useful to consumers of Scanner to receive an iterator of scanned record batches or apply a visitor to batches as they are scanned without handling ScanTasks. For example, this could enable aggregations or other computations which don't require the entire table to be materialized.
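
      To make the two proposed access patterns concrete, here is a minimal, self-contained sketch. It does not use the Arrow API: ToyBatch, ToyScanner, VisitBatches, ScanBatches, and BatchIterator are hypothetical stand-ins invented for illustration, and the final Scanner methods may look quite different. The sketch only shows how a consumer could sum a column batch by batch, either through a visitor or through an iterator, without materializing a whole table and without touching anything like a ScanTask.

      ```cpp
      #include <cstddef>
      #include <cstdint>
      #include <functional>
      #include <numeric>
      #include <utility>
      #include <vector>

      // Hypothetical stand-in for a record batch: a single int64 column.
      struct ToyBatch {
        std::vector<int64_t> values;
      };

      // Hypothetical stand-in for a scanner over several fragments. This is NOT
      // the Arrow Scanner; it only models the two access patterns described above.
      class ToyScanner {
       public:
        using Fragment = std::vector<ToyBatch>;

        explicit ToyScanner(std::vector<Fragment> fragments)
            : fragments_(std::move(fragments)) {}

        // Push style: apply a visitor to each batch as it is scanned.
        // The visitor returns false to stop the scan early.
        void VisitBatches(const std::function<bool(const ToyBatch&)>& visitor) const {
          for (const Fragment& fragment : fragments_) {
            for (const ToyBatch& batch : fragment) {
              if (!visitor(batch)) return;
            }
          }
        }

        // Pull style: yields one batch per call to Next().
        // The iterator borrows from the scanner and must not outlive it.
        class BatchIterator {
         public:
          explicit BatchIterator(const std::vector<Fragment>& fragments)
              : fragments_(fragments) {}

          // Returns the next batch, or nullptr once the scan is exhausted.
          const ToyBatch* Next() {
            while (fragment_index_ < fragments_.size()) {
              const Fragment& fragment = fragments_[fragment_index_];
              if (batch_index_ < fragment.size()) {
                return &fragment[batch_index_++];
              }
              ++fragment_index_;  // current fragment is done, move to the next one
              batch_index_ = 0;
            }
            return nullptr;
          }

         private:
          const std::vector<Fragment>& fragments_;
          std::size_t fragment_index_ = 0;
          std::size_t batch_index_ = 0;
        };

        BatchIterator ScanBatches() const { return BatchIterator(fragments_); }

       private:
        std::vector<Fragment> fragments_;
      };

      // Aggregation via the visitor: only one batch is inspected at a time.
      int64_t SumWithVisitor(const ToyScanner& scanner) {
        int64_t total = 0;
        scanner.VisitBatches([&total](const ToyBatch& batch) {
          total = std::accumulate(batch.values.begin(), batch.values.end(), total);
          return true;  // keep scanning
        });
        return total;
      }

      // The same aggregation via the iterator.
      int64_t SumWithIterator(const ToyScanner& scanner) {
        int64_t total = 0;
        ToyScanner::BatchIterator it = scanner.ScanBatches();
        while (const ToyBatch* batch = it.Next()) {
          total = std::accumulate(batch->values.begin(), batch->values.end(), total);
        }
        return total;
      }
      ```

      In both variants the consumer holds only a running total and the current batch. The visitor form leaves control of the loop with the scanner, while the iterator form lets the caller decide when to pull the next batch.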

            People

            • Assignee: Ben Kietzman (bkietz)
            • Reporter: Ben Kietzman (bkietz)

                Time Tracking

                Estimated: Not Specified
                Remaining: 0h
                Logged: 3h 10m
