Uploaded image for project: 'Apache Arrow'
  1. Apache Arrow
  2. ARROW-11889

[C++] Add parallelism to streaming CSV reader

Attach filesAttach ScreenshotVotersWatch issueWatchersCreate sub-taskLinkCloneUpdate Comment AuthorReplace String in CommentUpdate Comment VisibilityDelete Comments
    XMLWordPrintableJSON

Details

    • Improvement
    • Status: Resolved
    • Major
    • Resolution: Fixed
    • None
    • 5.0.0
    • C++

    Description

      Currently the streaming CSV reader does not allow for much parallelism.  It doesn't allow for reading more than one segment at once (useful in S3) and it doesn't allow for column fan-out for parsing & converting.

      It seems both of these options would speed up CSV reading in some scenarios although it's possible this is mostly mitigated in cases where there are many more files than cores (as per-file parallelism will occupy all the cores anyways).

      Attachments

        Activity

          This comment will be Viewable by All Users Viewable by All Users
          Cancel

          People

            westonpace Weston Pace
            westonpace Weston Pace
            Votes:
            0 Vote for this issue
            Watchers:
            2 Start watching this issue

            Dates

              Created:
              Updated:
              Resolved:

              Time Tracking

                Estimated:
                Original Estimate - Not Specified
                Not Specified
                Remaining:
                Remaining Estimate - 0h
                0h
                Logged:
                Time Spent - 3.5h
                3.5h

                Issue deployment