Details
-
Improvement
-
Status: Resolved
-
Major
-
Resolution: Fixed
-
None
Description
Currently the streaming CSV reader does not allow for much parallelism. It doesn't allow for reading more than one segment at once (useful in S3) and it doesn't allow for column fan-out for parsing & converting.
It seems both of these options would speed up CSV reading in some scenarios although it's possible this is mostly mitigated in cases where there are many more files than cores (as per-file parallelism will occupy all the cores anyways).
Attachments
Issue Links
- links to