Details
- Type: New Feature
- Status: Resolved
- Priority: P2
- Resolution: Fixed
Description
The Java SDK now has a TextIO.readAll() API that allows reading a massive number of files. It does so by replacing the BoundedSource API (which may perform expensive source operations on the control plane) with ParDo operations.
An equivalent API should be added to the Python SDK as well.
This form of reading files does not support dynamic work rebalancing for now, but that should not matter much when reading a massive number of relatively small files. In the future, this API can support dynamic work rebalancing through Splittable DoFn.
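The approach described above can be sketched in plain Python, without the Beam SDK: one per-element step expands file patterns into paths, and a second step reads each path, so no single up-front source object has to enumerate every file. This is a minimal illustration only; the names expand_pattern, read_lines, flat_map, read_all, and demo are hypothetical and are not part of any Beam API.

```python
import glob
import os
import tempfile


def expand_pattern(pattern):
    """Yield every file path matching a glob pattern (first ParDo-like step)."""
    for path in sorted(glob.glob(pattern)):
        yield path


def read_lines(path):
    """Yield each line of one file without its trailing newline (second step)."""
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            yield line.rstrip("\n")


def flat_map(fn, elements):
    """Apply a generator function to each element, loosely like a ParDo."""
    for element in elements:
        yield from fn(element)


def read_all(patterns):
    """Chain the two steps: patterns -> file paths -> lines."""
    return flat_map(read_lines, flat_map(expand_pattern, patterns))


def demo():
    """Write three small files to a temporary directory and read them back."""
    with tempfile.TemporaryDirectory() as tmp:
        for i in range(3):
            name = os.path.join(tmp, f"part-{i}.txt")
            with open(name, "w", encoding="utf-8") as out:
                out.write(f"a{i}\nb{i}\n")
        # Consume the lazy pipeline before the directory is deleted.
        return list(read_all([os.path.join(tmp, "part-*.txt")]))
```

Because each file is handled as an independent element, the work parallelizes per file, but a single large file cannot be split further. That matches the trade-off noted above about the lack of dynamic work rebalancing.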
cc: jkff