Details
Type: Bug
Status: Resolved
Priority: P3
Resolution: Won't Fix
Affects Version/s: 2.14.0, 2.15.0, 2.16.0
Fix Version/s: None
Description
With Beam 2.12, reading many files produced many tasks.
After upgrading to Beam 2.14, the same code executes differently: all files are read by a single task.
This happens when the read goes through the DoFn-based path (enabled via 'withHintMatchesManyFiles') rather than the Source-based path:
final PCollection<GenericRecord> records =
    pipeline.apply(
        AvroIO.readGenericRecords(mySchema)
            .from(options.getInputPath() + "/*/*/*/data/file.avro")
            .withHintMatchesManyFiles());
records.apply(Count.globally());