Details
-
Bug
-
Status: Triage Needed
-
P2
-
Resolution: Fixed
-
None
-
None
Description
When the Flink runner reads from a bounded source that contains a very large number of files, FlinkRunner throws an OOM error. This happens because the current implementation does not read the files sequentially but simultaneously, which keeps all of the files in memory and quickly breaks the cluster.
Solution: wrap the `UnboundedReadFromBoundedSource` class so that when the stream is a bounded source, the files are read sequentially using a queue.
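The proposed fix can be sketched as follows. This is an illustrative example, not the actual Beam implementation: the class name `SequentialShardReader`, the method `readSequentially`, and the per-shard reader function are all hypothetical. The point is that pending shards wait in a queue and are fully consumed one at a time, so only one shard's data is in memory at once instead of all files simultaneously.

```java
import java.util.ArrayDeque;
import java.util.List;
import java.util.Queue;
import java.util.function.Function;

// Hypothetical sketch of the queue-based approach: shards are dequeued and
// read one at a time rather than being opened all at once.
public class SequentialShardReader {

    // Dequeue one shard, fully consume it, then move to the next.
    // readShard is a hypothetical function that reads a shard and
    // returns its record count.
    static <T> long readSequentially(Queue<T> pending, Function<T, Long> readShard) {
        long totalRecords = 0;
        while (!pending.isEmpty()) {
            T shard = pending.poll();          // at most one shard "open" at a time
            totalRecords += readShard.apply(shard);
        }
        return totalRecords;
    }

    public static void main(String[] args) {
        Queue<String> shards = new ArrayDeque<>(List.of("part-0", "part-1", "part-2"));
        // Toy per-shard reader: pretend each shard holds 10 records.
        long total = readSequentially(shards, shard -> 10L);
        System.out.println("records=" + total);
    }
}
```

Because each shard is drained before the next is dequeued, peak memory is bounded by the largest single shard rather than by the total size of the input.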
Attachments
Issue Links
- is duplicated by
-
BEAM-5650 Timeout exceptions while reading a lot of files from a bounded source like S3 with Flink runner
- Resolved
- links to