Details
Type: Improvement
Status: Open
Priority: Minor
Resolution: Unresolved
Affects Version/s: 1.5.5, 1.6.2
None
None
Description
With S3 configured as the checkpoint store via the S3 AWS Hadoop filesystem, we see a very large number of ListObjects calls. For instance, a job with ~1600 tasks requires around 23000 ListObjects calls for every checkpoint, including the cleanup of that checkpoint by Flink. With the checkpoint interval set to 5 minutes, this adds up to hundreds of dollars per month just for ListObjects calls. I am aware that the implementation details might be hidden in the Hadoop jar and may be difficult to change, but could at least a workaround be suggested?
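To make the cost estimate reproducible, here is a rough back-of-the-envelope sketch of the arithmetic. The call count (23000 per checkpoint) and the 5-minute interval come from the figures above. The 30-day month and the price of 0.005 USD per 1000 LIST requests are assumptions for illustration only and are not part of this report; actual S3 pricing varies by region and over time, so the dollar figure should be read as an order-of-magnitude estimate. The class and method names are hypothetical.

```java
public class ListObjectsCostEstimate {

    // Number of ListObjects calls issued in one month, assuming one checkpoint
    // per interval and a fixed number of calls per checkpoint.
    public static long callsPerMonth(long callsPerCheckpoint, long intervalMinutes, long daysPerMonth) {
        long checkpointsPerDay = (24 * 60) / intervalMinutes;
        return callsPerCheckpoint * checkpointsPerDay * daysPerMonth;
    }

    // Monthly cost in USD for a given number of calls and a price per 1000 calls.
    public static double monthlyCostUsd(long calls, double usdPerThousandCalls) {
        return (calls / 1000.0) * usdPerThousandCalls;
    }

    public static void main(String[] args) {
        long callsPerCheckpoint = 23000;     // from the report (~1600 tasks)
        long intervalMinutes = 5;            // from the report
        long daysPerMonth = 30;              // assumption
        double usdPerThousandCalls = 0.005;  // assumption, check current S3 pricing

        long calls = callsPerMonth(callsPerCheckpoint, intervalMinutes, daysPerMonth);
        double cost = monthlyCostUsd(calls, usdPerThousandCalls);

        System.out.printf("ListObjects calls per month: %d%n", calls);
        System.out.printf("Estimated monthly cost (USD): %.2f%n", cost);
    }
}
```

With these inputs the job issues roughly 200 million ListObjects calls per month, so even a small per-request price becomes a noticeable charge. Changing `intervalMinutes` shows how directly the cost scales with checkpoint frequency.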