Flink / FLINK-10841

Reduce the number of ListObjects calls when checkpointing to S3

Details

    • Type: Improvement
    • Status: Open
    • Priority: Minor
    • Resolution: Unresolved
    • Affects Version/s: 1.5.5, 1.6.2
    • Fix Version/s: None
    • Component/s: FileSystems
    • Labels: None

    Description

      With S3 configured as the checkpoint store through the Hadoop S3 (AWS) filesystem, we see a very large number of ListObjects calls. For instance, a job with ~1600 tasks requires around 23000 ListObjects calls for every checkpoint, including the cleanup of that checkpoint by Flink. With the checkpoint interval set to 5 minutes, this adds up to hundreds of dollars per month just for ListObjects calls. I am aware that the implementation details might be hidden in the Hadoop jar and may be difficult to change, but could at least a workaround be suggested?
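
      For reference, the cost estimate above can be reproduced with a rough back-of-the-envelope calculation. This is only a sketch: the figures of 23000 calls per checkpoint and a 5 minute interval come from this report, while the price of $0.005 per 1000 LIST requests and the 30-day month are assumptions (actual S3 request pricing varies by region and over time). The function names are illustrative and are not part of Flink or Hadoop.

```python
# Rough monthly cost estimate for S3 ListObjects calls caused by checkpointing.
# ASSUMPTIONS (not from the report): a 30-day month and a LIST request price
# of $0.005 per 1000 requests. Check the current S3 pricing for your region.

MINUTES_PER_MONTH = 30 * 24 * 60          # assumed 30-day month
ASSUMED_PRICE_PER_1000_LIST_USD = 0.005   # assumed price, region dependent


def checkpoints_per_month(interval_minutes: float) -> float:
    """Number of checkpoints triggered in one (30-day) month."""
    return MINUTES_PER_MONTH / interval_minutes


def monthly_list_calls(calls_per_checkpoint: int, interval_minutes: float) -> float:
    """Total ListObjects calls per month for a given checkpoint interval."""
    return calls_per_checkpoint * checkpoints_per_month(interval_minutes)


def monthly_list_cost_usd(
    calls_per_checkpoint: int,
    interval_minutes: float,
    price_per_1000_usd: float = ASSUMED_PRICE_PER_1000_LIST_USD,
) -> float:
    """Estimated monthly cost in USD of the ListObjects calls alone."""
    calls = monthly_list_calls(calls_per_checkpoint, interval_minutes)
    return calls / 1000 * price_per_1000_usd


if __name__ == "__main__":
    # Figures from the report: ~23000 calls per checkpoint, 5 minute interval.
    calls = monthly_list_calls(23000, 5)
    cost = monthly_list_cost_usd(23000, 5)
    print(f"{calls:,.0f} ListObjects calls per month, about ${cost:.2f}")
```

      Under these assumptions, a single job makes close to 200 million ListObjects calls per month, which is consistent with a bill in the range of hundreds of dollars. Doubling the checkpoint interval halves the estimate, so the interval is the simplest lever while the number of calls per checkpoint stays unchanged.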


          People

            Assignee: Unassigned
            Reporter: Pawel Bartoszek (pawelbartoszek)
            Votes: 1
            Watchers: 4
