Flink / FLINK-10841

Reduce the number of ListObjects calls when checkpointing to S3

    Details

    • Type: Improvement
    • Status: Open
    • Priority: Minor
    • Resolution: Unresolved
    • Affects Version/s: 1.5.5, 1.6.2
    • Fix Version/s: None
    • Component/s: FileSystems
    • Labels:
      None

      Description

      With S3 configured as the checkpoint store through the S3 AWS Hadoop filesystem, we see a very large number of ListObjects calls. For instance, a job with ~1600 tasks requires around 23000 ListObjects calls for every checkpoint, including Flink's cleanup of that checkpoint. With the checkpoint interval set to 5 minutes, this adds up to hundreds of dollars per month for ListObjects calls alone. I am aware that the implementation details might be hidden in the Hadoop jar and may be difficult to change, but could a workaround at least be suggested?
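
      To make the cost claim easier to check, here is a rough back-of-the-envelope sketch. The 23000 calls per checkpoint and the 5-minute interval come from the description above; the price of 0.005 USD per 1000 LIST requests and the 30-day month are assumptions (actual S3 pricing depends on region and storage class), and the function name is purely illustrative.

```python
def monthly_list_cost(calls_per_checkpoint, interval_minutes,
                      price_per_1000_usd, days_per_month=30):
    """Estimate monthly ListObjects call count and cost for periodic checkpoints."""
    # Number of checkpoints triggered in one month at a fixed interval.
    checkpoints_per_month = (days_per_month * 24 * 60) // interval_minutes
    # Total ListObjects calls across all checkpoints in the month.
    total_calls = checkpoints_per_month * calls_per_checkpoint
    # S3 bills LIST requests per 1000 calls.
    cost_usd = total_calls / 1000 * price_per_1000_usd
    return total_calls, cost_usd


# Figures from the issue description; the price is an ASSUMED value.
ASSUMED_PRICE_PER_1000_LIST_USD = 0.005

calls, cost = monthly_list_cost(
    calls_per_checkpoint=23000,
    interval_minutes=5,
    price_per_1000_usd=ASSUMED_PRICE_PER_1000_LIST_USD,
)
print(f"ListObjects calls per month: {calls}")
print(f"Estimated monthly cost (USD): {cost:.2f}")
```

      Under these assumptions the job issues 8640 checkpoints and roughly 198.7 million ListObjects calls per month, so the monthly bill scales linearly with both the per-checkpoint call count and the checkpoint frequency.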


            People

            • Assignee:
              Unassigned
            • Reporter:
              Pawel Bartoszek (pawelbartoszek)
            • Votes:
              0
            • Watchers:
              2

              Dates

              • Created:
                Updated: