Uploaded image for project: 'Flink'
  1. Flink
  2. FLINK-30461

Some rocksdb sst files will remain forever

Attach filesAttach ScreenshotVotersWatch issueWatchersCreate sub-taskLinkCloneUpdate Comment AuthorReplace String in CommentUpdate Comment VisibilityDelete Comments
    XMLWordPrintableJSON

Details

    • clean-up left-over rocksdb ssts in the shared scope introduced in FLINK-11008 (Flink version 1.8)

    Description

      In rocksdb incremental checkpoint mode, during file upload, if some files have been uploaded and some files have not been uploaded, the checkpoint is canceled due to checkpoint timeout at this time, and the uploaded files will remain.

       

      Impact: 

      The shared directory of a flink job has more than 1 million files. It exceeded the hdfs upper limit, causing new files not to be written.

      However only 50k files are available, the other 950k files should be cleaned up.

      Root cause:

      If an exception is thrown during the checkpoint async phase, flink will clean up metaStateHandle, miscFiles and sstFiles.

      However, when all sst files are uploaded, they are added together to sstFiles. If some sst files have been uploaded and some sst files are still being uploaded, and  the checkpoint is canceled due to checkpoint timeout at this time, all sst files will not be added to sstFiles. The uploaded sst will remain on hdfs.

      code link

      Solution:

      Using the CloseableRegistry as the tmpResourcesRegistry. If the async phase is failed, the tmpResourcesRegistry will cleanup these temporary resources.

       

      POC code:

      https://github.com/1996fanrui/flink/commit/86a456b2bbdad6c032bf8e0bff71c4824abb3ce1

       

       

       

      Attachments

        Issue Links

        Activity

          This comment will be Viewable by All Users Viewable by All Users
          Cancel

          People

            fanrui Rui Fan
            fanrui Rui Fan
            Votes:
            0 Vote for this issue
            Watchers:
            6 Start watching this issue

            Dates

              Created:
              Updated:
              Resolved:

              Slack

                Issue deployment