FLINK-5666: Blob files are not cleaned up from ZK storage directory

    Details

      Description

      When running a job with HA in a standalone cluster, the blob files are not cleaned up from the ZooKeeper storage directory.

      :zkStorageDir/blob/cache/blob_:rand

      Nico Kruber, have you seen such behaviour while refactoring the blob server?

          Activity

          NicoK Nico Kruber added a comment -

          Hmm, do you see a warning like "Failed to delete blob at..."? That is what the FileSystemBlobStore should log if an exception is thrown during the delete() call. However, it completely ignores the result of the fileSystem.delete() call!

          Both BlobRecoveryITCase and BlobLibraryCacheRecoveryITCase test that the delete actually removes the files (although this verification is a bit misplaced in each of them), but they only run against the local filesystem (with HA mode set). I guess if the reference counting of the BlobLibraryCacheManager is no different in HA mode, then something may indeed be wrong in the BlobServer's use of the HDFS backend only.
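
To illustrate the point about the ignored return value, here is a minimal sketch. It is not the FileSystemBlobStore code: it uses java.io.File.delete() as a stand-in for fileSystem.delete(), since both report failure through a boolean rather than an exception. The class and method names (DeleteResultCheck, deleteIgnoringResult, deleteCheckingResult) are hypothetical.

```java
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;

// Hypothetical sketch; java.io.File stands in for the file system abstraction.
class DeleteResultCheck {

    /** Pattern described in the comment: only an exception is logged, the boolean is dropped. */
    static void deleteIgnoringResult(File blob) {
        try {
            blob.delete(); // return value discarded, so a 'false' goes unnoticed
        } catch (Exception e) {
            System.err.println("Failed to delete blob at " + blob);
        }
    }

    /** Checks the boolean result; returns true only if the path is gone afterwards. */
    static boolean deleteCheckingResult(File blob) {
        boolean deleted = blob.delete();
        if (!deleted && blob.exists()) {
            System.err.println("Failed to delete blob at " + blob);
            return false;
        }
        return true;
    }

    public static void main(String[] args) throws IOException {
        File dir = Files.createTempDirectory("blob-sketch").toFile();
        File blob = new File(dir, "blob_1");
        Files.write(blob.toPath(), new byte[] {1, 2, 3});

        // Deleting a non-empty directory fails by returning false, without an exception.
        deleteIgnoringResult(dir);                      // silent, nothing is logged
        System.out.println(deleteCheckingResult(dir));  // failure is reported

        // Clean up in the right order.
        deleteCheckingResult(blob);
        deleteCheckingResult(dir);
    }
}
```

With the first variant a failed delete leaves the file in place and nothing in the log, which would match the symptom of leftover blobs without a warning.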

          NicoK Nico Kruber added a comment -

          Alternatively, the BlobLibraryCacheManager's cleanup task may simply not have run yet since the DEFAULT_LIBRARY_CACHE_MANAGER_CLEANUP_INTERVAL is 3600 seconds!
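
A rough sketch of why a periodic, reference-counted cleanup can leave files behind for a while. This is an assumption about the general mechanism, not the BlobLibraryCacheManager implementation; RefCountedCacheSketch and its methods are hypothetical names. In a real setup a timer would call cleanup() once per interval (3600 seconds by default, per the constant mentioned above).

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Iterator;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch of reference counting with deferred cleanup.
class RefCountedCacheSketch {
    private final Map<String, Integer> refCounts = new HashMap<>();
    private final Set<String> storedBlobs = new HashSet<>();

    /** A job starts using a blob: bump its reference count and mark it as stored. */
    synchronized void register(String blobKey) {
        refCounts.merge(blobKey, 1, Integer::sum);
        storedBlobs.add(blobKey);
    }

    /** A job finishes: only the counter drops, the blob itself is not removed here. */
    synchronized void unregister(String blobKey) {
        refCounts.computeIfPresent(blobKey, (k, c) -> c > 0 ? c - 1 : 0);
    }

    /** Periodic task: removes every blob whose count reached zero; returns how many. */
    synchronized int cleanup() {
        int removed = 0;
        Iterator<Map.Entry<String, Integer>> it = refCounts.entrySet().iterator();
        while (it.hasNext()) {
            Map.Entry<String, Integer> entry = it.next();
            if (entry.getValue() == 0) {
                storedBlobs.remove(entry.getKey());
                it.remove();
                removed++;
            }
        }
        return removed;
    }

    synchronized boolean isStored(String blobKey) {
        return storedBlobs.contains(blobKey);
    }
}
```

Between unregister() and the next cleanup() run, isStored() still returns true, so an observer checking the storage directory right after a job finishes would still see the blob.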

          githubbot ASF GitHub Bot added a comment -

          GitHub user NicoK opened a pull request:

          https://github.com/apache/flink/pull/3222

          FLINK-5666 add unit tests verifying that BlobServer#delete() deletes from HDFS

          this does not fix FLINK-5666 but adds some more unit tests verifying intended behaviour

          You can merge this pull request into a Git repository by running:

          $ git pull https://github.com/NicoK/flink flink-5666

          Alternatively you can review and apply these changes as the patch at:

          https://github.com/apache/flink/pull/3222.patch

          To close this pull request, make a commit to your master/trunk branch
          with (at least) the following in the commit message:

          This closes #3222



          uce Ufuk Celebi added a comment -

          You are right, the library cache manager would have cleaned that up later. On regular shutdown, it actually did clean it up, too.

          githubbot ASF GitHub Bot added a comment -

          Github user uce commented on the issue:

          https://github.com/apache/flink/pull/3222

          Good additions, merging.

          uce Ufuk Celebi added a comment -

          Added tests in 24db045 (master), b1ab75f (release-1.2).

          githubbot ASF GitHub Bot added a comment -

          Github user asfgit closed the pull request at:

          https://github.com/apache/flink/pull/3222


            People

            • Assignee: Unassigned
            • Reporter: uce Ufuk Celebi