HBase / HBASE-28222

Leak in ExportSnapshot during verifySnapshot on S3A


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 2.6.0, 3.0.0-beta-1
    • Component/s: None
    • Labels: None
    • Release Note: ExportSnapshot now uses FileSystems from the global FileSystem cache, and as such does not close those FileSystems when it finishes. If users plan to run ExportSnapshot over and over in a single process for different FileSystem URLs, they should run FileSystem.closeAll() between runs. See JIRA for details.

    Description

      Each S3AFileSystem creates an S3AInstrumentation and various metrics sources, with no real way to disable that. HADOOP-18526 fixed a bug so that these are no longer leaked, but the fix only takes effect if you call S3AFileSystem.close() when you are done with the instance.

      In ExportSnapshot, ever since HBASE-12819 we set fs.impl.disable.cache to true. It looks like that was added in order to prevent conflicting calls to close() between mapper and main thread when running in a single JVM.

      When verifySnapshot is enabled, SnapshotReferenceUtil.verifySnapshot iterates all storefiles (could be many thousands) and calls SnapshotReferenceUtil.verifyStoreFile on them. verifyStoreFile makes a number of static calls which end up in CommonFSUtils.getRootDir, which does Path.getFileSystem().

      Since the FS cache is disabled, every single call to Path.getFileSystem() creates a new FileSystem instance. That FS is short-lived and gets GC'd, but nothing ever calls close() on it. In the case of S3AFileSystem, this means the S3AInstrumentation and metrics sources of every instance are leaked.
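The mechanism described above can be illustrated with a small toy model. This is only a sketch under stated assumptions: none of the types below are Hadoop classes. `ToyFileSystem`, `METRICS_REGISTRY`, `getFileSystem` and the `s3a://bucket` URI are hypothetical stand-ins for S3AFileSystem, the process-wide metrics system, `Path.getFileSystem()` and a real bucket. It shows why an uncached, never-closed instance per storefile leaves one registered metrics entry per call, while a cached instance leaves one in total.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Toy model only: none of these types are Hadoop classes. */
class FsCacheLeakSketch {

    /** Hypothetical stand-in for a process-wide metrics registry. */
    static final Set<String> METRICS_REGISTRY = new HashSet<>();

    /** Hypothetical stand-in for the global FileSystem cache, keyed by URI. */
    static final Map<String, ToyFileSystem> CACHE = new HashMap<>();

    static int nextId = 0;

    /** Stand-in for S3AFileSystem: registers a metrics source that only close() removes. */
    static final class ToyFileSystem implements AutoCloseable {
        final String metricsName;

        ToyFileSystem(String uri) {
            metricsName = uri + "#" + (nextId++);
            METRICS_REGISTRY.add(metricsName);
        }

        @Override
        public void close() {
            METRICS_REGISTRY.remove(metricsName);
        }
    }

    /** Mimics a lookup that either bypasses the cache or reuses a cached instance. */
    static ToyFileSystem getFileSystem(String uri, boolean disableCache) {
        if (disableCache) {
            return new ToyFileSystem(uri); // fresh instance on every call
        }
        return CACHE.computeIfAbsent(uri, ToyFileSystem::new);
    }

    /** Simulates verifying N storefiles; returns how many metrics entries remain registered. */
    static int verify(int storeFiles, boolean disableCache) {
        METRICS_REGISTRY.clear();
        CACHE.clear();
        for (int i = 0; i < storeFiles; i++) {
            // The returned instance is dropped without close(), like the short-lived FS above.
            getFileSystem("s3a://bucket", disableCache);
        }
        return METRICS_REGISTRY.size();
    }

    public static void main(String[] args) {
        System.out.println("cache disabled: " + verify(1000, true));
        System.out.println("cache enabled:  " + verify(1000, false));
    }
}
```

In this model the registry grows linearly with the number of storefiles when the cache is bypassed, which mirrors the leak reported here; it says nothing about the actual size of the leaked objects in Hadoop.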

      We have two easy possible fixes:

      1. Only set fs.impl.disable.cache when running Hadoop in local mode, since that was the original problem.
      2. When calling verifySnapshot, create a new Configuration which does not include the fs.impl.disable.cache setting.

      I tested out #2 in my environment and it fixed the leak.
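Option 2 amounts to handing the verification step its own copy of the configuration with the cache setting removed. The sketch below shows that idea with `java.util.Properties` as a dependency-free stand-in for Hadoop's Configuration; the class and method names (`VerifyConfSketch`, `confForVerify`) and the `example.other.setting` key are hypothetical, and this is not the patch that was committed. With Hadoop itself, the equivalent would presumably be the `Configuration` copy constructor followed by `unset` on the key.

```java
import java.util.Properties;

/** Sketch of option 2 using Properties as a stand-in for Hadoop's Configuration. */
class VerifyConfSketch {

    static final String DISABLE_CACHE_KEY = "fs.impl.disable.cache";

    /**
     * Returns a copy of the given settings without the disable-cache key.
     * The original object is left untouched, so the export job keeps its setting.
     */
    static Properties confForVerify(Properties conf) {
        Properties copy = new Properties();
        copy.putAll(conf);
        copy.remove(DISABLE_CACHE_KEY);
        return copy;
    }

    public static void main(String[] args) {
        Properties exportConf = new Properties();
        exportConf.setProperty(DISABLE_CACHE_KEY, "true");
        exportConf.setProperty("example.other.setting", "kept"); // hypothetical key

        Properties verifyConf = confForVerify(exportConf);

        // The verify copy falls back to the default (cache enabled) ...
        System.out.println("verify: " + verifyConf.getProperty(DISABLE_CACHE_KEY, "false"));
        // ... while the export configuration still disables the cache.
        System.out.println("export: " + exportConf.getProperty(DISABLE_CACHE_KEY));
    }
}
```

Copying rather than mutating keeps the mapper and main-thread behavior that HBASE-12819 was protecting, while the verification pass reuses a single cached FileSystem.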

          People

            Assignee: Bryan Beaudreault (bbeaudreault)
            Reporter: Bryan Beaudreault (bbeaudreault)