HBase / HBASE-18397 StoreFile accounting issues on branch-1.3 and branch-1 / HBASE-16788

Race in compacted file deletion between HStore close() and closeAndArchiveCompactedFiles()


Details

    • Type: Sub-task
    • Status: Resolved
    • Priority: Blocker
    • Resolution: Fixed
    • Affects Version/s: 1.3.0
    • Fix Version/s: 1.3.0, 2.0.0
    • Component/s: regionserver
    • Labels: None
    • Hadoop Flags: Reviewed

    Description

      HBASE-13082 changed the way that compacted files are archived from being done inline on compaction completion to an async cleanup by the CompactedHFilesDischarger chore. It looks like the changes to HStore to support this introduced a race condition in the compacted HFile archiving.

      In the following sequence, we can wind up with two separate threads trying to archive the same HFiles, causing a regionserver abort:

      1. compaction completes normally and the compacted files are added to compactedfiles in HStore's DefaultStoreFileManager
      2. threadA: CompactedHFilesDischargeHandler runs in a RS executor service, calling closeAndArchiveCompactedFiles()
        1. obtains HStore readlock
        2. gets a copy of compactedfiles
        3. releases readlock
      3. threadB: calls HStore.close() as part of region close
        1. obtains HStore writelock
        2. calls DefaultStoreFileManager.clearCompactedfiles(), getting a copy of same compactedfiles
      4. threadA: calls HStore.removeCompactedfiles(compactedfiles)
        1. archives files in {compactedfiles} in HRegionFileSystem.removeStoreFiles()
        2. calls HStore.clearCompactedFiles()
        3. waits on write lock
      5. threadB: continues with close()
        1. calls removeCompactedfiles(compactedfiles)
        2. calls HRegionFileSystem.removeStoreFiles() -> HFileArchiver.archiveStoreFiles()
        3. receives FileNotFoundException because the files have already been archived by threadA
        4. throws IOException
      6. RS aborts
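
      The interleaving above can be replayed deterministically on a single thread. The following is a minimal sketch, not HBase code: ToyStore, RaceDemo, and every method name in it are hypothetical stand-ins, the "files" are plain strings, and both callers take their copy under the read lock for simplicity. It only illustrates that two callers holding copies of the same list will both try to archive the same files once the lock is released.

```java
import java.io.FileNotFoundException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.concurrent.locks.ReentrantReadWriteLock;

/** Toy model of the store; all names are hypothetical, not HBase classes. */
class ToyStore {
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
    private final List<String> compactedFiles = new ArrayList<>();
    private final Set<String> filesOnDisk = new HashSet<>();

    ToyStore(String... files) {
        compactedFiles.addAll(Arrays.asList(files));
        filesOnDisk.addAll(Arrays.asList(files));
    }

    /** Copies the compacted-file list under the read lock, then releases it. */
    List<String> snapshotCompactedFiles() {
        lock.readLock().lock();
        try {
            return new ArrayList<>(compactedFiles);
        } finally {
            lock.readLock().unlock();
        }
    }

    /** Archives files with no lock held; fails if a file is already gone. */
    void archive(List<String> files) throws FileNotFoundException {
        for (String f : files) {
            if (!filesOnDisk.remove(f)) {
                throw new FileNotFoundException(f + " was already archived");
            }
        }
    }
}

class RaceDemo {
    /** Replays the interleaving; returns true if the second archive fails. */
    static boolean secondArchiveFails() {
        ToyStore store = new ToyStore("hfile-1", "hfile-2");
        List<String> seenByA = store.snapshotCompactedFiles(); // discharger copy
        List<String> seenByB = store.snapshotCompactedFiles(); // close() copy
        try {
            store.archive(seenByA); // threadA archives first
            store.archive(seenByB); // threadB then finds the files missing
            return false;
        } catch (FileNotFoundException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println("second archive fails: " + secondArchiveFails());
    }
}
```

      In the real regionserver the second failure surfaces as an IOException out of close(), which is what triggers the abort.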

      I think the combination of fetching the compactedfiles list and removing the files needs to be covered by locking. Options I see are:

      • Modify HStore.closeAndArchiveCompactedFiles(): use writelock instead of readlock and move the call to removeCompactedfiles() inside the lock. This blocks read operations for as long as the files are being archived, which is undesirable.
      • Synchronize closeAndArchiveCompactedFiles() and modify close() to call it instead of calling removeCompactedfiles() directly
      • Add a separate lock for compacted files removal and use in closeAndArchiveCompactedFiles() and close()
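
      As an illustration of the third option, here is a minimal sketch in which one dedicated lock covers both fetching the list and archiving it, and close() goes through the same method (as in the second option). It is not the committed patch: GuardedStore and its members are hypothetical names, files are modelled as strings, and the store's existing read/write lock is left out for brevity.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.concurrent.locks.ReentrantLock;

/** Toy store with a dedicated archive lock; names are hypothetical. */
class GuardedStore {
    // Separate from the store's read/write lock, so readers are not blocked.
    private final ReentrantLock archiveLock = new ReentrantLock();
    private final List<String> compactedFiles = new ArrayList<>();
    private final Set<String> filesOnDisk = new HashSet<>();

    GuardedStore(String... files) {
        compactedFiles.addAll(Arrays.asList(files));
        filesOnDisk.addAll(Arrays.asList(files));
    }

    /** Fetches, archives, and clears the list as one step; returns the count. */
    int closeAndArchiveCompactedFiles() {
        archiveLock.lock();
        try {
            List<String> toArchive = new ArrayList<>(compactedFiles);
            for (String f : toArchive) {
                if (!filesOnDisk.remove(f)) {
                    throw new IllegalStateException(f + " was already archived");
                }
            }
            compactedFiles.clear(); // nothing is left for a later caller
            return toArchive.size();
        } finally {
            archiveLock.unlock();
        }
    }

    /** close() reuses the guarded path instead of archiving on its own. */
    int close() {
        return closeAndArchiveCompactedFiles();
    }
}
```

      With this shape, whichever caller arrives second sees an empty list and archives nothing, so the double archive cannot occur. The cost is that close() may wait for an in-progress archive to finish.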

      Attachments

        1. 16788-suggest.v2
          17 kB
          Ted Yu
        2. HBASE-16788_1.patch
          12 kB
          ramkrishna.s.vasudevan
        3. HBASE-16788.001.patch
          4 kB
          Gary Helmling
        4. HBASE-16788.002.patch
          12 kB
          Gary Helmling
        5. HBASE-16788-addendum.patch
          1 kB
          Gary Helmling

          People

            Assignee: Gary Helmling
            Reporter: Gary Helmling