  1. HBase
  2. HBASE-18099

FlushSnapshotSubprocedure should wait for concurrent Region#flush() to finish


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 1.4.0, 2.0.0, 1.2.7
    • Component/s: snapshots
    • Labels: None
    • Hadoop Flags: Reviewed

    Description

      In the following thread:
      http://search-hadoop.com/m/HBase/YGbbMXkeHlI9zo
      Jacob described a scenario in which data from certain regions was missing from the snapshot.

      Here is the related region server log:
      https://pastebin.com/1ECXjhRp

      He pointed out that the concurrent flush came from the MemStoreFlusher.1 thread and was not initiated from the snapshot thread pool.

      The RegionSnapshotTask#call() method contains this call:

                region.flush(true);
      

      The return value is not checked.

      In HRegion#flushcache(), Result.CANNOT_FLUSH may be returned due to:

                String msg = "Not flushing since "
                    + (writestate.flushing ? "already flushing"
                    : "writes not enabled");
      

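      This implies that when a flush is already in progress, FlushSnapshotSubprocedure may proceed without waiting for that concurrent flush to complete, so the snapshot can be taken before the data has been flushed.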
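
      The idea of checking the flush result and waiting can be illustrated with a small, self-contained sketch. It does not use HBase classes and is not the committed patch: MockRegion, its Result enum, beginFlush(), finishFlush(), waitForFlushes() and flushForSnapshot() are all hypothetical names invented for this illustration. The mock refuses to flush while another flush is running, which mirrors the "already flushing" case quoted above.

```java
public class FlushWaitSketch {

  /** Hypothetical result type; only loosely modeled on the CANNOT_FLUSH case in this issue. */
  enum Result { FLUSHED, CANNOT_FLUSH }

  /** Hypothetical stand-in for a region. This is NOT the HBase HRegion class. */
  static class MockRegion {
    private boolean flushing = false;   // guarded by this
    private int unflushedEdits = 0;     // guarded by this

    synchronized void put() { unflushedEdits++; }

    synchronized int unflushedEdits() { return unflushedEdits; }

    /** Marks a flush as started; returns false if one is already running. */
    synchronized boolean beginFlush() {
      if (flushing) {
        return false;
      }
      flushing = true;
      return true;
    }

    /** Completes the running flush and wakes up any waiters. */
    synchronized void finishFlush() {
      unflushedEdits = 0;
      flushing = false;
      notifyAll();
    }

    /** Refuses to flush while another flush is in progress. */
    Result flush() {
      if (!beginFlush()) {
        return Result.CANNOT_FLUSH;
      }
      finishFlush();
      return Result.FLUSHED;
    }

    /** Blocks until no flush is in progress. */
    synchronized void waitForFlushes() throws InterruptedException {
      while (flushing) {
        wait();
      }
    }
  }

  /** Flushes for a snapshot and, instead of ignoring the result, waits out a concurrent flush. */
  static void flushForSnapshot(MockRegion region) throws InterruptedException {
    Result res = region.flush();
    if (res == Result.CANNOT_FLUSH) {
      // Another thread may already be flushing; wait for it before continuing.
      region.waitForFlushes();
    }
  }

  public static void main(String[] args) throws Exception {
    MockRegion region = new MockRegion();
    region.put();
    region.beginFlush();  // simulate a flush started by a background flusher thread

    Thread flusher = new Thread(() -> {
      try {
        Thread.sleep(100);  // pretend the flush takes some time
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
      }
      region.finishFlush();
    });
    flusher.start();

    flushForSnapshot(region);  // returns only after the concurrent flush has finished
    System.out.println("unflushed edits after wait: " + region.unflushedEdits());
    flusher.join();
  }
}
```

      Without the result check in flushForSnapshot(), the caller would continue while the mock region still held unflushed edits, which is the situation this issue describes.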

      Attachments

        1. 18099.v1.txt
          2 kB
          Ted Yu
        2. 18099.v2.txt
          4 kB
          Ted Yu
        3. 18099.v3.txt
          4 kB
          Ted Yu
        4. 18099.v4.txt
          4 kB
          Ted Yu

            People

              Assignee: Ted Yu (yuzhihong@gmail.com)
              Reporter: Ted Yu (yuzhihong@gmail.com)
              Votes: 0
              Watchers: 9
