HBase / HBASE-18099

FlushSnapshotSubprocedure should wait for concurrent Region#flush() to finish


    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 1.4.0, 2.0.0, 1.2.7
    • Component/s: snapshots
    • Labels: None
    • Hadoop Flags: Reviewed

      Description

      In the following thread:
      http://search-hadoop.com/m/HBase/YGbbMXkeHlI9zo
      Jacob described a scenario in which data from a certain region was missing from the snapshot.

      The related region server log is here:
      https://pastebin.com/1ECXjhRp

      He pointed out that the concurrent flush, run by the MemStoreFlusher.1 thread, was not initiated by the snapshot's thread pool.

      The RegionSnapshotTask#call() method contains:

                region.flush(true);
      

      The return value is not checked.
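      To illustrate why the unchecked return value matters, here is a minimal, self-contained Java sketch. It does not use HBase classes: ModelRegion, FlushOutcome, beginFlush() and endFlush() are hypothetical stand-ins, assuming only that the region refuses a forced flush while another flush holds its flushing flag. A caller that ignores the outcome would go on to snapshot a region whose edits are not yet persisted.

```java
// Hypothetical, simplified model of a region's flush guard; NOT HBase code.
public class UncheckedFlushDemo {

    // Illustrative outcome type; CANNOT_FLUSH loosely mirrors Result.CANNOT_FLUSH.
    enum FlushOutcome { FLUSHED, CANNOT_FLUSH }

    static class ModelRegion {
        private boolean flushing = false;      // analogous to writestate.flushing
        private boolean writesEnabled = true;  // analogous to the "writes not enabled" branch
        private long flushedUpTo = 0;          // highest edit persisted by a finished flush
        private long lastEdit = 0;             // highest edit accepted so far

        synchronized void put() { lastEdit++; }

        // A background flusher (e.g. a MemStoreFlusher thread) starts a flush.
        synchronized long beginFlush() { flushing = true; return lastEdit; }

        // The flush completes and records how far it persisted.
        synchronized void endFlush(long upTo) {
            flushedUpTo = Math.max(flushedUpTo, upTo);
            flushing = false;
        }

        // Refuse when a flush is already running or writes are disabled.
        FlushOutcome flush() {
            long upTo;
            synchronized (this) {
                if (flushing || !writesEnabled) return FlushOutcome.CANNOT_FLUSH;
                flushing = true;
                upTo = lastEdit;
            }
            endFlush(upTo);
            return FlushOutcome.FLUSHED;
        }

        synchronized long flushedUpTo() { return flushedUpTo; }
    }

    public static void main(String[] args) {
        ModelRegion region = new ModelRegion();
        region.put();
        region.put();
        long upTo = region.beginFlush();        // concurrent flush holds the flag

        FlushOutcome outcome = region.flush();  // the snapshot's forced flush
        // Ignoring 'outcome' here is the bug: nothing has been persisted yet.
        System.out.println(outcome + " flushedUpTo=" + region.flushedUpTo());
        // prints "CANNOT_FLUSH flushedUpTo=0"

        region.endFlush(upTo);                  // the concurrent flush finishes later
    }
}
```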

      In HRegion#flushcache(), Result.CANNOT_FLUSH may be returned when a flush is already in progress or writes are not enabled:

                String msg = "Not flushing since "
                    + (writestate.flushing ? "already flushing"
                    : "writes not enabled");
      

      Because RegionSnapshotTask ignores this result, FlushSnapshotSubprocedure may skip waiting for the concurrent flush to complete, and the snapshot can then miss that region's data.
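      One way to avoid this is to inspect the result and, on CANNOT_FLUSH, block until the in-progress flush finishes before continuing. The sketch below models that pattern with plain Java monitors (wait/notifyAll). It is not the committed patch: all names (ModelRegion, waitForFlushes(), flushedUpTo(), flushForSnapshot()) are hypothetical, and it assumes the region can report how far its finished flushes have persisted edits.

```java
import java.util.concurrent.TimeUnit;

// Hypothetical model of "check the flush result, then wait"; NOT HBase code.
public class WaitForConcurrentFlushDemo {

    enum FlushOutcome { FLUSHED, CANNOT_FLUSH }

    static class ModelRegion {
        private boolean flushing = false;
        private long flushedUpTo = 0;
        private long lastEdit = 0;

        synchronized void put() { lastEdit++; }
        synchronized long lastEdit() { return lastEdit; }
        synchronized long flushedUpTo() { return flushedUpTo; }

        synchronized long beginFlush() { flushing = true; return lastEdit; }

        synchronized void endFlush(long upTo) {
            flushedUpTo = Math.max(flushedUpTo, upTo);
            flushing = false;
            notifyAll(); // wake threads blocked in waitForFlushes()
        }

        // Refuses while another flush is running, like the guard in flushcache().
        FlushOutcome flush() {
            long upTo;
            synchronized (this) {
                if (flushing) return FlushOutcome.CANNOT_FLUSH;
                flushing = true;
                upTo = lastEdit;
            }
            endFlush(upTo);
            return FlushOutcome.FLUSHED;
        }

        // Blocks until no flush is in progress.
        synchronized void waitForFlushes() throws InterruptedException {
            while (flushing) wait();
        }
    }

    // Snapshot-side flush that checks the result instead of ignoring it.
    static boolean flushForSnapshot(ModelRegion region, int maxRetries)
            throws InterruptedException {
        long target = region.lastEdit(); // edits that must be in the snapshot
        for (int i = 0; i < maxRetries; i++) {
            if (region.flush() == FlushOutcome.FLUSHED) return true;
            region.waitForFlushes();     // CANNOT_FLUSH: wait for the other flush
            if (region.flushedUpTo() >= target) return true; // it covered our edits
        }
        return false;
    }

    public static void main(String[] args) throws Exception {
        ModelRegion region = new ModelRegion();
        region.put();
        region.put();
        long upTo = region.beginFlush(); // simulated concurrent flush in progress

        Thread flusher = new Thread(() -> {
            try {
                TimeUnit.MILLISECONDS.sleep(50);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            region.endFlush(upTo);
        });
        flusher.start();

        boolean ok = flushForSnapshot(region, 3);
        flusher.join();
        System.out.println("ok=" + ok + " flushedUpTo=" + region.flushedUpTo());
        // prints "ok=true flushedUpTo=2"
    }
}
```

      The flushedUpTo() >= target check matters because the concurrent flush may have started before the last edits arrived; in that case the loop retries its own flush rather than assuming the concurrent one was sufficient.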

        Attachments

        1. 18099.v1.txt (2 kB, Ted Yu)
        2. 18099.v2.txt (4 kB, Ted Yu)
        3. 18099.v3.txt (4 kB, Ted Yu)
        4. 18099.v4.txt (4 kB, Ted Yu)

              People

              • Assignee: Ted Yu (yuzhihong@gmail.com)
              • Reporter: Ted Yu (yuzhihong@gmail.com)
              • Votes: 0
              • Watchers: 9
