  1. HBase
  2. HBASE-10464

Race condition during RS shutdown that could cause data loss

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: 0.89-fb
    • Fix Version/s: 0.89-fb
    • Component/s: regionserver
    • Labels: None

      Description

      Bug scenario (T* are timestamps, with T1 < T2 < ... < Tn):
      1. The master assigns a region to the RS at T1.
      2. The RS opens the region between T1 and T3.
      3. While the region is still opening, the RS starts to shut down at T2; its DFS client is closed at T5.
      4. As part of shutdown, the RS closes the regions it owns. The newly opened region is missed: it is online from T3 to T5 and holds mutations in memory after its last flush at T4, if one happened.
      5. Because the master believes the RS shut down cleanly, it performs no log splitting, and the HLog is moved to the old logs directory as usual.
      6. Mutations made between T4 and T5 (or between T3 and T5 if no flush happened at T4) are never flushed. They exist only in the WAL, and only if the WAL is enabled.

      The fix is to prevent a region open from succeeding while the RS is shutting down.
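
      The attached diff is not reproduced here, so the following is only a minimal sketch of the idea, not the actual patch. It assumes the last step of a region open and the start of shutdown can share one lock. The class RegionServerSketch and the methods finishOpen and beginShutdown are hypothetical names, not HBase APIs.

```java
import java.util.HashSet;
import java.util.Set;

/** Hypothetical stand-in for a region server; these are not HBase classes. */
class RegionServerSketch {
    private final Object lock = new Object();
    private final Set<String> onlineRegions = new HashSet<>();
    private boolean stopping = false;

    /**
     * Last step of opening a region. Returns false if shutdown has already
     * begun, in which case the caller should abandon the open and report a
     * failure so the master can reassign the region.
     */
    boolean finishOpen(String regionName) {
        synchronized (lock) {
            if (stopping) {
                return false; // never bring a region online during shutdown
            }
            onlineRegions.add(regionName);
            return true;
        }
    }

    /**
     * Marks the server as stopping and returns a snapshot of the regions that
     * must be flushed and closed. Because the flag and the set are guarded by
     * the same lock, no region can come online after the snapshot is taken.
     */
    Set<String> beginShutdown() {
        synchronized (lock) {
            stopping = true;
            return new HashSet<>(onlineRegions);
        }
    }
}
```

      With this ordering, a region either appears in the shutdown snapshot (and is flushed and closed) or fails to open; it can no longer come online between T3 and T5 unnoticed.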

        Attachments

        1. D1120497.diff
          11 kB
          Yunfan Zhong

              People

              • Assignee: Unassigned
              • Reporter: Yunfan Zhong (fantasist)
              • Votes: 1
              • Watchers: 5