  1. HBase
  2. HBASE-10464

Race condition during RS shutdown that could cause data loss


Details

    • Type: Bug
    • Status: Closed
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: 0.89-fb
    • Fix Version/s: 0.89-fb
    • Component/s: regionserver
    • Labels: None

    Description

      Bug scenario (T* are timestamps, with T1 < T2 < ... < Tn):
      1. The master assigns a region to the RS at T1.
      2. The RS works on opening the region from T1 to T3.
      3. While the region is still being opened, the RS starts to shut down at T2; its DFS client is closed at T5.
      4. As part of RS shutdown, the regions owned by the RS are closed. The newly opened region is the exception: it is online from T3 to T5 and holds in memory any mutations that arrived after its last flush at T4, if a flush happened at all.
      5. Because the master believes the RS shut down cleanly, no log splitting takes place. The HLog is simply moved to the old logs directory.
      6. Mutations held in memory from T4 to T5 (or from T3 to T5 if there was no flush at T4) are never flushed. They exist only in the WAL, and only if the WAL is turned on. Since that log is never split (step 5), these edits are not replayed.

      The fix is to prevent a region open from succeeding once the RS has started shutting down.
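      The idea behind the fix can be illustrated with a minimal, self-contained sketch. This is not the code from the attached diff and it does not use HBase classes: `ShutdownGuardSketch`, `finishOpen`, and `beginShutdown` are hypothetical names chosen for illustration. The assumption is that the final step of opening a region and the start of shutdown take the same lock, so an open either completes before shutdown begins (and the region is then closed with the others) or is refused.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a region server; NOT the HBase implementation.
final class ShutdownGuardSketch {
    private final Object lock = new Object();
    private boolean stopping = false;                 // set once shutdown begins
    private final List<String> onlineRegions = new ArrayList<>();

    // Final step of opening a region. Returns false, without bringing the
    // region online, if shutdown has already started.
    boolean finishOpen(String regionName) {
        synchronized (lock) {
            if (stopping) {
                return false;                         // refuse: server is shutting down
            }
            onlineRegions.add(regionName);
            return true;
        }
    }

    // First step of shutdown. Marks the server as stopping and returns the
    // regions that must be closed (and flushed) before the DFS client closes.
    List<String> beginShutdown() {
        synchronized (lock) {
            stopping = true;
            List<String> toClose = new ArrayList<>(onlineRegions);
            onlineRegions.clear();
            return toClose;
        }
    }

    public static void main(String[] args) {
        ShutdownGuardSketch rs = new ShutdownGuardSketch();
        // An open that finishes before shutdown is closed with the other regions.
        System.out.println("open before shutdown: " + rs.finishOpen("region-a"));
        System.out.println("regions to close: " + rs.beginShutdown());
        // An open that finishes after shutdown started is rejected.
        System.out.println("open after shutdown: " + rs.finishOpen("region-b"));
    }
}
```

      Under this scheme a rejected open never accepts mutations, so there is no window like T3 to T5 in which unflushed edits can accumulate on a server that the master considers cleanly stopped.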

      Attachments

        1. D1120497.diff
          11 kB
          Yunfan Zhong


            People

              Assignee: Unassigned
              Reporter: Yunfan Zhong (fantasist)
              Votes: 1
              Watchers: 4

              Dates

                Created:
                Updated:
                Resolved: