HBase / HBASE-2312

Possible data loss when RS goes into GC pause while rolling HLog

Details

    • Type: Bug
    • Status: Closed
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: 0.90.0
    • Fix Version/s: 0.92.0
    • Component/s: master, regionserver
    • Labels: None

    Description

      There is a rare corner case in which data loss can occur:

      1) RS #1 is about to roll its HLog: the new one has not been created yet, and the old one will get no more writes
      2) RS #1 enters a GC Pause of Death
      3) The master, treating RS #1 as dead, lists the HLog files of RS #1 that it has to split and starts splitting
      4) RS #1 wakes up, creates the new HLog (the previous one was rolled) and appends an edit. That edit is lost, because the new file was not in the list the master is splitting
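
The four steps above can be replayed deterministically on a local filesystem. This is only an illustrative sketch, not HBase or HDFS code: the directory layout, the file names `hlog.1` and `hlog.2`, and the function `simulate_lost_edit` are hypothetical, and the GC pause is modelled simply by the order of the statements.

```python
import os
import tempfile


def simulate_lost_edit(root):
    """Replay the race in order and return the edits the 'master' recovers."""
    # Hypothetical stand-in for /hbase/.logs/<regionserver name>.
    log_dir = os.path.join(root, ".logs", "rs1")
    os.makedirs(log_dir)

    # Step 1: the old HLog exists and will receive no more writes.
    with open(os.path.join(log_dir, "hlog.1"), "w") as f:
        f.write("edit-A\n")

    # Step 2: RS #1 is paused here (nothing happens on its side).

    # Step 3: the master lists the files it has to split.
    files_to_split = sorted(os.listdir(log_dir))

    # Step 4: RS #1 wakes up, creates the new HLog and appends an edit.
    with open(os.path.join(log_dir, "hlog.2"), "w") as f:
        f.write("edit-B\n")

    # The master only replays the files it listed in step 3.
    recovered = []
    for name in files_to_split:
        with open(os.path.join(log_dir, name)) as f:
            recovered.extend(line.strip() for line in f)
    return recovered


with tempfile.TemporaryDirectory() as tmp:
    print(simulate_lost_edit(tmp))  # ['edit-A'] ; edit-B is never replayed
```

The edit written in step 4 lands in a file that was created after the master took its listing, so it never reaches the split output.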

      The following seems like a possible solution:

      1) The master detects that RS #1 is dead
      2) The master renames the /hbase/.logs/<regionserver name> directory to something else (say /hbase/.logs/<regionserver name>-dead)
      3) Add mkdir support (as opposed to mkdirs) to HDFS, so that a file create fails if the parent directory doesn't exist. Dhruba tells me this is very doable.
      4) RS #1 comes back up and is not able to create the new HLog. It restarts itself.
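
The proposed fencing can be sketched the same way on a local filesystem. This is an analogy under stated assumptions, not the HDFS API: Python's `open()` never creates missing parent directories, which stands in for the non-recursive create that step 3 asks HDFS to support. The names `master_fence`, `create_hlog` and `roll_after_fence` are hypothetical.

```python
import os
import tempfile


def master_fence(log_dir):
    """Rename the dead server's log directory and return the new path."""
    dead_dir = log_dir + "-dead"
    os.rename(log_dir, dead_dir)
    return dead_dir


def create_hlog(log_dir, name):
    """Create a new log file without creating missing parent directories."""
    path = os.path.join(log_dir, name)
    # open() does not create parents, so this raises FileNotFoundError
    # once the directory has been renamed by the master.
    with open(path, "x"):
        pass
    return path


def roll_after_fence(root):
    """Return what the region server does and what the master will split."""
    log_dir = os.path.join(root, ".logs", "rs1")
    os.makedirs(log_dir)
    create_hlog(log_dir, "hlog.1")

    # Steps 1-2: the master decides RS #1 is dead and renames its directory.
    dead_dir = master_fence(log_dir)

    # Step 4: RS #1 wakes up and tries to create the next HLog.
    try:
        create_hlog(log_dir, "hlog.2")
        action = "created"
    except FileNotFoundError:
        action = "abort"  # the server should restart instead of writing
    return action, sorted(os.listdir(dead_dir))


with tempfile.TemporaryDirectory() as tmp:
    print(roll_after_fence(tmp))  # ('abort', ['hlog.1'])
```

If the create call built missing parents (the mkdirs behaviour), the late log file would be written under a freshly recreated directory and the edit would still be lost, which is why the proposal depends on a create that fails when the directory is gone.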

      Attachments

        1. ASF.LICENSE.NOT.GRANTED--D99.1.patch
          19 kB
          Phabricator
        2. ASF.LICENSE.NOT.GRANTED--D99.2.patch
          19 kB
          Phabricator
        3. ASF.LICENSE.NOT.GRANTED--D99.3.patch
          20 kB
          Phabricator
        4. HBASE-2312.patch
          22 kB
          Nicolas Spiegelberg

            People

              nspiegelberg Nicolas Spiegelberg
              karthik.ranga Karthik Ranganathan
