  1. HBase
  2. HBASE-4695

WAL logs get deleted before region server can fully flush

Details

    • Type: Bug
    • Status: Closed
    • Priority: Blocker
    • Resolution: Fixed
    • Affects Version/s: 0.90.4
    • Fix Version/s: 0.90.5
    • Component/s: wal
    • Labels: None
    • Hadoop Flags: Reviewed

    Description

      To replicate the problem do the following:

      1. Check the /hbase/.logs/XXXX directory to confirm that WAL logs exist for the region server you are shutting down.
      2. Execute kill <pid> (where <pid> is the region server's process ID).
      3. Watch the region server log as it starts flushing; it reports how many regions are still waiting to close:

      09:36:54,665 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: Waiting on 489 regions to close
      09:56:35,779 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: Waiting on 116 regions to close

      4. Check /hbase/.logs/XXXX again: the directory has disappeared.
      5. Check namenode logs:

      09:26:41,607 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem.audit: ugi=root ip=/10.101.1.5 cmd=delete src=/hbase/.logs/rdaa5.prod.imageshack.com,60020,1319749
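
      The timing mismatch in steps 3 to 5 can be checked mechanically. Below is a minimal Python sketch, not part of HBase. It assumes both logs share the same clock and day, its regular expressions match only the two line shapes quoted above, and premature_wal_deletes is a hypothetical helper name. The audit path is truncated in the report and is kept as-is.

```python
import re
from datetime import datetime

# Log excerpts copied from the report (the audit path is truncated there).
RS_LOG = [
    "09:36:54,665 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: Waiting on 489 regions to close",
    "09:56:35,779 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: Waiting on 116 regions to close",
]
NN_LOG = [
    "09:26:41,607 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem.audit: "
    "ugi=root ip=/10.101.1.5 cmd=delete src=/hbase/.logs/rdaa5.prod.imageshack.com,60020,1319749",
]

TIME = r"(\d{2}:\d{2}:\d{2},\d{3})"
WAITING = re.compile(TIME + r" .*Waiting on (\d+) regions to close")
DELETE = re.compile(TIME + r" .*cmd=delete src=(/hbase/\.logs/\S+)")


def parse_time(text):
    # Time of day only; assumes all lines come from the same day.
    return datetime.strptime(text, "%H:%M:%S,%f")


def premature_wal_deletes(rs_lines, nn_lines):
    """Return (delete_time, path, regions_left) for every WAL delete that
    happened before a later 'Waiting on N regions to close' line with N > 0."""
    waits = []
    for line in rs_lines:
        match = WAITING.search(line)
        if match:
            waits.append((parse_time(match.group(1)), int(match.group(2))))

    findings = []
    for line in nn_lines:
        match = DELETE.search(line)
        if not match:
            continue
        deleted_at = parse_time(match.group(1))
        # Regions still open at any point after the delete was audited.
        still_open = [n for t, n in waits if t > deleted_at and n > 0]
        if still_open:
            findings.append((match.group(1), match.group(2), max(still_open)))
    return findings


for when, path, regions in premature_wal_deletes(RS_LOG, NN_LOG):
    print(f"{path} deleted at {when} with up to {regions} regions still open")
```

      On the excerpts above, the sketch flags the delete audited at 09:26:41,607, because 489 regions were still waiting to close at 09:36:54,665.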

      Note that if you kill -9 the RS at this point, or it crashes during the flush, there are no WAL logs left to replay. Logs must be deleted or moved out only after the RS has fully flushed; otherwise it is possible to lose data.
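
      The ordering requested above can be illustrated with a toy model. This is a hedged Python sketch and not the attached patch or any HBase code: ToyRegionServer, its fields, and the crash_on parameter are all hypothetical names. It shows only the invariant that the WAL is removed after every region has flushed, so a crash mid-flush leaves the WAL available for replay.

```python
class ToyRegionServer:
    """Hypothetical stand-in for a region server; not HBase code."""

    def __init__(self, regions):
        self.unflushed = set(regions)  # regions with edits not yet flushed
        self.wal_present = True        # models /hbase/.logs/XXXX existing

    def flush(self, region, fail=False):
        if fail:
            # Models a crash or kill -9 in the middle of the flush.
            raise RuntimeError(f"crashed while flushing {region}")
        self.unflushed.discard(region)

    def shutdown(self, crash_on=None):
        # Flush every region first. sorted() copies the set, so removing
        # entries during the loop is safe.
        for region in sorted(self.unflushed):
            self.flush(region, fail=(region == crash_on))
        # Reached only if every flush succeeded: now the WAL may go.
        self.wal_present = False


# A crash on the second region leaves the WAL in place for replay.
crashed = ToyRegionServer(["r1", "r2", "r3"])
try:
    crashed.shutdown(crash_on="r2")
except RuntimeError:
    pass
print(crashed.wal_present, sorted(crashed.unflushed))

# A clean shutdown removes the WAL only after all regions are flushed.
clean = ToyRegionServer(["r1", "r2", "r3"])
clean.shutdown()
print(clean.wal_present, sorted(clean.unflushed))
```

      The reported bug corresponds to clearing wal_present before the loop finishes: any failure after that point loses the unflushed edits.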

      Attachments

        1. HBASE-4695_branch90_trial.patch
          1 kB
          gaojinchao
        2. HBASE-4695_Branch90_V2.patch
          1 kB
          gaojinchao
        3. HBASE-4695_Trunk_V2.patch
          1 kB
          gaojinchao
        4. hbase-4695-0.92.txt
          1 kB
          Ted Yu

          People

            Assignee: gaojinchao (sunnygao)
            Reporter: jack levin (jacque74)