HBase
HBASE-433

Region server should delete restore log after successful restore

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Blocker
    • Resolution: Fixed
    • Affects Version/s: 0.1.0
    • Fix Version/s: 0.1.0, 0.2.0
    • Component/s: regionserver
    • Labels:
      None

      Description

      Currently we do not remove the restore log "oldlogfile.log" after we reopen a region following a region server crash.

      Suggestion: remove it after we have successfully flushed all of its edits to a mapfile.

      So something like:
      replay log
      memcache flush
      delete log
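The three steps above can be sketched as follows. This is only an illustration of the ordering, not the HBase code: it uses `java.nio.file` on a local file system as a stand-in for HDFS, and `RestoreLogSketch`, `recover`, and the plain-text line format are all hypothetical.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: class and method names are not from HBase.
class RestoreLogSketch {
    /** Replays the restore log, flushes the edits, and only then deletes the log. */
    static List<String> recover(Path restoreLog, Path flushedFile) throws IOException {
        List<String> memcache = new ArrayList<>();
        if (!Files.exists(restoreLog)) {
            return memcache; // nothing to recover
        }
        // 1. Replay log: load every edit into the in-memory cache.
        memcache.addAll(Files.readAllLines(restoreLog));
        // 2. Memcache flush: persist the edits (stand-in for writing a mapfile).
        Files.write(flushedFile, memcache);
        // 3. Delete log: reached only if the flush above did not throw.
        Files.delete(restoreLog);
        return memcache;
    }
}
```

Because the delete is the last statement, an exception during replay or flush leaves the restore log in place for the next attempt.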

      1. patch.txt
        5 kB
        Jim Kellerman
      2. patch.txt
        5 kB
        Jim Kellerman

          Activity

          Billy Pearson added a comment -

          Might also take a look at the point at which the master sees a region as crashed from a region server.

          Example:

          Say I have a failed region server, and the master loads and saves restore logs for each region.

          On the reopen of a region, say the region server is replaying the logs and crashes.

          Would the master overwrite the old log file with a new one, or would the master not consider the region as part of the region server on a crash until it receives an MSG_REPORT_PROCESS_OPEN message?
          We should look into this so we do not risk overwriting a restore log file that has not been successfully loaded and flushed to disk.

          Jim Kellerman added a comment -

          Current situation:

          If a region server crashes before it has closed any log files, then all it will leave is one zero length log file which will be ignored.

          However, if the region server crashes after closing one or more log files, and the region server was starting up a region that had an old log file (and the region server crashed before recovering the old log file), then yes, the old log file would be overwritten.

          Solution for current situation:

          HLog.splitLog should check for the presence of an existing log file, and copy its contents into a new file before processing the region server's log file(s).

          Future (when HDFS has appends):

          The region server will never leave a zero length log file unless it has received no updates since it started or since it closed the most recent log file.

          Solution for future:

          If HDFS supports appends to an existing file, then splitLog should open the region's old log file for append or create it if it does not exist.

          If HDFS does not support appends to an existing file, then the solution for the current situation would still work.
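A minimal sketch of the "open for append or create" behaviour described for the future case. It assumes a local file system via `java.nio.file` rather than HDFS, and `AppendOrCreateSketch` and `appendEdits` are hypothetical names, not part of HLog.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.List;

// Hypothetical sketch: local-file stand-in for appending to a region's old log file.
class AppendOrCreateSketch {
    /** Appends edits to the old log file, creating the file first if it is missing. */
    static void appendEdits(Path oldLog, List<String> edits) throws IOException {
        // CREATE makes the file if absent; APPEND keeps any edits already in it.
        Files.write(oldLog, edits, StandardCharsets.UTF_8,
                StandardOpenOption.CREATE, StandardOpenOption.APPEND);
    }
}
```

With append semantics, edits left over from an earlier, unrecovered crash stay in the file and the new edits follow them, so nothing is overwritten.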

          Jim Kellerman added a comment -

          And yes, the region server should delete the old log file once it has been completely recovered and flushed.

          Jim Kellerman added a comment -

          HLog

          • don't overwrite oldlogfile in splitLog if it already exists. Rename it and copy it into the new oldlogfile. Then delete it once it has been copied.
          • use FileUtil.fullyDelete to delete region server log directory.

          HRegion

          • delete oldlogfile once it has been successfully processed
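The HLog change above (rename the existing oldlogfile, copy it into the new one, then delete the renamed copy) might look roughly like this. The sketch is not the attached patch: it uses `java.nio.file` on a local file system in place of the Hadoop file system, and `SplitLogSketch`, `writeOldLog`, and the `.old` suffix are hypothetical.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

// Hypothetical sketch: names and the ".old" suffix are illustrative only.
class SplitLogSketch {
    /** Writes newEdits to oldLog, preserving any edits already in an existing oldLog. */
    static void writeOldLog(Path oldLog, List<String> newEdits) throws IOException {
        Path saved = null;
        if (Files.exists(oldLog)) {
            // Rename instead of overwriting. Files.move fails if the target already
            // exists; a leftover ".old" file is not handled by this sketch.
            saved = oldLog.resolveSibling(oldLog.getFileName() + ".old");
            Files.move(oldLog, saved);
        }
        try (BufferedWriter writer = Files.newBufferedWriter(oldLog)) {
            if (saved != null) {
                // Copy the previously unrecovered edits into the new file first.
                for (String line : Files.readAllLines(saved)) {
                    writer.write(line);
                    writer.newLine();
                }
            }
            for (String edit : newEdits) {
                writer.write(edit);
                writer.newLine();
            }
        }
        if (saved != null) {
            Files.delete(saved); // delete the renamed file only after the copy is closed
        }
    }
}
```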
          Jim Kellerman added a comment -

          Please review.

          Billy Pearson added a comment -

          I would test, but my test cluster is en route to the data center. Someone could hard kill a region server and see if the logs get removed after the region recovers.

          Bryan Duxbury added a comment -

          Silly comment, but should the math on HLog:586 happen after the LOG.isDebugEnabled() check, so that we don't do it when not in debug?

          Tests pass, code looks pretty good. +1

          Jim Kellerman added a comment -

          Good point. I had just cut and pasted from below, but you're right. Why do the math if you aren't going to use the results? I'll change that before I check it in.
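For illustration, the guard being discussed has roughly this shape. The actual code at HLog:586 is not shown in this issue, so the arithmetic here is made up; the sketch also uses `java.util.logging` (`isLoggable(Level.FINE)`) as a stand-in for `LOG.isDebugEnabled()`, and the `computations` counter exists only to make the effect observable.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

// Hypothetical sketch: the percent calculation is illustrative, not the HLog code.
class DebugGuardSketch {
    static final Logger LOG = Logger.getLogger("DebugGuardSketch");
    static int computations = 0; // counts how often the math actually ran

    static void reportProgress(long done, long total) {
        // Do the math only when the result will be logged.
        if (LOG.isLoggable(Level.FINE)) {
            computations++;
            long percent = total == 0 ? 0 : (done * 100) / total;
            LOG.fine("Processed " + percent + "% of edits");
        }
    }
}
```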

          Jim Kellerman added a comment -

          Same as before, but don't do the math if you aren't going to use it.

          Jim Kellerman added a comment -

          Committed to 0.1 and trunk.


            People

            • Assignee: Jim Kellerman
            • Reporter: Billy Pearson
