Uploaded image for project: 'HBase'
  1. HBase
  2. HBASE-1169

When a shutdown is requested, stop scanning META regions immediately

    XMLWordPrintableJSON

Details

    • Bug
    • Status: Closed
    • Critical
    • Resolution: Fixed
    • 0.19.0, 0.20.0
    • 0.19.1, 0.20.0
    • None
    • None

    Description

      During shutdown of cluster, half way through quiescing servers there is a META scan in the master. The regions from servers whose leases are already canceled show up as invalid. (72.34.249.208 is hosting META)

      2009-01-31 10:25:42,571 INFO org.apache.hadoop.hbase.master.HMaster: Cluster shutdown requested. Starting to quiesce servers
      2009-01-31 10:25:45,868 INFO org.apache.hadoop.hbase.master.ServerManager: Cancelling lease for 72.34.249.211:60020
      2009-01-31 10:25:45,868 INFO org.apache.hadoop.hbase.master.ServerManager: Region server 72.34.249.211:60020: MSG_REPORT_EXITING -- lease cancelled
      2009-01-31 10:25:47,480 INFO org.apache.hadoop.hbase.master.ServerManager: Cancelling lease for 72.34.249.216:60020
      2009-01-31 10:25:47,480 INFO org.apache.hadoop.hbase.master.ServerManager: Region server 72.34.249.216:60020: MSG_REPORT_EXITING -- lease cancelled
      2009-01-31 10:25:47,840 INFO org.apache.hadoop.hbase.master.ServerManager: Region server 72.34.249.210:60020 quiesced
      2009-01-31 10:25:47,944 INFO org.apache.hadoop.hbase.master.ServerManager: Cancelling lease for 72.34.249.215:60020
      2009-01-31 10:25:47,944 INFO org.apache.hadoop.hbase.master.ServerManager: Region server 72.34.249.215:60020: MSG_REPORT_EXITING -- lease cancelled
      2009-01-31 10:25:48,403 INFO org.apache.hadoop.hbase.master.ServerManager: Cancelling lease for 72.34.249.213:60020
      2009-01-31 10:25:48,403 INFO org.apache.hadoop.hbase.master.ServerManager: Region server 72.34.249.213:60020: MSG_REPORT_EXITING -- lease cancelled
      2009-01-31 10:25:49,378 INFO org.apache.hadoop.hbase.master.ServerManager: Region server 72.34.249.218:60020 quiesced
      2009-01-31 10:25:50,465 INFO org.apache.hadoop.hbase.master.ServerManager: Cancelling lease for 72.34.249.214:60020
      2009-01-31 10:25:50,465 INFO org.apache.hadoop.hbase.master.ServerManager: Region server 72.34.249.214:60020: MSG_REPORT_EXITING -- lease cancelled
      2009-01-31 10:25:59,531 INFO org.apache.hadoop.hbase.master.BaseScanner: RegionManager.metaScanner scanning meta region {regionname: .META.,,1, startKey: <>, server: 72.34.249.218:60020}
      2009-01-31 10:25:59,544 DEBUG org.apache.hadoop.hbase.master.BaseScanner: Current assignment of activitydupehash,,1229364212541 is not valid;  Server '72.34.249.214:60020' unknown.
      2009-01-31 10:25:59,545 DEBUG org.apache.hadoop.hbase.master.BaseScanner: Current assignment of api,,1229364235220 is not valid;  Server '72.34.249.216:60020' unknown.
      2009-01-31 10:25:59,552 DEBUG org.apache.hadoop.hbase.master.BaseScanner: Current assignment of apps,,1229364222879 is not valid;  Server '72.34.249.215:60020' unknown.
      2009-01-31 10:25:59,552 DEBUG org.apache.hadoop.hbase.master.BaseScanner: Current assignment of assigners,,1229364037757 is not valid;  Server '72.34.249.214:60020' unknown.
      2009-01-31 10:25:59,554 DEBUG org.apache.hadoop.hbase.master.BaseScanner: Current assignment of canoncache,,1229364041955 is not valid;  Server '72.34.249.215:60020' unknown.
      2009-01-31 10:25:59,555 DEBUG org.apache.hadoop.hbase.master.BaseScanner: Current assignment of chunks,,1229390225893 is not valid;  Server '72.34.249.211:60020' unknown.
      

      Shutdown then continues as the last servers are quiesced, but at the same time the Master expires the lease on the regionserver that was hosting META and that it just scanned. It then starts to replay the logs for that regionserver in the middle of the shutdown.

      2009-01-31 10:25:59,799 INFO org.apache.hadoop.hbase.master.BaseScanner: RegionManager.metaScanner scan of 512 row(s) of meta region {regionname: .META.,,1, startKey: <>, server: 72.34.249.218:60020}
       complete
      2009-01-31 10:25:59,799 INFO org.apache.hadoop.hbase.master.BaseScanner: All 1 .META. region(s) scanned
      2009-01-31 10:26:59,530 INFO org.apache.hadoop.hbase.master.BaseScanner: RegionManager.metaScanner scanning meta region {regionname: .META.,,1, startKey: <>, server: 72.34.249.218:60020}
      2009-01-31 10:26:59,720 INFO org.apache.hadoop.hbase.master.BaseScanner: RegionManager.metaScanner scan of 512 row(s) of meta region {regionname: .META.,,1, startKey: <>, server: 72.34.249.218:60020}
       complete
      2009-01-31 10:26:59,720 INFO org.apache.hadoop.hbase.master.BaseScanner: All 1 .META. region(s) scanned
      2009-01-31 10:27:40,374 INFO org.apache.hadoop.hbase.master.ServerManager: 72.34.249.218:60020 lease expired
      2009-01-31 10:27:40,375 DEBUG org.apache.hadoop.hbase.master.HMaster: Processing todo: ProcessServerShutdown of 72.34.249.218:60020
      2009-01-31 10:27:40,375 INFO org.apache.hadoop.hbase.master.RegionServerOperation: process shutdown of server 72.34.249.218:60020: logSplit: false, rootRescanned: false, numberOfMetaRegions: 1, onlin
      eMetaRegions.size(): 1
      2009-01-31 10:27:40,387 INFO org.apache.hadoop.hbase.regionserver.HLog: Splitting 44 log(s) in hdfs://mb0:9000/hbase/log_72.34.249.218_1232996040351_60020
      2009-01-31 10:27:40,387 DEBUG org.apache.hadoop.hbase.regionserver.HLog: Splitting 1 of 44: hdfs://mb0:9000/hbase/log_72.34.249.218_1232996040351_60020/hlog.dat.1232996040603
      2009-01-31 10:27:40,443 DEBUG org.apache.hadoop.hbase.regionserver.HLog: Creating new log file writer for path hdfs://mb0:9000/hbase/.META./1028785192/oldlogfile.log and region .META.,,1
      2009-01-31 10:27:40,575 DEBUG org.apache.hadoop.hbase.regionserver.HLog: Creating new log file writer for path hdfs://mb0:9000/hbase/sources/671225115/oldlogfile.log and region sources,,1229364117966
      2009-01-31 10:27:41,171 DEBUG org.apache.hadoop.hbase.regionserver.HLog: Applied 100003 total edits from hdfs://mb0:9000/hbase/log_72.34.249.218_1232996040351_60020/hlog.dat.1232996040603
      2009-01-31 10:27:41,173 DEBUG org.apache.hadoop.hbase.regionserver.HLog: Splitting 2 of 44: hdfs://mb0:9000/hbase/log_72.34.249.218_1232996040351_60020/hlog.dat.1233093726382
      2009-01-31 10:27:41,429 DEBUG org.apache.hadoop.hbase.regionserver.HLog: Creating new log file writer for path hdfs://mb0:9000/hbase/dupehash/1607532582/oldlogfile.log and region dupehash,O��<L;h�,12
      31779694744
      2009-01-31 10:27:41,462 INFO org.apache.hadoop.hbase.master.ServerManager: 72.34.249.217:60020 lease expired
      2009-01-31 10:27:41,499 INFO org.apache.hadoop.hbase.master.ServerManager: All user tables quiesced. Proceeding with shutdown
      2009-01-31 10:27:41,499 DEBUG org.apache.hadoop.hbase.master.RegionManager: telling root scanner to stop
      2009-01-31 10:27:41,499 DEBUG org.apache.hadoop.hbase.master.RegionManager: telling meta scanner to stop
      2009-01-31 10:27:41,499 DEBUG org.apache.hadoop.hbase.master.RegionManager: meta and root scanners notified
      2009-01-31 10:27:41,499 INFO org.apache.hadoop.hbase.master.RootScanner: RegionManager.rootScanner exiting
      2009-01-31 10:27:41,499 INFO org.apache.hadoop.hbase.master.MetaScanner: RegionManager.metaScanner exiting
      2009-01-31 10:27:41,780 DEBUG org.apache.hadoop.hbase.regionserver.HLog: Applied 100001 total edits from hdfs://mb0:9000/hbase/log_72.34.249.218_1232996040351_60020/hlog.dat.1233093726382
      2009-01-31 10:27:41,781 DEBUG org.apache.hadoop.hbase.regionserver.HLog: Splitting 3 of 44: hdfs://mb0:9000/hbase/log_72.34.249.218_1232996040351_60020/hlog.dat.1233101015153
      2009-01-31 10:27:41,838 INFO org.apache.hadoop.hbase.master.ServerManager: 72.34.249.210:60020 lease expired
      2009-01-31 10:27:41,866 INFO org.apache.hadoop.hbase.master.ServerManager: All user tables quiesced. Proceeding with shutdown
      2009-01-31 10:27:41,866 DEBUG org.apache.hadoop.hbase.master.RegionManager: telling root scanner to stop
      2009-01-31 10:27:41,866 DEBUG org.apache.hadoop.hbase.master.RegionManager: telling meta scanner to stop
      2009-01-31 10:27:41,866 DEBUG org.apache.hadoop.hbase.master.RegionManager: meta and root scanners notified
      2009-01-31 10:27:42,557 INFO org.apache.hadoop.hbase.master.ServerManager: 72.34.249.212:60020 lease expired
      2009-01-31 10:27:42,581 INFO org.apache.hadoop.hbase.master.ServerManager: All user tables quiesced. Proceeding with shutdown
      2009-01-31 10:27:42,581 DEBUG org.apache.hadoop.hbase.master.RegionManager: telling root scanner to stop
      2009-01-31 10:27:42,581 DEBUG org.apache.hadoop.hbase.master.RegionManager: telling meta scanner to stop
      2009-01-31 10:27:42,581 DEBUG org.apache.hadoop.hbase.master.RegionManager: meta and root scanners notified
      2009-01-31 10:27:42,615 DEBUG org.apache.hadoop.hbase.regionserver.HLog: Applied 100002 total edits from hdfs://mb0:9000/hbase/log_72.34.249.218_1232996040351_60020/hlog.dat.1233101015153
      2009-01-31 10:27:42,618 DEBUG org.apache.hadoop.hbase.regionserver.HLog: Splitting 4 of 44: hdfs://mb0:9000/hbase/log_72.34.249.218_1232996040351_60020/hlog.dat.1233111791302
      2009-01-31 10:27:43,356 DEBUG org.apache.hadoop.hbase.regionserver.HLog: Applied 100001 total edits from hdfs://mb0:9000/hbase/log_72.34.249.218_1232996040351_60020/hlog.dat.1233111791302
      2009-01-31 10:27:43,359 DEBUG org.apache.hadoop.hbase.regionserver.HLog: Splitting 5 of 44: hdfs://mb0:9000/hbase/log_72.34.249.218_1232996040351_60020/hlog.dat.1233122447841
      2009-01-31 10:27:43,404 INFO org.apache.hadoop.hbase.master.ServerManager: All user tables quiesced. Proceeding with shutdown
      2009-01-31 10:27:43,404 DEBUG org.apache.hadoop.hbase.master.RegionManager: telling root scanner to stop
      2009-01-31 10:27:43,404 DEBUG org.apache.hadoop.hbase.master.RegionManager: telling meta scanner to stop
      2009-01-31 10:27:43,404 DEBUG org.apache.hadoop.hbase.master.RegionManager: meta and root scanners notified
      2009-01-31 10:27:43,991 DEBUG org.apache.hadoop.hbase.regionserver.HLog: Applied 100001 total edits from hdfs://mb0:9000/hbase/log_72.34.249.218_1232996040351_60020/hlog.dat.1233122447841
      

      During the log replay, a log file was missing from HDFS. Not sure why, there was a Datanode crash that could be related. More importantly, once it trips on the missing file it stops the replay (even though there's another 37 logs).

      2009-01-31 10:27:43,992 DEBUG org.apache.hadoop.hbase.regionserver.HLog: Splitting 6 of 44: hdfs://mb0:9000/hbase/log_72.34.249.218_1232996040351_60020/hlog.dat.1233132556827
      2009-01-31 10:27:44,022 WARN org.apache.hadoop.hbase.master.HMaster: Processing pending operations: ProcessServerShutdown of 72.34.249.218:60020
      java.io.FileNotFoundException: File does not exist: hdfs://mb0:9000/hbase/log_72.34.249.218_1232996040351_60020/hlog.dat.1233132556827
              at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:394)
              at org.apache.hadoop.fs.FileSystem.getLength(FileSystem.java:679)
              at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1417)
              at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1412)
              at org.apache.hadoop.hbase.regionserver.HLog.splitLog(HLog.java:742)
              at org.apache.hadoop.hbase.regionserver.HLog.splitLog(HLog.java:705)
              at org.apache.hadoop.hbase.master.ProcessServerShutdown.process(ProcessServerShutdown.java:249)
              at org.apache.hadoop.hbase.master.HMaster.processToDoQueue(HMaster.java:427)
              at org.apache.hadoop.hbase.master.HMaster.run(HMaster.java:360)
      2009-01-31 10:27:44,022 DEBUG org.apache.hadoop.hbase.master.RegionManager: telling root scanner to stop
      2009-01-31 10:27:44,022 DEBUG org.apache.hadoop.hbase.master.RegionManager: telling meta scanner to stop
      2009-01-31 10:27:44,022 DEBUG org.apache.hadoop.hbase.master.RegionManager: meta and root scanners notified
      2009-01-31 10:27:44,023 DEBUG org.apache.hadoop.hbase.RegionHistorian: Offlined
      2009-01-31 10:27:44,023 INFO org.apache.hadoop.hbase.master.HMaster: Stopping infoServer
      2009-01-31 10:27:44,023 INFO org.mortbay.util.ThreadedServer: Stopping Acceptor ServerSocket[addr=0.0.0.0/0.0.0.0,port=0,localport=60010]
      2009-01-31 10:27:44,026 INFO org.mortbay.http.SocketListener: Stopped SocketListener on 0.0.0.0:60010
      

      Attachments

        Activity

          People

            jimk Jim Kellerman
            streamy Jonathan Gray
            Votes:
            0 Vote for this issue
            Watchers:
            0 Start watching this issue

            Dates

              Created:
              Updated:
              Resolved: