HBASE-20842: Infinite loop when replaying remote WALs

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 3.0.0-alpha-1
    • Component/s: Replication
    • Labels: None
    • Hadoop Flags: Reviewed

    Description

      2018-07-03 12:25:11,375 WARN  [RSProcedureDispatcher-pool13-t19] replication.SyncReplicationReplayWALRemoteProcedure(107): Replay wals [remoteWALs/1-replay/asf916.gq1.ygridcore.net%2C36931%2C1530620616106-1530620683061-1.1530620683075.syncrep] on asf916.gq1.ygridcore.net,33811,1530620636539 failed for peer id=1
      org.apache.hadoop.hbase.regionserver.RegionServerStoppedException: Server asf916.gq1.ygridcore.net,33811,1530620636539 is not online
      	at org.apache.hadoop.hbase.master.procedure.RSProcedureDispatcher$DeadRSRemoteCall.call(RSProcedureDispatcher.java:285)
      	at org.apache.hadoop.hbase.master.procedure.RSProcedureDispatcher$DeadRSRemoteCall.call(RSProcedureDispatcher.java:276)
      	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
      	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
      	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
      	at java.lang.Thread.run(Thread.java:748)
      2018-07-03 12:25:11,440 DEBUG [Thread-2883] replication.TestSyncReplicationStandbyKillRS(111): Server [asf916.gq1.ygridcore.net,33811,1530620636539] marked as dead, waiting for it to finish dead processing
      2018-07-03 12:25:11,441 DEBUG [Thread-2883] replication.TestSyncReplicationStandbyKillRS(114): Server [asf916.gq1.ygridcore.net,33811,1530620636539] still being processed, waiting
      2018-07-03 12:25:11,456 WARN  [RS:3;asf916:45751] wal.AbstractFSWAL(419): 'hbase.regionserver.maxlogs' was deprecated.
      2018-07-03 12:25:11,457 INFO  [RS:3;asf916:45751] wal.AbstractFSWAL(424): WAL configuration: blocksize=256 MB, rollsize=128 MB, prefix=asf916.gq1.ygridcore.net%2C45751%2C1530620709275, suffix=, logDir=hdfs://localhost:42624/user/jenkins/test-data/a86a805e-162f-5f22-7b9e-573dbf0f40fb/WALs/asf916.gq1.ygridcore.net,45751,1530620709275, archiveDir=hdfs://localhost:42624/user/jenkins/test-data/a86a805e-162f-5f22-7b9e-573dbf0f40fb/oldWALs
      2018-07-03 12:25:11,467 DEBUG [RS-EventLoopGroup-14-4] asyncfs.FanOutOneBlockAsyncDFSOutputSaslHelper(737): SASL client skipping handshake in unsecured configuration for addr = 127.0.0.1/127.0.0.1, datanodeId = DatanodeInfoWithStorage[127.0.0.1:38997,DS-6002160d-388b-4840-8538-e4c2255108be,DISK]
      2018-07-03 12:25:11,467 DEBUG [RS-EventLoopGroup-14-5] asyncfs.FanOutOneBlockAsyncDFSOutputSaslHelper(737): SASL client skipping handshake in unsecured configuration for addr = 127.0.0.1/127.0.0.1, datanodeId = DatanodeInfoWithStorage[127.0.0.1:45904,DS-e189e3c8-a1bd-475c-86c0-3891e541fc6e,DISK]
      2018-07-03 12:25:11,467 DEBUG [RS-EventLoopGroup-14-3] asyncfs.FanOutOneBlockAsyncDFSOutputSaslHelper(737): SASL client skipping handshake in unsecured configuration for addr = 127.0.0.1/127.0.0.1, datanodeId = DatanodeInfoWithStorage[127.0.0.1:39589,DS-62ced3f8-35c4-4904-80cc-4d514b8f4050,DISK]
      2018-07-03 12:25:11,495 DEBUG [RegionServerTracker-0] procedure2.ProcedureExecutor(887): Stored pid=30, state=RUNNABLE:SERVER_CRASH_START; ServerCrashProcedure server=asf916.gq1.ygridcore.net,33811,1530620636539, splitWal=true, meta=true
      2018-07-03 12:25:11,495 DEBUG [RegionServerTracker-0] assignment.AssignmentManager(1321): Added=asf916.gq1.ygridcore.net,33811,1530620636539 to dead servers, submitted shutdown handler to be executed meta=true
      2018-07-03 12:25:11,498 INFO  [PEWorker-6] procedure.ServerCrashProcedure(118): Start pid=30, state=RUNNABLE:SERVER_CRASH_START; ServerCrashProcedure server=asf916.gq1.ygridcore.net,33811,1530620636539, splitWal=true, meta=true
      2018-07-03 12:25:11,500 WARN  [RegionServerTracker-0] replication.SyncReplicationReplayWALRemoteProcedure(107): Replay wals [remoteWALs/1-replay/asf916.gq1.ygridcore.net%2C36931%2C1530620616106-1530620683061-1.1530620683075.syncrep] on asf916.gq1.ygridcore.net,33811,1530620636539 failed for peer id=1
      org.apache.hadoop.hbase.DoNotRetryIOException: server not online asf916.gq1.ygridcore.net,33811,1530620636539
      	at org.apache.hadoop.hbase.master.procedure.RSProcedureDispatcher.abortPendingOperations(RSProcedureDispatcher.java:130)
      	at org.apache.hadoop.hbase.master.procedure.RSProcedureDispatcher.abortPendingOperations(RSProcedureDispatcher.java:60)
      	at org.apache.hadoop.hbase.procedure2.RemoteProcedureDispatcher$BufferNode.abortOperationsInQueue(RemoteProcedureDispatcher.java:380)
      	at org.apache.hadoop.hbase.procedure2.RemoteProcedureDispatcher.removeNode(RemoteProcedureDispatcher.java:193)
      	at org.apache.hadoop.hbase.master.procedure.RSProcedureDispatcher.serverRemoved(RSProcedureDispatcher.java:143)
      	at org.apache.hadoop.hbase.master.ServerManager.expireServer(ServerManager.java:610)
      	at org.apache.hadoop.hbase.master.RegionServerTracker.refresh(RegionServerTracker.java:160)
      	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
      	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
      	at java.lang.Thread.run(Thread.java:748)
      2018-07-03 12:25:11,503 WARN  [PEWorker-4] replication.SyncReplicationReplayWALRemoteProcedure(162): Can not add remote operation for replay wals [remoteWALs/1-replay/asf916.gq1.ygridcore.net%2C36931%2C1530620616106-1530620683061-1.1530620683075.syncrep] on asf916.gq1.ygridcore.net,33811,1530620636539 for peer id=1, this usually because the server is already dead, retry
      2018-07-03 12:25:11,503 WARN  [PEWorker-4] replication.SyncReplicationReplayWALRemoteProcedure(162): Can not add remote operation for replay wals [remoteWALs/1-replay/asf916.gq1.ygridcore.net%2C36931%2C1530620616106-1530620683061-1.1530620683075.syncrep] on asf916.gq1.ygridcore.net,33811,1530620636539 for peer id=1, this usually because the server is already dead, retry
      2018-07-03 12:25:11,503 WARN  [PEWorker-4] replication.SyncReplicationReplayWALRemoteProcedure(162): Can not add remote operation for replay wals [remoteWALs/1-replay/asf916.gq1.ygridcore.net%2C36931%2C1530620616106-1530620683061-1.1530620683075.syncrep] on asf916.gq1.ygridcore.net,33811,1530620636539 for peer id=1, this usually because the server is already dead, retry
      2018-07-03 12:25:11,503 WARN  [PEWorker-7] replication.SyncReplicationReplayWALRemoteProcedure(162): Can not add remote operation for replay wals [remoteWALs/1-replay/asf916.gq1.ygridcore.net%2C36931%2C1530620616106-1530620683061-1.1530620683075.syncrep] on asf916.gq1.ygridcore.net,33811,1530620636539 for peer id=1, this usually because the server is already dead, retry
      2018-07-03 12:25:11,504 WARN  [PEWorker-7] replication.SyncReplicationReplayWALRemoteProcedure(162): Can not add remote operation for replay wals [remoteWALs/1-replay/asf916.gq1.ygridcore.net%2C36931%2C1530620616106-1530620683061-1.1530620683075.syncrep] on asf916.gq1.ygridcore.net,33811,1530620636539 for peer id=1, this usually because the server is already dead, retry
      2018-07-03 12:25:11,504 WARN  [PEWorker-7] replication.SyncReplicationReplayWALRemoteProcedure(162): Can not add remote operation for replay wals [remoteWALs/1-replay/asf916.gq1.ygridcore.net%2C36931%2C1530620616106-1530620683061-1.1530620683075.syncrep] on asf916.gq1.ygridcore.net,33811,1530620636539 for peer id=1, this usually because the server is already dead, retry
      2018-07-03 12:25:11,504 WARN  [PEWorker-7] replication.SyncReplicationReplayWALRemoteProcedure(162): Can not add remote operation for replay wals [remoteWALs/1-replay/asf916.gq1.ygridcore.net%2C36931%2C1530620616106-1530620683061-1.1530620683075.syncrep] on asf916.gq1.ygridcore.net,33811,1530620636539 for peer id=1, this usually because the server is already dead, retry
      2018-07-03 12:25:11,504 WARN  [PEWorker-7] replication.SyncReplicationReplayWALRemoteProcedure(162): Can not add remote operation for replay wals [remoteWALs/1-replay/asf916.gq1.ygridcore.net%2C36931%2C1530620616106-1530620683061-1.1530620683075.syncrep] on asf916.gq1.ygridcore.net,33811,1530620636539 for peer id=1, this usually because the server is already dead, retry
      2018-07-03 12:25:11,504 WARN  [PEWorker-7] replication.SyncReplicationReplayWALRemoteProcedure(162): Can not add remote operation for replay wals [remoteWALs/1-replay/asf916.gq1.ygridcore.net%2C36931%2C1530620616106-1530620683061-1.1530620683075.syncrep] on asf916.gq1.ygridcore.net,33811,1530620636539 for peer id=1, this usually because the server is already dead, retry
      2018-07-03 12:25:11,504 WARN  [PEWorker-7] replication.SyncReplicationReplayWALRemoteProcedure(162): Can not add remote operation for replay wals [remoteWALs/1-replay/asf916.gq1.ygridcore.net%2C36931%2C1530620616106-1530620683061-1.1530620683075.syncrep] on asf916.gq1.ygridcore.net,33811,1530620636539 for peer id=1, this usually because the server is already dead, retry
      2018-07-03 12:25:11,504 WARN  [PEWorker-7] replication.SyncReplicationReplayWALRemoteProcedure(162): Can not add remote operation for replay wals [remoteWALs/1-replay/asf916.gq1.ygridcore.net%2C36931%2C1530620616106-1530620683061-1.1530620683075.syncrep] on asf916.gq1.ygridcore.net,33811,1530620636539 for peer id=1, this usually because the server is already dead, retry
      2018-07-03 12:25:11,505 WARN  [PEWorker-11] replication.SyncReplicationReplayWALRemoteProcedure(162): Can not add remote operation for replay wals [remoteWALs/1-replay/asf916.gq1.ygridcore.net%2C36931%2C1530620616106-1530620683061-1.1530620683075.syncrep] on asf916.gq1.ygridcore.net,33811,1530620636539 for peer id=1, this usually because the server is already dead, retry
      2018-07-03 12:25:11,505 WARN  [PEWorker-8] replication.SyncReplicationReplayWALRemoteProcedure(162): Can not add remote operation for replay wals [remoteWALs/1-replay/asf916.gq1.ygridcore.net%2C36931%2C1530620616106-1530620683061-1.1530620683075.syncrep] on asf916.gq1.ygridcore.net,33811,1530620636539 for peer id=1, this usually because the server is already dead, retry
      2018-07-03 12:25:11,505 WARN  [PEWorker-8] replication.SyncReplicationReplayWALRemoteProcedure(162): Can not add remote operation for replay wals [remoteWALs/1-replay/asf916.gq1.ygridcore.net%2C36931%2C1530620616106-1530620683061-1.1530620683075.syncrep] on asf916.gq1.ygridcore.net,33811,1530620636539 for peer id=1, this usually because the server is already dead, retry
      2018-07-03 12:25:11,505 WARN  [PEWorker-8] replication.SyncReplicationReplayWALRemoteProcedure(162): Can not add remote operation for replay wals [remoteWALs/1-replay/asf916.gq1.ygridcore.net%2C36931%2C1530620616106-1530620683061-1.1530620683075.syncrep] on asf916.gq1.ygridcore.net,33811,1530620636539 for peer id=1, this usually because the server is already dead, retry
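The repeated "retry" warnings above all name the same dead server (port 33811), which suggests that each retry targets the same server again. The following is a minimal sketch of that general pattern, not HBase code: every class and method name here is hypothetical, and the attempt cap exists only so the example terminates. It contrasts retrying against a fixed target with re-selecting a target from the currently live servers on each attempt.

```java
import java.util.List;
import java.util.Optional;
import java.util.Set;

// Illustrative only: hypothetical names, not part of HBase.
class ReplayRetrySketch {

    // Stand-in for "dispatch the replay to a server": succeeds only if the server is live.
    static boolean tryDispatch(String server, Set<String> liveServers) {
        return liveServers.contains(server);
    }

    // Retries against one fixed target. If that target is dead, no attempt can succeed,
    // so without the cap this loop would spin forever.
    // Returns the number of attempts used, or -1 if the cap was reached.
    static int retrySameTarget(String target, Set<String> liveServers, int maxAttempts) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            if (tryDispatch(target, liveServers)) {
                return attempt;
            }
        }
        return -1;
    }

    // Tries each candidate in turn, so a dead first choice does not block progress.
    // Returns the server that accepted the work, or empty if none is live.
    static Optional<String> retryWithReselect(List<String> candidates, Set<String> liveServers) {
        for (String server : candidates) {
            if (tryDispatch(server, liveServers)) {
                return Optional.of(server);
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        Set<String> live = Set.of("rs-b", "rs-c");
        // "rs-a" is dead: the fixed-target retry exhausts its cap.
        System.out.println(retrySameTarget("rs-a", live, 1000));
        // Re-selecting moves on to a live server.
        System.out.println(retryWithReselect(List.of("rs-a", "rs-b"), live));
    }
}
```

Whether the actual fix for this issue re-selects the target server, fails the procedure, or does something else is not stated in the description; the sketch only shows why a retry that never changes its target cannot make progress once that target is dead.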
      

          People

            Assignee: Guanghao Zhang (zghao)
            Reporter: Duo Zhang (zhangduo)
            Votes: 0
            Watchers: 3
