  1. HBase
  2. HBASE-28422

SplitWALProcedure will retry SplitWALRemoteProcedure on the same target RegionServer indefinitely


Details

    • Type: Bug
    • Status: Open
    • Priority: Minor
    • Resolution: Unresolved
    • Affects Version/s: 2.5.5
    • Fix Version/s: None
    • Component/s: master, proc-v2, wal
    • Labels: None

    Description

      Similar to HBASE-28050. Once HMaster selects a RegionServer for a SplitWALRemoteProcedure, it keeps retrying that server for as long as the server is alive. I believe this is because, although RSProcedureDispatcher.ExecuteProceduresRemoteCall.run calls remoteCallFailed, no logic after that point selects a new target server. TransitRegionStateProcedure can select a new server for opening a region by using forceNewPlan. SplitWALRemoteProcedure, however, tries another server only if it receives a DoNotRetryIOException in SplitWALRemoteProcedure#complete: https://github.com/apache/hbase/blob/780ff56b3f23e7041ef1b705b7d3d0a53fdd05ae/hbase-server/src/main/java/org/apache/hadoop/hbase/master/procedure/SplitWALRemoteProcedure.java#L104-L110
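
      To make the decision described above concrete, here is a minimal, simplified model of it. This is not the HBase source: the class SplitWalRetryModel, the Outcome enum, and the local DoNotRetry exception (a stand-in for HBase's DoNotRetryIOException) are all hypothetical names invented so the sketch compiles without HBase on the classpath.

```java
import java.io.IOException;

// Simplified model of the behavior described in this report; not HBase code.
class SplitWalRetryModel {

  // Hypothetical local stand-in for HBase's DoNotRetryIOException.
  static class DoNotRetry extends IOException {
    DoNotRetry(String message) {
      super(message);
    }
  }

  enum Outcome { SUCCEEDED, PICK_NEW_WORKER, RETRY_SAME_WORKER }

  // Decide what happens once a remote split call has finished.
  // Only a DoNotRetry-style error lets a different server be chosen;
  // every other IOException goes back to the same target server.
  static Outcome onRemoteCallFinished(IOException error) {
    if (error == null) {
      return Outcome.SUCCEEDED;
    }
    if (error instanceof DoNotRetry) {
      return Outcome.PICK_NEW_WORKER;
    }
    return Outcome.RETRY_SAME_WORKER;
  }

  public static void main(String[] args) {
    IOException relogin =
        new IOException("Can not send request because relogin is in progress.");
    System.out.println(onRemoteCallFinished(relogin));                   // RETRY_SAME_WORKER
    System.out.println(onRemoteCallFinished(new DoNotRetry("rejected"))); // PICK_NEW_WORKER
  }
}
```

      In this model the relogin error from the log in this report always maps to RETRY_SAME_WORKER, which is the loop the report describes.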

      On any other IOException, we retry the same target server forever. As in HBASE-28050, a SaslException therefore never causes the SplitWALRemoteProcedure to be retried on a new server, so the ServerCrashProcedure cannot finish until the target server of the SplitWALRemoteProcedure is restarted. The following log appears repeatedly, always for calls sent to the same host.

      2024-01-31 15:59:43,616 WARN  [RSProcedureDispatcher-pool-72846] procedure.SplitWALRemoteProcedure - Failed split of hdfs://<ns>/hbase/WALs/<host>,1704984571464-splitting/<host>1704984571464.1706710908543, retry...
      java.io.IOException: Call to address=<host> failed on local exception: java.io.IOException: Can not send request because relogin is in progress.
      	at sun.reflect.GeneratedConstructorAccessor363.newInstance(Unknown Source)
      	at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
      	at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
      	at org.apache.hadoop.hbase.ipc.IPCUtil.wrapException(IPCUtil.java:239)
      	at org.apache.hadoop.hbase.ipc.AbstractRpcClient.onCallFinished(AbstractRpcClient.java:391)
      	at org.apache.hadoop.hbase.ipc.AbstractRpcClient.access$100(AbstractRpcClient.java:92)
      	at org.apache.hadoop.hbase.ipc.AbstractRpcClient$3.run(AbstractRpcClient.java:425)
      	at org.apache.hadoop.hbase.ipc.AbstractRpcClient$3.run(AbstractRpcClient.java:420)
      	at org.apache.hadoop.hbase.ipc.Call.callComplete(Call.java:114)
      	at org.apache.hadoop.hbase.ipc.Call.setException(Call.java:129)
      	at org.apache.hadoop.hbase.ipc.NettyRpcConnection.lambda$sendRequest$4(NettyRpcConnection.java:365)
      	at org.apache.hbase.thirdparty.io.netty.util.concurrent.AbstractEventExecutor.runTask(AbstractEventExecutor.java:174)
      	at org.apache.hbase.thirdparty.io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:167)
      	at org.apache.hbase.thirdparty.io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:470)
      	at org.apache.hbase.thirdparty.io.netty.channel.epoll.EpollEventLoop.run(EpollEventLoop.java:403)
      	at org.apache.hbase.thirdparty.io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:997)
      	at org.apache.hbase.thirdparty.io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
      	at org.apache.hbase.thirdparty.io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
      	at java.lang.Thread.run(Thread.java:750)
      Caused by: java.io.IOException: Can not send request because relogin is in progress.
      	at org.apache.hadoop.hbase.ipc.NettyRpcConnection.sendRequest0(NettyRpcConnection.java:321)
      	at org.apache.hadoop.hbase.ipc.NettyRpcConnection.lambda$sendRequest$4(NettyRpcConnection.java:363)
      	... 8 more
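
      For illustration only, one way to avoid retrying a single server forever is to cap the attempts per server and then move on to another candidate. The sketch below is an assumption about a possible direction, not a fix proposed in this report or implemented in HBase; BoundedRetrySketch, RemoteSplit, splitWithFailover, and maxAttemptsPerServer are hypothetical names.

```java
import java.io.IOException;
import java.util.List;

// Hypothetical sketch: bounded retries per server, then fail over.
class BoundedRetrySketch {

  // Hypothetical stand-in for sending the split request to one server.
  interface RemoteSplit {
    void run(String server) throws IOException;
  }

  // Try each candidate server up to maxAttemptsPerServer times and
  // return the first server on which the call succeeds.
  static String splitWithFailover(
      List<String> servers, int maxAttemptsPerServer, RemoteSplit call) {
    IOException last = null;
    for (String server : servers) {
      for (int attempt = 1; attempt <= maxAttemptsPerServer; attempt++) {
        try {
          call.run(server);
          return server;
        } catch (IOException e) {
          last = e; // remember the failure and keep going
        }
      }
    }
    throw new IllegalStateException("all candidate servers failed", last);
  }

  public static void main(String[] args) {
    // rs1 always fails, as in the relogin case; rs2 accepts the call.
    String chosen = splitWithFailover(List.of("rs1", "rs2"), 3, server -> {
      if (server.equals("rs1")) {
        throw new IOException("Can not send request because relogin is in progress.");
      }
    });
    System.out.println(chosen); // rs2
  }
}
```

      A real change would also have to fit the procedure framework's suspend and wake-up handling, which this sketch ignores.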
      

            People

              Assignee: Unassigned
              Reporter: David Manning (dmanning)