  1. HBase
  2. HBASE-28690

Aborting Active HMaster is not rejecting reportRegionStateTransition if procedure is initialised by next Active master


Details

    • Reviewed

    Description

      A CloseRegionProcedure on the master asks the RegionServer (RS) to close a region. After closing the region, the RS reports the RegionStateTransition back to the master. On receiving the report, the master checks whether the regionNode has a procedure attached to it:

       

      private boolean reportTransition(RegionStateNode regionNode, ServerStateNode serverNode,
        TransitionCode state, long seqId, long procId) throws IOException {
        ServerName serverName = serverNode.getServerName();
        TransitRegionStateProcedure proc = regionNode.getProcedure();
        if (proc == null) {
          // No procedure is attached to this region on this master.
          return false;
        }
        // Hand the report to the procedure attached to the region.
        proc.reportTransition(master.getMasterProcedureExecutor().getEnvironment(), regionNode,
          serverName, state, seqId, procId);
        return true;
      }

      If the regionNode has no procedure attached, the master only logs a warning and does not return an error to the reporting RS over RPC.
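
      The effect of this log-only handling can be reproduced with a toy model. The sketch below is not HBase code: ToyMaster, regionProcedures, warnings and handleReport are hypothetical names that stand in for the AssignmentManager logic quoted above. It only illustrates that the caller gets a normal return even though nothing consumed the report.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model (hypothetical names, not HBase classes) of a master that only
// logs when a transition report has no matching procedure.
class ToyMaster {
  // Region encoded name -> id of the procedure attached to that region.
  final Map<String, Long> regionProcedures = new HashMap<>();
  int warnings = 0;

  // Mirrors the quoted reportTransition: false means "no procedure attached".
  boolean reportTransition(String region) {
    Long procId = regionProcedures.get(region);
    if (procId == null) {
      return false;
    }
    regionProcedures.remove(region); // the procedure consumes the report
    return true;
  }

  // RPC-facing handler: logs on a miss but still returns normally to the RS.
  void handleReport(String region) {
    if (!reportTransition(region)) {
      warnings++;
      System.out.println("WARN No matching procedure found for " + region);
    }
  }

  public static void main(String[] args) {
    ToyMaster stale = new ToyMaster(); // a master that knows of no procedure
    stale.handleReport("888a715d5926adbb89c985d8967f40d4"); // no exception
    System.out.println("warnings=" + stale.warnings);
  }
}
```

      Because handleReport never throws, the sender in this model cannot tell a consumed report from a dropped one.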

       

      Consider a master failover in which the TRSP (TransitRegionStateProcedure) and its CloseRegionProcedure were initialized only by the new active master. The aborting master now holds stale region state and knows nothing about these procedures. If the transition report reaches the aborting master, the report is not rejected but silently dropped, and the procedure gets stuck.
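
      One possible shape of the behaviour the title asks for is sketched below. It is an illustration under assumptions, not the HBase patch: AbortGuardSketch, Master, the aborting flag and the report helper are all hypothetical, and trying the masters in a fixed order is a simplification of how an RS would locate the active master. The point is only that an aborting master throws instead of logging, so the sender can deliver the report to the master that owns the procedure.

```java
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch, not HBase code: an aborting master rejects transition
// reports so that the sender retries against another master.
class AbortGuardSketch {
  static class Master {
    final String name;
    // Region encoded name -> id of the procedure attached to that region.
    final Map<String, Long> regionProcedures = new HashMap<>();
    volatile boolean aborting = false;

    Master(String name) {
      this.name = name;
    }

    boolean reportTransition(String region) throws IOException {
      if (aborting) {
        // Reject instead of logging: this master's view may be stale.
        throw new IOException(name + " is aborting; rejecting report for " + region);
      }
      // True only if a procedure was attached and consumed the report.
      return regionProcedures.remove(region) != null;
    }
  }

  // Tries each master in order; returns the name of the one that accepted
  // the report, or null if none did.
  static String report(String region, Master... masters) {
    for (Master m : masters) {
      try {
        if (m.reportTransition(region)) {
          return m.name;
        }
      } catch (IOException e) {
        System.out.println("retrying after: " + e.getMessage());
      }
    }
    return null;
  }

  public static void main(String[] args) {
    String region = "888a715d5926adbb89c985d8967f40d4";
    Master oldMaster = new Master("server4-1");
    oldMaster.aborting = true; // mirrors the ABORTING log line
    Master newMaster = new Master("server5-1");
    newMaster.regionProcedures.put(region, 16276470L); // CloseRegionProcedure pid
    System.out.println("accepted by: " + report(region, oldMaster, newMaster));
  }
}
```

      In this model the report ends up at server5-1, where the procedure lives, rather than being absorbed by server4-1.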

       

      Logs illustrating the sequence:

      active master server4-1 failing

      2024-06-20 04:45:05,576 ERROR [iority.RWQ.Fifo.write.handler=3,queue=0,port=61000] master.HMaster - ***** ABORTING master server4-1,61000,1715413775736: Failed to record region server as started *****

      logs of new active master server5-1

       

      2024-06-20 04:49:28,893 DEBUG [aster/server5-1:61000:becomeActiveMaster] assignment.RegionStateStore - Load hbase:meta entry region=888a715d5926adbb89c985d8967f40d4, regionState=OPEN, lastHost=server1-119,61020,1717560166420, regionLocation=server1-119,61020,1717560166420, openSeqNum=34892620
      
      2024-06-20 04:49:51,886 INFO [PEWorker-22] procedure2.ProcedureExecutor - Initialized subprocedures=[{pid=16276416, ppid=16276108, state=RUNNABLE:REGION_STATE_TRANSITION_CLOSE; TransitRegionStateProcedure table=RIMBS.UPLOADER_JOB_DETAILS, region=888a715d5926adbb89c985d8967f40d4, UNASSIGN}]  (on server5-1)
      
      2024-06-20 04:49:52,022 INFO [PEWorker-40] procedure2.ProcedureExecutor - Initialized subprocedures=[{pid=16276470, ppid=16276416, state=RUNNABLE; CloseRegionProcedure 888a715d5926adbb89c985d8967f40d4, server=server1-119,61020,1717560166420}] (on server5-1)

       

      RS logs for closing 

      2024-06-20 04:49:52,267 INFO [_REGION-regionserver/server1-119:61020-2] handler.UnassignRegionHandler - Close 888a715d5926adbb89c985d8967f40d4
      
      2024-06-20 04:49:52,267 DEBUG [_REGION-regionserver/server1-119:61020-2] regionserver.HRegion - Closing 888a715d5926adbb89c985d8967f40d4, disabling compactions & flushes
      
      2024-06-20 04:49:52,354 INFO [_REGION-regionserver/server1-119:61020-2] regionserver.HRegion - Closed TABLE,KW\x00na240-app1-16\x00/Events-120620231740\x00MARKER-Events,1702619592612.888a715d5926adbb89c985d8967f40d4.
      
      

      Log of the report arriving at the aborting active HMaster

      2024-06-20 04:49:52,355 WARN [iority.RWQ.Fifo.write.handler=1,queue=0,port=61000] assignment.AssignmentManager - No matching procedure found for server1-119,61020,1717560166420 transition on state=OPEN, location=server1-119,61020,1717560166420, table=RIMBS.UPLOADER_JOB_DETAILS, region=888a715d5926adbb89c985d8967f40d4 to CLOSED ( host = server4-1 , hbaseMasterLogFile)

          People

            umesh9414 Umesh Kumar Kumawat