HBase / HBASE-21095 (sub-task of HBASE-20828 Finish-up AMv2 Design/List of Tenets/Specification of operation)

The timeout retry logic for several procedures is broken after master restarts


Details

    • Type: Sub-task
    • Status: Resolved
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 3.0.0-alpha-1, 2.2.0
    • Component/s: amv2, proc-v2
    • Labels: None
    • Hadoop Flags: Reviewed

    Description

      For TRSP (TransitRegionStateProcedure), and also for RTP (RegionTransitionProcedure) in branch-2.0 and branch-2.1, if we fail to assign or unassign a region, we set the procedure to the WAITING_TIMEOUT state and rely on the ProcedureEvent in RegionStateNode (RSN) to wake us up later. But after a master restart, we do not suspend the ProcedureEvent in the RSN, nor do we add the procedure to the ProcedureEvent's suspended queue, so the procedure hangs forever because nothing will wake it up.
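
The hang described above can be illustrated with a small toy model. This is only a sketch and does not use any HBase classes: `ToyProcedure`, `ToyEvent`, and `wokenAfterRestart` are hypothetical names invented for the illustration, and the `reRegisterOnLoad` branch is one conceivable remedy, not necessarily what the attached patches do. The assumption is that only the procedure's state survives a restart, while the event and its wait queue live in memory and are rebuilt empty.

```java
import java.util.ArrayDeque;
import java.util.Deque;

final class RestartHangSketch {

    enum State { RUNNABLE, WAITING_TIMEOUT }

    /** Hypothetical stand-in for a persisted procedure; only its state survives a restart. */
    static final class ToyProcedure {
        State state = State.RUNNABLE;
    }

    /** Hypothetical stand-in for the in-memory event held by a region state node. */
    static final class ToyEvent {
        boolean suspended = false;
        final Deque<ToyProcedure> waiters = new ArrayDeque<>();

        /** Marks the event as suspended and queues the procedure to be woken later. */
        void suspendAndEnqueue(ToyProcedure p) {
            suspended = true;
            waiters.add(p);
        }

        /** Wakes every queued procedure; returns how many were woken. */
        int wake() {
            if (!suspended) {
                return 0; // nothing is registered, so there is nobody to wake
            }
            suspended = false;
            int woken = 0;
            ToyProcedure p;
            while ((p = waiters.poll()) != null) {
                p.state = State.RUNNABLE;
                woken++;
            }
            return woken;
        }
    }

    /** Simulates a failed assign, a restart, and a later wake-up attempt. */
    static boolean wokenAfterRestart(boolean reRegisterOnLoad) {
        // Before the restart: the procedure waits and is registered on the event.
        ToyProcedure proc = new ToyProcedure();
        ToyEvent event = new ToyEvent();
        proc.state = State.WAITING_TIMEOUT;
        event.suspendAndEnqueue(proc);

        // Restart: the persisted state is reloaded, the in-memory event is rebuilt empty.
        ToyProcedure reloaded = new ToyProcedure();
        reloaded.state = proc.state;
        ToyEvent rebuilt = new ToyEvent();
        if (reRegisterOnLoad) {
            // Hypothetical remedy: restore the suspension and the queue entry on load.
            rebuilt.suspendAndEnqueue(reloaded);
        }

        // Later wake-up attempt against the rebuilt event.
        rebuilt.wake();
        return reloaded.state == State.RUNNABLE;
    }

    public static void main(String[] args) {
        System.out.println("without re-registration, woken: " + wokenAfterRestart(false));
        System.out.println("with re-registration, woken: " + wokenAfterRestart(true));
    }
}
```

In this model the reloaded procedure stays in WAITING_TIMEOUT unless something re-establishes the link between it and the event, which mirrors the "no one will wake us up" condition in the description.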

      Attachments

        1. HBASE-21095.branch-2.0.001.patch
          4 kB
          Allan Yang
        2. HBASE-21095-v2.patch
          12 kB
          Duo Zhang
        3. HBASE-21095-v1.patch
          13 kB
          Duo Zhang
        4. HBASE-21095.patch
          8 kB
          Duo Zhang
        5. HBASE-21095-branch-2.0.patch
          3 kB
          Duo Zhang

            People

              Assignee: Duo Zhang (zhangduo)
              Reporter: Duo Zhang (zhangduo)
              Votes: 0
              Watchers: 4