  1. HBase
  2. HBASE-20828 Finish-up AMv2 Design/List of Tenets/Specification of operation
  3. HBASE-21095

The timeout retry logic for several procedures is broken after the master restarts


    Details

    • Type: Sub-task
    • Status: Resolved
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 3.0.0-alpha-1, 2.2.0
    • Component/s: amv2, proc-v2
    • Labels: None
    • Hadoop Flags: Reviewed

      Description

      For TRSP (and for RTP in branch-2.0 and branch-2.1), when an assign or unassign of a region fails, we set the procedure to the WAITING_TIMEOUT state and rely on the ProcedureEvent in the RegionStateNode (RSN) to wake it up later. After a master restart, however, we neither suspend the ProcedureEvent in the RSN nor add the procedure to the ProcedureEvent's queue of suspended procedures, so the procedure hangs forever because nothing will ever wake it up.
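
      The hang described above can be illustrated with a small toy model. This is only a sketch under stated assumptions: WakeupSketch, ToyProcedure, ToyEvent, and stateAfterTimeout are hypothetical names invented for illustration, not the real HBase Procedure, ProcedureEvent, or RegionStateNode classes, and the model ignores persistence and scheduling details. It shows only why a wake-up on a freshly rebuilt event reaches nobody.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Toy model only: these classes are hypothetical stand-ins, not HBase's real
// Procedure, ProcedureEvent, or RegionStateNode.
class WakeupSketch {
  enum State { RUNNABLE, WAITING_TIMEOUT }

  static class ToyProcedure {
    State state = State.RUNNABLE;
  }

  // Stand-in for the in-memory event held by a region state node.
  static class ToyEvent {
    boolean suspended = false;
    final Deque<ToyProcedure> waiters = new ArrayDeque<>();

    // Suspend the event and park the procedure on it until wake() is called.
    void suspendAndPark(ToyProcedure proc) {
      suspended = true;
      waiters.add(proc);
      proc.state = State.WAITING_TIMEOUT;
    }

    // Wake every parked procedure; does nothing useful if none is parked.
    void wake() {
      suspended = false;
      while (!waiters.isEmpty()) {
        waiters.poll().state = State.RUNNABLE;
      }
    }
  }

  // Returns the procedure's state after the retry timeout fires.
  // restartBeforeTimeout = true models a master restart in between.
  static State stateAfterTimeout(boolean restartBeforeTimeout) {
    ToyProcedure proc = new ToyProcedure();
    ToyEvent event = new ToyEvent();
    event.suspendAndPark(proc); // assign/unassign failed, wait for a retry

    if (restartBeforeTimeout) {
      // Assumption of this model: only the procedure's state survives the
      // restart. The event is rebuilt from scratch, so it is not suspended
      // and its waiter queue is empty.
      ToyProcedure reloaded = new ToyProcedure();
      reloaded.state = proc.state;
      proc = reloaded;
      event = new ToyEvent();
    }

    event.wake(); // the timeout fires and wakes whoever is parked
    return proc.state;
  }

  public static void main(String[] args) {
    System.out.println("no restart:    " + stateAfterTimeout(false));
    System.out.println("after restart: " + stateAfterTimeout(true));
  }
}
```

      In this model the procedure returns to RUNNABLE when no restart happens, and stays in WAITING_TIMEOUT when the event is rebuilt without re-parking the reloaded procedure.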

        Attachments

        1. HBASE-21095-branch-2.0.patch
          3 kB
          Duo Zhang
        2. HBASE-21095.patch
          8 kB
          Duo Zhang
        3. HBASE-21095-v1.patch
          13 kB
          Duo Zhang
        4. HBASE-21095-v2.patch
          12 kB
          Duo Zhang
        5. HBASE-21095.branch-2.0.001.patch
          4 kB
          Allan Yang

              People

              • Assignee: Duo Zhang
              • Reporter: Duo Zhang
              • Votes: 0
              • Watchers: 4
