Details

    • Type: Sub-task
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 2.1.0
    • Fix Version/s: 3.0.0, 2.2.0, 2.1.1
    • Component/s: None
    • Labels:
      None
    • Hadoop Flags:
      Reviewed
    • Release Note:
      Hide
      1. Make hasLock method final, and add a locked field in Procedure to record whether we have the lock. We will set it to true in doAcquireLock and to false in doReleaseLock. The sub procedures do not need to manage it any more.

      2. Also added a locked field in the proto message. When storing, the field will be set according to the return value of hasLock. And when loading, there is a new field in Procedure called lockedWhenLoading. We will set it to true if the locked field in proto message is true.

      3. The reason why we can not set the locked field directly to true by calling doAcquireLock is that, during initialization, most procedures need to wait until master is initialized. So the solution here is that, we introduced a new method called waitInitialized in Procedure, and move the wait master initialized related code from acquireLock to this method. And we added a restoreLock method to Procedure, if lockedWhenLoading is true, we will call the acquireLock to get the lock, but do not set locked to true. And later when we call doAcquireLock and pass the waitInitialized check, we will test lockedWhenLoading, if it is true, when we just set the locked field to true and return, without actually calling the acquireLock method since we have already called it once.
      Show
      1. Make hasLock method final, and add a locked field in Procedure to record whether we have the lock. We will set it to true in doAcquireLock and to false in doReleaseLock. The sub procedures do not need to manage it any more. 2. Also added a locked field in the proto message. When storing, the field will be set according to the return value of hasLock. And when loading, there is a new field in Procedure called lockedWhenLoading. We will set it to true if the locked field in proto message is true. 3. The reason why we can not set the locked field directly to true by calling doAcquireLock is that, during initialization, most procedures need to wait until master is initialized. So the solution here is that, we introduced a new method called waitInitialized in Procedure, and move the wait master initialized related code from acquireLock to this method. And we added a restoreLock method to Procedure, if lockedWhenLoading is true, we will call the acquireLock to get the lock, but do not set locked to true. And later when we call doAcquireLock and pass the waitInitialized check, we will test lockedWhenLoading, if it is true, when we just set the locked field to true and return, without actually calling the acquireLock method since we have already called it once.

      Description

      Found this one when investigating ModifyTableProcedure got stuck while there was a MoveRegionProcedure going on after master restart.
      Though this issue can be solved by HBASE-20752. But I discovered something else.
      Before a MoveRegionProcedure can execute, it will hold the table's shared lock. so,, when a UnassignProcedure was spwaned, it will not check the table's shared lock since it is sure that its parent(MoveRegionProcedure) has aquired the table's lock.

      // If there is parent procedure, it would have already taken xlock, so no need to take
            // shared lock here. Otherwise, take shared lock.
            if (!procedure.hasParent()
                && waitTableQueueSharedLock(procedure, table) == null) {
                return true;
            }
      

      But, it is not the case when Master was restarted. The child procedure(UnassignProcedure) will be executed first after restart. Though it has a parent(MoveRegionProcedure), but apprently the parent didn't hold the table's lock.
      So, since it began to execute without hold the table's shared lock. A ModifyTableProcedure can aquire the table's exclusive lock and execute at the same time. Which is not possible if the master was not restarted.
      This will cause a stuck before HBASE-20752. But since HBASE-20752 has fixed, I wrote a simple UT to repo this case.

      I think we don't have to check the parent for table's shared lock. It is a shared lock, right? I think we can acquire it every time we need it.

        Attachments

        1. HBASE-20846-v6.patch
          133 kB
          Duo Zhang
        2. HBASE-20846-v5.patch
          113 kB
          Duo Zhang
        3. HBASE-20846-v4.patch
          107 kB
          Duo Zhang
        4. HBASE-20846-v4.patch
          107 kB
          Duo Zhang
        5. HBASE-20846-v4.patch
          107 kB
          Duo Zhang
        6. HBASE-20846-v3.patch
          107 kB
          Duo Zhang
        7. HBASE-20846-v2.patch
          99 kB
          Duo Zhang
        8. HBASE-20846-v1.patch
          97 kB
          Duo Zhang
        9. HBASE-20846.patch
          50 kB
          Duo Zhang
        10. HBASE-20846.branch-2.0.002.patch
          13 kB
          Allan Yang
        11. HBASE-20846.branch-2.0.patch
          12 kB
          Allan Yang

          Issue Links

            Activity

              People

              • Assignee:
                Apache9 Duo Zhang
                Reporter:
                allan163 Allan Yang
              • Votes:
                0 Vote for this issue
                Watchers:
                10 Start watching this issue

                Dates

                • Created:
                  Updated:
                  Resolved: