HBase
  1. HBase
  2. HBASE-5494

Introduce a zk hosted table-wide read/write lock so only one table operation at a time

    Details

    • Type: Improvement Improvement
    • Status: Resolved
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.89-fb
    • Component/s: None
    • Labels:
      None
    • Hadoop Flags:
      Reviewed

      Description

      I saw this facility over in the accumulo code base.

      Currently we just try to sort out the mess when splits come in during an online schema edit; somehow we figure we can figure all possible region transition combinations and make the right call.

      We could try and narrow the number of combinations by taking out a zk table lock when doing table operations.

      For example, on split or merge, we could take a read-only lock meaning the table can't be disabled while these are running.

      We could then take a write only lock if we want to ensure the table doesn't change while disabling or enabling process is happening.

      Shouldn't be too hard to add.

      1. ASF.LICENSE.NOT.GRANTED--D2997.3.patch
        55 kB
        Phabricator
      2. ASF.LICENSE.NOT.GRANTED--D2997.4.patch
        56 kB
        Phabricator
      3. ASF.LICENSE.NOT.GRANTED--D2997.5.patch
        60 kB
        Phabricator
      4. ASF.LICENSE.NOT.GRANTED--D2997.6.patch
        60 kB
        Phabricator

        Issue Links

          Activity

          Hide
          stack added a comment -

          Alex's patch committed to 0.89fb. Thanks for the patch Alex (added you as hbase contributor and assigned you this issue). We'll do up a trunk version of this patch over in HBASE-5991 where we base it on read/write zk locks.

          Show
          stack added a comment - Alex's patch committed to 0.89fb. Thanks for the patch Alex (added you as hbase contributor and assigned you this issue). We'll do up a trunk version of this patch over in HBASE-5991 where we base it on read/write zk locks.
          Hide
          Alex Feinberg added a comment -

          Yeah, you can go ahead and close out.

          Show
          Alex Feinberg added a comment - Yeah, you can go ahead and close out.
          Hide
          stack added a comment -

          @Mighty Alex. Should we close this one?

          Show
          stack added a comment - @Mighty Alex. Should we close this one?
          Hide
          Alex Feinberg added a comment -

          Created HBASE-5991 for implementation of sequential znode based read/write locks.

          Show
          Alex Feinberg added a comment - Created HBASE-5991 for implementation of sequential znode based read/write locks.
          Hide
          Phabricator added a comment -

          avf has commented on the revision "[jira] HBASE-5494 [89-fb] Table-level locks for schema changing operations.".

          Committed to 89-fb: http://svn.apache.org/viewvc?view=revision&sortby=log&revision=1334557

          REVISION DETAIL
          https://reviews.facebook.net/D2997

          To: Kannan, mbautin, Liyin, JIRA, avf
          Cc: avf, tedyu, stack, khemani

          Show
          Phabricator added a comment - avf has commented on the revision " [jira] HBASE-5494 [89-fb] Table-level locks for schema changing operations.". Committed to 89-fb: http://svn.apache.org/viewvc?view=revision&sortby=log&revision=1334557 REVISION DETAIL https://reviews.facebook.net/D2997 To: Kannan, mbautin, Liyin, JIRA, avf Cc: avf, tedyu, stack, khemani
          Hide
          Phabricator added a comment -

          avf has closed the revision "[jira] HBASE-5494 [89-fb] Table-level locks for schema changing operations.".

          REVISION DETAIL
          https://reviews.facebook.net/D2997

          To: Kannan, mbautin, Liyin, JIRA, avf
          Cc: avf, tedyu, stack, khemani

          Show
          Phabricator added a comment - avf has closed the revision " [jira] HBASE-5494 [89-fb] Table-level locks for schema changing operations.". REVISION DETAIL https://reviews.facebook.net/D2997 To: Kannan, mbautin, Liyin, JIRA, avf Cc: avf, tedyu, stack, khemani
          Hide
          Phabricator added a comment -

          avf has commented on the revision "[jira] HBASE-5494 [89-fb] Table-level locks for schema changing operations.".

          Discussed Prakash's comments offline: since having thread id, host name, and operation in the lock name takes advantage of RecoverableZooKeeper accidentally re-entering a lock. Having a separate map in TableLockManager for all locks held, also serves as an additional guard.

          However, per further discussions with @stack, Kannan, and Prakash we've decided that we should create another JIRA to convert the locks to use sequential ZNodes (which allow reader/writer locks to be implemented as well as avoid many of the race conditions). I will create the said JIRA.

          REVISION DETAIL
          https://reviews.facebook.net/D2997

          BRANCH
          table_level_ddl_locks

          Show
          Phabricator added a comment - avf has commented on the revision " [jira] HBASE-5494 [89-fb] Table-level locks for schema changing operations.". Discussed Prakash's comments offline: since having thread id, host name, and operation in the lock name takes advantage of RecoverableZooKeeper accidentally re-entering a lock. Having a separate map in TableLockManager for all locks held, also serves as an additional guard. However, per further discussions with @stack, Kannan, and Prakash we've decided that we should create another JIRA to convert the locks to use sequential ZNodes (which allow reader/writer locks to be implemented as well as avoid many of the race conditions). I will create the said JIRA. REVISION DETAIL https://reviews.facebook.net/D2997 BRANCH table_level_ddl_locks
          Hide
          Phabricator added a comment -

          tedyu has commented on the revision "[jira] HBASE-5494 [89-fb] Table-level locks for schema changing operations.".

          INLINE COMMENTS
          src/main/java/org/apache/hadoop/hbase/zookeeper/DistributedLock.java:294 What action of the lock-owner is associated with znode data change ?
          I see znode creation and deletion, not data change.

          REVISION DETAIL
          https://reviews.facebook.net/D2997

          BRANCH
          table_level_ddl_locks

          Show
          Phabricator added a comment - tedyu has commented on the revision " [jira] HBASE-5494 [89-fb] Table-level locks for schema changing operations.". INLINE COMMENTS src/main/java/org/apache/hadoop/hbase/zookeeper/DistributedLock.java:294 What action of the lock-owner is associated with znode data change ? I see znode creation and deletion, not data change. REVISION DETAIL https://reviews.facebook.net/D2997 BRANCH table_level_ddl_locks
          Hide
          Phabricator added a comment -

          avf has commented on the revision "[jira] HBASE-5494 [89-fb] Table-level locks for schema changing operations.".

          Hi Prakash.

          Thanks for the comments! I'll chat with you on Monday about the potential issue around handling connection loss (I was under impression that RecoverableZookeeper handles that).

          INLINE COMMENTS
          src/main/java/org/apache/hadoop/hbase/zookeeper/ZooKeeperWrapper.java:1336 Isn't this handled by RecoverableZooKeeper?
          src/main/java/org/apache/hadoop/hbase/zookeeper/DistributedLock.java:112 This is to verify that we are only releasing a lock that we ourselves acquire, and that code doesn't (accidentally) release lock acquired by other threads/processes.
          src/main/java/org/apache/hadoop/hbase/zookeeper/DistributedLock.java:294 Good catch, will handle this.

          REVISION DETAIL
          https://reviews.facebook.net/D2997

          BRANCH
          table_level_ddl_locks

          Show
          Phabricator added a comment - avf has commented on the revision " [jira] HBASE-5494 [89-fb] Table-level locks for schema changing operations.". Hi Prakash. Thanks for the comments! I'll chat with you on Monday about the potential issue around handling connection loss (I was under impression that RecoverableZookeeper handles that). INLINE COMMENTS src/main/java/org/apache/hadoop/hbase/zookeeper/ZooKeeperWrapper.java:1336 Isn't this handled by RecoverableZooKeeper? src/main/java/org/apache/hadoop/hbase/zookeeper/DistributedLock.java:112 This is to verify that we are only releasing a lock that we ourselves acquire, and that code doesn't (accidentally) release lock acquired by other threads/processes. src/main/java/org/apache/hadoop/hbase/zookeeper/DistributedLock.java:294 Good catch, will handle this. REVISION DETAIL https://reviews.facebook.net/D2997 BRANCH table_level_ddl_locks
          Hide
          Phabricator added a comment -

          khemani has commented on the revision "[jira] HBASE-5494 [89-fb] Table-level locks for schema changing operations.".

          INLINE COMMENTS
          src/main/java/org/apache/hadoop/hbase/zookeeper/DistributedLock.java:294 Should you check the return value of this method to ensure that the node still exists?

          Say checkExistsAndCreate set the watch because the znode existed.

          Now the watch fires because the lock-owner changes the znode data.

          By the time you reset the watch here the owner releases the lock. watchAndCheckExists() will return false and you should then trigger the latch.

          src/main/java/org/apache/hadoop/hbase/zookeeper/ZooKeeperWrapper.java:1336 The zookeeper disconnect error should be specially handled - because the znode might have already been created.
          src/main/java/org/apache/hadoop/hbase/zookeeper/DistributedLock.java:112 Will lockZNodeVersion always be 0 because this znode has just been created?

          Why should release() need lockZNodeVersion? it should be OK to blindly delete the znode?

          REVISION DETAIL
          https://reviews.facebook.net/D2997

          BRANCH
          table_level_ddl_locks

          Show
          Phabricator added a comment - khemani has commented on the revision " [jira] HBASE-5494 [89-fb] Table-level locks for schema changing operations.". INLINE COMMENTS src/main/java/org/apache/hadoop/hbase/zookeeper/DistributedLock.java:294 Should you check the return value of this method to ensure that the node still exists? Say checkExistsAndCreate set the watch because the znode existed. Now the watch fires because the lock-owner changes the znode data. By the time you reset the watch here the owner releases the lock. watchAndCheckExists() will return false and you should then trigger the latch. src/main/java/org/apache/hadoop/hbase/zookeeper/ZooKeeperWrapper.java:1336 The zookeeper disconnect error should be specially handled - because the znode might have already been created. src/main/java/org/apache/hadoop/hbase/zookeeper/DistributedLock.java:112 Will lockZNodeVersion always be 0 because this znode has just been created? Why should release() need lockZNodeVersion? it should be OK to blindly delete the znode? REVISION DETAIL https://reviews.facebook.net/D2997 BRANCH table_level_ddl_locks
          Hide
          Phabricator added a comment -

          mbautin has commented on the revision "[jira] HBASE-5494 [89-fb] Table-level locks for schema changing operations.".

          Discussed offline with Alex. Changes to createTable (double-checked locking) look good to me.

          REVISION DETAIL
          https://reviews.facebook.net/D2997

          BRANCH
          table_level_ddl_locks

          Show
          Phabricator added a comment - mbautin has commented on the revision " [jira] HBASE-5494 [89-fb] Table-level locks for schema changing operations.". Discussed offline with Alex. Changes to createTable (double-checked locking) look good to me. REVISION DETAIL https://reviews.facebook.net/D2997 BRANCH table_level_ddl_locks
          Hide
          Phabricator added a comment -

          Kannan has commented on the revision "[jira] HBASE-5494 [89-fb] Table-level locks for schema changing operations.".

          lgtm!

          REVISION DETAIL
          https://reviews.facebook.net/D2997

          BRANCH
          table_level_ddl_locks

          Show
          Phabricator added a comment - Kannan has commented on the revision " [jira] HBASE-5494 [89-fb] Table-level locks for schema changing operations.". lgtm! REVISION DETAIL https://reviews.facebook.net/D2997 BRANCH table_level_ddl_locks
          Hide
          Phabricator added a comment -

          avf updated the revision "[jira] HBASE-5494 [89-fb] Table-level locks for schema changing operations.".
          Reviewers: Kannan, mbautin, Liyin, JIRA

          Move InjectionEventHandler.processEvent() call after lock acquisition in createTable(), explain the way "node is deleted before watch is set" race condition is handled in more detail.

          REVISION DETAIL
          https://reviews.facebook.net/D2997

          AFFECTED FILES
          src/main/java/org/apache/hadoop/hbase/HConstants.java
          src/main/java/org/apache/hadoop/hbase/TableLockTimeoutException.java
          src/main/java/org/apache/hadoop/hbase/master/HMaster.java
          src/main/java/org/apache/hadoop/hbase/master/TableLockManager.java
          src/main/java/org/apache/hadoop/hbase/util/InjectionEvent.java
          src/main/java/org/apache/hadoop/hbase/util/InjectionHandler.java
          src/main/java/org/apache/hadoop/hbase/zookeeper/DistributedLock.java
          src/main/java/org/apache/hadoop/hbase/zookeeper/ZooKeeperWrapper.java
          src/test/java/org/apache/hadoop/hbase/master/TestSchemaModificationLocks.java
          src/test/java/org/apache/hadoop/hbase/util/DelayInducingInjectionHandler.java
          src/test/java/org/apache/hadoop/hbase/zookeeper/TestDistributedLock.java

          Show
          Phabricator added a comment - avf updated the revision " [jira] HBASE-5494 [89-fb] Table-level locks for schema changing operations.". Reviewers: Kannan, mbautin, Liyin, JIRA Move InjectionEventHandler.processEvent() call after lock acquisition in createTable(), explain the way "node is deleted before watch is set" race condition is handled in more detail. REVISION DETAIL https://reviews.facebook.net/D2997 AFFECTED FILES src/main/java/org/apache/hadoop/hbase/HConstants.java src/main/java/org/apache/hadoop/hbase/TableLockTimeoutException.java src/main/java/org/apache/hadoop/hbase/master/HMaster.java src/main/java/org/apache/hadoop/hbase/master/TableLockManager.java src/main/java/org/apache/hadoop/hbase/util/InjectionEvent.java src/main/java/org/apache/hadoop/hbase/util/InjectionHandler.java src/main/java/org/apache/hadoop/hbase/zookeeper/DistributedLock.java src/main/java/org/apache/hadoop/hbase/zookeeper/ZooKeeperWrapper.java src/test/java/org/apache/hadoop/hbase/master/TestSchemaModificationLocks.java src/test/java/org/apache/hadoop/hbase/util/DelayInducingInjectionHandler.java src/test/java/org/apache/hadoop/hbase/zookeeper/TestDistributedLock.java
          Hide
          Phabricator added a comment -

          avf has commented on the revision "[jira] HBASE-5494 [89-fb] Table-level locks for schema changing operations.".

          INLINE COMMENTS
          src/main/java/org/apache/hadoop/hbase/master/HMaster.java:1391 Great catch. I mentally thought I did this, but I did not. For operations that do non-trivial work before locking, I think it makes sense to have BEFORE_LOCK events as well. Will do this.

          REVISION DETAIL
          https://reviews.facebook.net/D2997

          BRANCH
          table_level_ddl_locks

          Show
          Phabricator added a comment - avf has commented on the revision " [jira] HBASE-5494 [89-fb] Table-level locks for schema changing operations.". INLINE COMMENTS src/main/java/org/apache/hadoop/hbase/master/HMaster.java:1391 Great catch. I mentally thought I did this, but I did not. For operations that do non-trivial work before locking, I think it makes sense to have BEFORE_LOCK events as well. Will do this. REVISION DETAIL https://reviews.facebook.net/D2997 BRANCH table_level_ddl_locks
          Hide
          Phabricator added a comment -

          Kannan has accepted the revision "[jira] HBASE-5494 [89-fb] Table-level locks for schema changing operations.".

          Looks good! One minor comment.

          INLINE COMMENTS
          src/main/java/org/apache/hadoop/hbase/master/HMaster.java:1391 for symmetry with other operations (like disable/enable/etc.), should we do this injection after acquiring the lock? With the latest update to the diff this order has changed.

          If we do want events before lock acquisition, we could introduce new ones in the future like HMASTER_BEFORE_CREATE_LOCK or something like that.

          REVISION DETAIL
          https://reviews.facebook.net/D2997

          BRANCH
          table_level_ddl_locks

          Show
          Phabricator added a comment - Kannan has accepted the revision " [jira] HBASE-5494 [89-fb] Table-level locks for schema changing operations.". Looks good! One minor comment. INLINE COMMENTS src/main/java/org/apache/hadoop/hbase/master/HMaster.java:1391 for symmetry with other operations (like disable/enable/etc.), should we do this injection after acquiring the lock? With the latest update to the diff this order has changed. If we do want events before lock acquisition, we could introduce new ones in the future like HMASTER_BEFORE_CREATE_LOCK or something like that. REVISION DETAIL https://reviews.facebook.net/D2997 BRANCH table_level_ddl_locks
          Hide
          Phabricator added a comment -

          tedyu has commented on the revision "[jira] HBASE-5494 [89-fb] Table-level locks for schema changing operations.".

          I think this feature should leave ample room for region level lock, to be developed in the future.
          Meaning, znode for table level lock would always exist because it has children znodes which represent region locks.

          INLINE COMMENTS
          src/main/java/org/apache/hadoop/hbase/zookeeper/DistributedLock.java:147 Can this assumption be elaborated some more ?

          REVISION DETAIL
          https://reviews.facebook.net/D2997

          Show
          Phabricator added a comment - tedyu has commented on the revision " [jira] HBASE-5494 [89-fb] Table-level locks for schema changing operations.". I think this feature should leave ample room for region level lock, to be developed in the future. Meaning, znode for table level lock would always exist because it has children znodes which represent region locks. INLINE COMMENTS src/main/java/org/apache/hadoop/hbase/zookeeper/DistributedLock.java:147 Can this assumption be elaborated some more ? REVISION DETAIL https://reviews.facebook.net/D2997
          Hide
          Phabricator added a comment -

          avf updated the revision "[jira] HBASE-5494 [89-fb] Table-level locks for schema changing operations.".
          Reviewers: Kannan, mbautin, Liyin, JIRA

          Stylistic changes, using putIfAbsent in DelayInducingInjectionHandler.awaitEvent(). Implemented Mikhail's suggestions regarding createTable: check that a table exists before acquiring the lock, acquire the lock, and then check if a table exists again – in order to handle common scenarios more efficiently.

          REVISION DETAIL
          https://reviews.facebook.net/D2997

          AFFECTED FILES
          src/main/java/org/apache/hadoop/hbase/HConstants.java
          src/main/java/org/apache/hadoop/hbase/TableLockTimeoutException.java
          src/main/java/org/apache/hadoop/hbase/master/HMaster.java
          src/main/java/org/apache/hadoop/hbase/master/TableLockManager.java
          src/main/java/org/apache/hadoop/hbase/util/InjectionEvent.java
          src/main/java/org/apache/hadoop/hbase/util/InjectionHandler.java
          src/main/java/org/apache/hadoop/hbase/zookeeper/DistributedLock.java
          src/main/java/org/apache/hadoop/hbase/zookeeper/ZooKeeperWrapper.java
          src/test/java/org/apache/hadoop/hbase/master/TestSchemaModificationLocks.java
          src/test/java/org/apache/hadoop/hbase/util/DelayInducingInjectionHandler.java
          src/test/java/org/apache/hadoop/hbase/zookeeper/TestDistributedLock.java

          Show
          Phabricator added a comment - avf updated the revision " [jira] HBASE-5494 [89-fb] Table-level locks for schema changing operations.". Reviewers: Kannan, mbautin, Liyin, JIRA Stylistic changes, using putIfAbsent in DelayInducingInjectionHandler.awaitEvent(). Implemented Mikhail's suggestions regarding createTable: check that a table exists before acquiring the lock, acquire the lock, and then check if a table exists again – in order to handle common scenarios more efficiently. REVISION DETAIL https://reviews.facebook.net/D2997 AFFECTED FILES src/main/java/org/apache/hadoop/hbase/HConstants.java src/main/java/org/apache/hadoop/hbase/TableLockTimeoutException.java src/main/java/org/apache/hadoop/hbase/master/HMaster.java src/main/java/org/apache/hadoop/hbase/master/TableLockManager.java src/main/java/org/apache/hadoop/hbase/util/InjectionEvent.java src/main/java/org/apache/hadoop/hbase/util/InjectionHandler.java src/main/java/org/apache/hadoop/hbase/zookeeper/DistributedLock.java src/main/java/org/apache/hadoop/hbase/zookeeper/ZooKeeperWrapper.java src/test/java/org/apache/hadoop/hbase/master/TestSchemaModificationLocks.java src/test/java/org/apache/hadoop/hbase/util/DelayInducingInjectionHandler.java src/test/java/org/apache/hadoop/hbase/zookeeper/TestDistributedLock.java
          Hide
          Phabricator added a comment -

          avf has commented on the revision "[jira] HBASE-5494 [89-fb] Table-level locks for schema changing operations.".

          Thanks @Kannan, I'll implement these suggestions right now. I'm also going to make lockTable() and unlockTable() package-local methods for now, to encourage indicate that they are (as @stack puts it) internal to the schema changing operations, rather than general purpose.

          I'm also trying out @mbautin's suggestion of 1) checking that table doesn't already exist before acquiring the lock in createTable(), 2) then checking that the table doesn't exist again after acquiring the lock.

          Will put up an updated diff.

          REVISION DETAIL
          https://reviews.facebook.net/D2997

          Show
          Phabricator added a comment - avf has commented on the revision " [jira] HBASE-5494 [89-fb] Table-level locks for schema changing operations.". Thanks @Kannan, I'll implement these suggestions right now. I'm also going to make lockTable() and unlockTable() package-local methods for now, to encourage indicate that they are (as @stack puts it) internal to the schema changing operations, rather than general purpose. I'm also trying out @mbautin's suggestion of 1) checking that table doesn't already exist before acquiring the lock in createTable(), 2) then checking that the table doesn't exist again after acquiring the lock. Will put up an updated diff. REVISION DETAIL https://reviews.facebook.net/D2997
          Hide
          stack added a comment -

          @Alex Sounds good. I can help out getting this into trunk w/ the pb-ing and putting it up on the zk shared+locks pattern so we can do read/write locking. Good stuff.

          Show
          stack added a comment - @Alex Sounds good. I can help out getting this into trunk w/ the pb-ing and putting it up on the zk shared+locks pattern so we can do read/write locking. Good stuff.
          Hide
          Alex Feinberg added a comment -

          .bq You thought this overkill for your case? http://zookeeper.apache.org/doc/r3.1.2/recipes.html#Shared+Locks That is fine. Do you think we could backfill it later underneath the patch attached here?

          I went down the non-sequential route (as you said, thinking it was over-kill and simple "create if not exist" approach would work), although I later realized that some of the potential race conditions would likely not happen if I went with their approach. I think we could backfill it later once we create read-write locks.

          I do like the idea of a new master coming up to finish previous work. If we make the ZNode data more machine parseable (e.g., convert it to protobuf in trunk) than this would be feasible to do (when a new master is brought up, the master scans the lock to see if there were any operations in progress when the previous master died).

          I agree that lock and unlock shouldn't really be public APIs (in the sense of being directly accessible to end developers) – I'll make lockTable() and unlockTable() be package-local methods then, to that end.

          Show
          Alex Feinberg added a comment - .bq You thought this overkill for your case? http://zookeeper.apache.org/doc/r3.1.2/recipes.html#Shared+Locks That is fine. Do you think we could backfill it later underneath the patch attached here? I went down the non-sequential route (as you said, thinking it was over-kill and simple "create if not exist" approach would work), although I later realized that some of the potential race conditions would likely not happen if I went with their approach. I think we could backfill it later once we create read-write locks. I do like the idea of a new master coming up to finish previous work. If we make the ZNode data more machine parseable (e.g., convert it to protobuf in trunk) than this would be feasible to do (when a new master is brought up, the master scans the lock to see if there were any operations in progress when the previous master died). I agree that lock and unlock shouldn't really be public APIs (in the sense of being directly accessible to end developers) – I'll make lockTable() and unlockTable() be package-local methods then, to that end.
          Hide
          Phabricator added a comment -

          Kannan has commented on the revision "[jira] HBASE-5494 [89-fb] Table-level locks for schema changing operations.".

          INLINE COMMENTS
          src/main/java/org/apache/hadoop/hbase/master/TableLockManager.java:111 logs -> locks
          src/test/java/org/apache/hadoop/hbase/util/DelayInducingInjectionHandler.java:61 putIfAbsent is needed here I think. Otherwise, say the other place (line 76) runs first, and does putIfAbsent and count down on the inserted item, and this location overwrites that put, then we'll wait here for on a latch that'll never be counted down.

          REVISION DETAIL
          https://reviews.facebook.net/D2997

          Show
          Phabricator added a comment - Kannan has commented on the revision " [jira] HBASE-5494 [89-fb] Table-level locks for schema changing operations.". INLINE COMMENTS src/main/java/org/apache/hadoop/hbase/master/TableLockManager.java:111 logs -> locks src/test/java/org/apache/hadoop/hbase/util/DelayInducingInjectionHandler.java:61 putIfAbsent is needed here I think. Otherwise, say the other place (line 76) runs first, and does putIfAbsent and count down on the inserted item, and this location overwrites that put, then we'll wait here for on a latch that'll never be counted down. REVISION DETAIL https://reviews.facebook.net/D2997
          Hide
          stack added a comment -

          I think it makes sense to have a separate JIRA (or perhaps another patch for the same JIRA?) to address this...

          Yes. Would be a separate JIRA. I'm just trying to have it so it is a continuation of the work done here rather than a different table locking facility that runs beside this one because we need read/writes and that there can be multiple read locks outstanding at anyone time; one per region currently splitting/merging.

          On region-level locks, we do not need this I'd say (Already, region changes go via zk and there'll be failed transitions if concurrent operators try manipulate a single region given we usually check znode sequence id before we go forward moving a region to a new state).

          You thought this overkill for your case? http://zookeeper.apache.org/doc/r3.1.2/recipes.html#Shared+Locks That is fine. Do you think we could backfill it later underneath the patch attached here?

          On the lock being left in place on master crash, whats wrong w/ that? Thats sort of what we want I'd say. Or more, what we really want is that when the new master comes up, he finishes off what the previous master was at. To that end, should lock and unlock be public apis on the master at all, but rather just internal primitives used by the master doing disable/enable? If you disable a table, the first think you do is obtain a lock and then when done, you let it go on completion. A new master that comes up after a previous master has crashed and sees a table in disabling state should go ahead and complete the disable moving the table state to disabled and when done, then it undoes the unlock to finish the schema change transaction; you'd have to let the second master be able to undo the lock (Over in accumulo they have a system where they allow describing a macro change in terms of distinct steps; the steps are posted to zk and then acted on. We don't have to be that smart just yet... we can hardcode disable/enable for now).

          Good stuff Alex.

          Show
          stack added a comment - I think it makes sense to have a separate JIRA (or perhaps another patch for the same JIRA?) to address this... Yes. Would be a separate JIRA. I'm just trying to have it so it is a continuation of the work done here rather than a different table locking facility that runs beside this one because we need read/writes and that there can be multiple read locks outstanding at anyone time; one per region currently splitting/merging. On region-level locks, we do not need this I'd say (Already, region changes go via zk and there'll be failed transitions if concurrent operators try manipulate a single region given we usually check znode sequence id before we go forward moving a region to a new state). You thought this overkill for your case? http://zookeeper.apache.org/doc/r3.1.2/recipes.html#Shared+Locks That is fine. Do you think we could backfill it later underneath the patch attached here? On the lock being left in place on master crash, whats wrong w/ that? Thats sort of what we want I'd say. Or more, what we really want is that when the new master comes up, he finishes off what the previous master was at. To that end, should lock and unlock be public apis on the master at all, but rather just internal primitives used by the master doing disable/enable? If you disable a table, the first think you do is obtain a lock and then when done, you let it go on completion. A new master that comes up after a previous master has crashed and sees a table in disabling state should go ahead and complete the disable moving the table state to disabled and when done, then it undoes the unlock to finish the schema change transaction; you'd have to let the second master be able to undo the lock (Over in accumulo they have a system where they allow describing a macro change in terms of distinct steps; the steps are posted to zk and then acted on. We don't have to be that smart just yet... we can hardcode disable/enable for now). Good stuff Alex.
          Hide
          Ted Yu added a comment -

          Introducing region-level locks requires careful consideration of various scenarios.
          e.g. consider consecutive regions R1, R2 and R3. User A wants to split R2 while user B wants to merge the three regions, at the same time.
          They both acquire table lock. Then B acquires write lock on R1 and tries to lock R2, however A has acquired write lock on R2 at this moment. B should release lock on R1 and retry later - otherwise R2 would not exist by the time A completes the splitting.

          Show
          Ted Yu added a comment - Introducing region-level locks requires careful consideration of various scenarios. e.g. consider consecutive regions R1, R2 and R3. User A wants to split R2 while user B wants to merge the three regions, at the same time. They both acquire table lock. Then B acquires write lock on R1 and tries to lock R2, however A has acquired write lock on R2 at this moment. B should release lock on R1 and retry later - otherwise R2 would not exist by the time A completes the splitting.
          Hide
          Alex Feinberg added a comment -

          Re: "One thing we'd like to prevent is a table being disabled while splits (or merges) are going on. How hard would it be to add this facility (in another jira?). One way of doing it would be that a regionserver before splitting, it'd take out the table lock. That woul prevent the table from being disabled. But what about the case if two regionservers try to split a region from the same table at the one time? Or, what if the regionserver dies mid-split; the lock will be stuck in place."

          This is an interesting question. I think one approach may be to create a region level lock manager, and to convert the table-level lock manager to support read-write locks. Schema modifications (create/disable/alter/delete/) would acquire a table-wide read lock (as now). For splits and merges, region servers would acquire a table wide read lock (to allow two regionserves to split differnet regions of a table at the same time, but prevent schema modifications during a split/merge), and a write lock (i.e., a usual lock) over the regions that are being split (I'm not even sure if this step is even needed at this point).

          We also need a way to handle stuck locks (currently DistributedLock uses persistent ZNodes) after crashes with minimal (if any) manual intervention (key thing being that whatever schema-modification was started prior to the crash is safely rolled back – which may be non-trivial, as I would guess it would more complex than just keeping a txn id in the log and then reading through the HLog for META).

          Show
          Alex Feinberg added a comment - Re: "One thing we'd like to prevent is a table being disabled while splits (or merges) are going on. How hard would it be to add this facility (in another jira?). One way of doing it would be that a regionserver before splitting, it'd take out the table lock. That woul prevent the table from being disabled. But what about the case if two regionservers try to split a region from the same table at the one time? Or, what if the regionserver dies mid-split; the lock will be stuck in place." This is an interesting question. I think one approach may be to create a region level lock manager, and to convert the table-level lock manager to support read-write locks. Schema modifications (create/disable/alter/delete/) would acquire a table-wide read lock (as now). For splits and merges, region servers would acquire a table wide read lock (to allow two regionserves to split differnet regions of a table at the same time, but prevent schema modifications during a split/merge), and a write lock (i.e., a usual lock) over the regions that are being split (I'm not even sure if this step is even needed at this point). We also need a way to handle stuck locks (currently DistributedLock uses persistent ZNodes) after crashes with minimal (if any) manual intervention (key thing being that whatever schema-modification was started prior to the crash is safely rolled back – which may be non-trivial, as I would guess it would more complex than just keeping a txn id in the log and then reading through the HLog for META).
          Hide
          Phabricator added a comment -

          avf updated the revision "[jira] HBASE-5494 [89-fb] Table-level locks for schema changing operations.".
          Reviewers: Kannan, mbautin, Liyin, JIRA

          Incorporate review feedback: fixed another potential race condition in checkExistsAndCreates, and miscellaneous other changes (Javadoc, use EnvironmentEdgeManager, changes to DelayInducingInjectionHandler, and other).

          REVISION DETAIL
          https://reviews.facebook.net/D2997

          AFFECTED FILES
          src/main/java/org/apache/hadoop/hbase/HConstants.java
          src/main/java/org/apache/hadoop/hbase/TableLockTimeoutException.java
          src/main/java/org/apache/hadoop/hbase/master/HMaster.java
          src/main/java/org/apache/hadoop/hbase/master/TableLockManager.java
          src/main/java/org/apache/hadoop/hbase/util/InjectionEvent.java
          src/main/java/org/apache/hadoop/hbase/util/InjectionHandler.java
          src/main/java/org/apache/hadoop/hbase/zookeeper/DistributedLock.java
          src/main/java/org/apache/hadoop/hbase/zookeeper/ZooKeeperWrapper.java
          src/test/java/org/apache/hadoop/hbase/master/TestSchemaModificationLocks.java
          src/test/java/org/apache/hadoop/hbase/util/DelayInducingInjectionHandler.java
          src/test/java/org/apache/hadoop/hbase/zookeeper/TestDistributedLock.java

          Show
          Phabricator added a comment - avf updated the revision " [jira] HBASE-5494 [89-fb] Table-level locks for schema changing operations.". Reviewers: Kannan, mbautin, Liyin, JIRA Incorporate review feedback: fixed another potential race condition in checkExistsAndCreates, and miscellaneous other changes (Javadoc, use EnvironmentEdgeManager, changes to DelayInducingInjectionHandler, and other). REVISION DETAIL https://reviews.facebook.net/D2997 AFFECTED FILES src/main/java/org/apache/hadoop/hbase/HConstants.java src/main/java/org/apache/hadoop/hbase/TableLockTimeoutException.java src/main/java/org/apache/hadoop/hbase/master/HMaster.java src/main/java/org/apache/hadoop/hbase/master/TableLockManager.java src/main/java/org/apache/hadoop/hbase/util/InjectionEvent.java src/main/java/org/apache/hadoop/hbase/util/InjectionHandler.java src/main/java/org/apache/hadoop/hbase/zookeeper/DistributedLock.java src/main/java/org/apache/hadoop/hbase/zookeeper/ZooKeeperWrapper.java src/test/java/org/apache/hadoop/hbase/master/TestSchemaModificationLocks.java src/test/java/org/apache/hadoop/hbase/util/DelayInducingInjectionHandler.java src/test/java/org/apache/hadoop/hbase/zookeeper/TestDistributedLock.java
          Hide
          Phabricator added a comment -

          avf has commented on the revision "[jira] HBASE-5494 [89-fb] Table-level locks for schema changing operations.".

          @stack I'm currently thinking over this. I think it makes sense to have a separate JIRA (or perhaps another patch for the same JIRA?) to address this – acquiring a table lock for merges and splits makes sense to me. Since a table lock would only serve to prevent table schema modifications (disabling/enabling schemas), it would be okay if the lock remains "stuck" during a split – as long as there is a way to recover from this.

          I am thinking of the right way to remove persistent locks in general from a crash – will get back to you with some thoughts on the matter.

          REVISION DETAIL
          https://reviews.facebook.net/D2997

          Show
          Phabricator added a comment - avf has commented on the revision " [jira] HBASE-5494 [89-fb] Table-level locks for schema changing operations.". @stack I'm currently thinking over this. I think it makes sense to have a separate JIRA (or perhaps another patch for the same JIRA?) to address this – acquiring a table lock for merges and splits makes sense to me. Since a table lock would only serve to prevent table schema modifications (disabling/enabling schemas), it would be okay if the lock remains "stuck" during a split – as long as there is a way to recover from this. I am thinking of the right way to remove persistent locks in general from a crash – will get back to you with some thoughts on the matter. REVISION DETAIL https://reviews.facebook.net/D2997
          Hide
          Phabricator added a comment -

          stack has commented on the revision "[jira] HBASE-5494 [89-fb] Table-level locks for schema changing operations.".

          Any comment on the below Alex?

          "One thing we'd like to prevent is a table being disabled while splits (or merges) are going on. How hard would it be to add this facility (in another jira?). One way of doing it would be that a regionserver before splitting, it'd take out the table lock. That would prevent the table from being disabled. But what about the case if two regionservers try to split a region from the same table at the one time? Or, what if the regionserver dies mid-split; the lock will be stuck in place."

          REVISION DETAIL
          https://reviews.facebook.net/D2997

          Show
          Phabricator added a comment - stack has commented on the revision " [jira] HBASE-5494 [89-fb] Table-level locks for schema changing operations.". Any comment on the below Alex? "One thing we'd like to prevent is a table being disabled while splits (or merges) are going on. How hard would it be to add this facility (in another jira?). One way of doing it would be that a regionserver before splitting, it'd take out the table lock. That would prevent the table from being disabled. But what about the case if two regionservers try to split a region from the same table at the one time? Or, what if the regionserver dies mid-split; the lock will be stuck in place." REVISION DETAIL https://reviews.facebook.net/D2997
          Hide
          Phabricator added a comment -

          Kannan has commented on the revision "[jira] HBASE-5494 [89-fb] Table-level locks for schema changing operations.".

          INLINE COMMENTS
          src/test/java/org/apache/hadoop/hbase/util/DelayInducingInjectionHandler.java:75 Alex clarified the use of the CountDownLatch, and I am fine with that.

          My other point was that this whole thing needs to be done only if this event is present in eventToDelayTimeMs(). So it should move inside the if block at line 79. Because no point doing all this for events that this test is not interested in. Overtime, the inject handler will get called for all kinds of unrelated events as people start adding new uses of the Injection mechanism to test various parts of the code. Alex said he'll take care of that in the next update to the diff.

          REVISION DETAIL
          https://reviews.facebook.net/D2997

          Show
          Phabricator added a comment - Kannan has commented on the revision " [jira] HBASE-5494 [89-fb] Table-level locks for schema changing operations.". INLINE COMMENTS src/test/java/org/apache/hadoop/hbase/util/DelayInducingInjectionHandler.java:75 Alex clarified the use of the CountDownLatch, and I am fine with that. My other point was that this whole thing needs to be done only if this event is present in eventToDelayTimeMs(). So it should move inside the if block at line 79. Because no point doing all this for events that this test is not interested in. Overtime, the inject handler will get called for all kinds of unrelated events as people start adding new uses of the Injection mechanism to test various parts of the code. Alex said he'll take care of that in the next update to the diff. REVISION DETAIL https://reviews.facebook.net/D2997
          Hide
          Phabricator added a comment -

          avf has commented on the revision "[jira] HBASE-5494 [89-fb] Table-level locks for schema changing operations.".

          INLINE COMMENTS
          src/test/java/org/apache/hadoop/hbase/util/DelayInducingInjectionHandler.java:57 Good question. One drawback to adding a CountDownLatch for each event might be:

          1) Thread A waits for CountDownLatch, for the same event. Thread B comes along and counts its down.

          2) Thread C also wants to wait for the instance of the event – but in this case, the CountDownLatch will be incremented and thread C will have to wait for the next version of the event.

          Essentially we should make a call if we allow multiple threads to wait for the same event or if we allow a thread to wait for multiple instance of the same event.

          I am actually leaning in favour of the latter now, and can change this (in our case, we only have one thread waiting for an event).
          src/test/java/org/apache/hadoop/hbase/util/DelayInducingInjectionHandler.java:75 @mbautin I discussed this offline with Kannan, and he agreed with my use of this structure, but just asked me to use the "create CountDownLatch if one doesn't exist and then call .countDown" inside the if loop (if eventToDelayTimeMs contains the key).

          Wait/notify would work, but might be more cumbersome to implement than just using CountDownLatch.

          REVISION DETAIL
          https://reviews.facebook.net/D2997

          Show
          Phabricator added a comment - avf has commented on the revision " [jira] HBASE-5494 [89-fb] Table-level locks for schema changing operations.". INLINE COMMENTS src/test/java/org/apache/hadoop/hbase/util/DelayInducingInjectionHandler.java:57 Good question. One drawback to adding a CountDownLatch for each event might be: 1) Thread A waits for CountDownLatch, for the same event. Thread B comes along and counts its down. 2) Thread C also wants to wait for the instance of the event – but in this case, the CountDownLatch will be incremented and thread C will have to wait for the next version of the event. Essentially we should make a call if we allow multiple threads to wait for the same event or if we allow a thread to wait for multiple instance of the same event. I am actually leaning in favour of the latter now, and can change this (in our case, we only have one thread waiting for an event). src/test/java/org/apache/hadoop/hbase/util/DelayInducingInjectionHandler.java:75 @mbautin I discussed this offline with Kannan, and he agreed with my use of this structure, but just asked me to use the "create CountDownLatch if one doesn't exist and then call .countDown" inside the if loop (if eventToDelayTimeMs contains the key). Wait/notify would work, but might be more cumbersome to implement than just using CountDownLatch. REVISION DETAIL https://reviews.facebook.net/D2997
          Hide
          Phabricator added a comment -

          tedyu has commented on the revision "[jira] HBASE-5494 [89-fb] Table-level locks for schema changing operations.".

          INLINE COMMENTS
          src/test/java/org/apache/hadoop/hbase/util/DelayInducingInjectionHandler.java:57 Good finding.
          Maybe remove the latch after line 60 before returning ?
          src/test/java/org/apache/hadoop/hbase/zookeeper/TestDistributedLock.java:2 No year is needed.

          REVISION DETAIL
          https://reviews.facebook.net/D2997

          Show
          Phabricator added a comment - tedyu has commented on the revision " [jira] HBASE-5494 [89-fb] Table-level locks for schema changing operations.". INLINE COMMENTS src/test/java/org/apache/hadoop/hbase/util/DelayInducingInjectionHandler.java:57 Good finding. Maybe remove the latch after line 60 before returning ? src/test/java/org/apache/hadoop/hbase/zookeeper/TestDistributedLock.java:2 No year is needed. REVISION DETAIL https://reviews.facebook.net/D2997
          Hide
          Phabricator added a comment -

          mbautin has commented on the revision "[jira] HBASE-5494 [89-fb] Table-level locks for schema changing operations.".

          A few more comments, but everything looks good in general. I will accept once all major comments are addressed.

          INLINE COMMENTS
          src/main/java/org/apache/hadoop/hbase/master/HMaster.java:1353 Maybe checking if the table exists before trying to get the lock might provide more useful feedback to the user. Obviously, this will need to be checked again with the lock held.
          src/main/java/org/apache/hadoop/hbase/zookeeper/DistributedLock.java:129 Might be useful to use EnvironmentEdgeManager.currentTimeMillis() for testability.
          src/main/java/org/apache/hadoop/hbase/zookeeper/DistributedLock.java:143 The same here
          src/main/java/org/apache/hadoop/hbase/zookeeper/DistributedLock.java:123 javadoc: return value (false on timeout?)
          src/main/java/org/apache/hadoop/hbase/zookeeper/DistributedLock.java:38 More method javadocs in this class would be nice
          src/main/java/org/apache/hadoop/hbase/zookeeper/ZooKeeperWrapper.java:1321 We could also set the watch before trying to create the node.
          src/test/java/org/apache/hadoop/hbase/util/DelayInducingInjectionHandler.java:75 @Kannan: are you talking about using wait/notify on the Long object in eventToDelayTimeMs instead of using a CountDownLatch?
          src/test/java/org/apache/hadoop/hbase/util/DelayInducingInjectionHandler.java:57 Is it possible to wait for an event more than once? As it is currently implemented, once we count down the latch, all subsequent awaitEvent() calls will return instantly, because the old latch will still be in the map. How would we wait for another occurrence of the same event type?

          REVISION DETAIL
          https://reviews.facebook.net/D2997

          Show
          Phabricator added a comment - mbautin has commented on the revision " [jira] HBASE-5494 [89-fb] Table-level locks for schema changing operations.". A few more comments, but everything looks good in general. I will accept once all major comments are addressed. INLINE COMMENTS src/main/java/org/apache/hadoop/hbase/master/HMaster.java:1353 Maybe checking if the table exists before trying to get the lock might provide more useful feedback to the user. Obviously, this will need to be checked again with the lock held. src/main/java/org/apache/hadoop/hbase/zookeeper/DistributedLock.java:129 Might be useful to use EnvironmentEdgeManager.currentTimeMillis() for testability. src/main/java/org/apache/hadoop/hbase/zookeeper/DistributedLock.java:143 The same here src/main/java/org/apache/hadoop/hbase/zookeeper/DistributedLock.java:123 javadoc: return value (false on timeout?) src/main/java/org/apache/hadoop/hbase/zookeeper/DistributedLock.java:38 More method javadocs in this class would be nice src/main/java/org/apache/hadoop/hbase/zookeeper/ZooKeeperWrapper.java:1321 We could also set the watch before trying to create the node. src/test/java/org/apache/hadoop/hbase/util/DelayInducingInjectionHandler.java:75 @Kannan: are you talking about using wait/notify on the Long object in eventToDelayTimeMs instead of using a CountDownLatch? src/test/java/org/apache/hadoop/hbase/util/DelayInducingInjectionHandler.java:57 Is it possible to wait for an event more than once? As it is currently implemented, once we count down the latch, all subsequent awaitEvent() calls will return instantly, because the old latch will still be in the map. How would we wait for another occurrence of the same event type? REVISION DETAIL https://reviews.facebook.net/D2997
          Hide
          Phabricator added a comment -

          avf has commented on the revision "[jira] HBASE-5494 [89-fb] Table-level locks for schema changing operations.".

          @stack Thanks for the comments! If we convert ZNode contents for this to protobufs, we should also add a commands to the HBase shell (I'll be willing to help with that) to list currently held locks (who they are held by, for what purpose, and by whom) and delete them – the reason they're in plain text in the current code is to allow operations to use the ZK CLI to do so.

          I agree that using protobufs would be more appropriate for this purpose, as long as there's a way for operators to view this data without having to write code.

          INLINE COMMENTS
          src/main/java/org/apache/hadoop/hbase/master/HMaster.java:1315 Right now locking is not exposed in the client APIs, so essentially if locks are disabled schema changes (the only part where locks are currently used) behave as they would prior to the patch.

          Should I, perhaps, add a warning in this case? ("warning: schema modification locks are disabled, concurrent schema changes may result in corruption")
          src/main/java/org/apache/hadoop/hbase/master/TableLockManager.java:67 No, in this case I enforce (with an IllegalStateException) that locks can only be released by their owners. Currently, stale locks (locks after a "hard" server crash) would be removed using the ZooKeeper CLI.

          It would probably make sense to add a command to the HBase shell to list all locks currently held (and when they acquired), and then release them.
          src/main/java/org/apache/hadoop/hbase/master/TableLockManager.java:111 Thanks, will do.
          src/main/java/org/apache/hadoop/hbase/master/TableLockManager.java:137 I am using an IllegalStateException to enforce this as an invariant (if a lock was acquired in ZooKeeper, it should not be in acquiredTableLock – and vice versa).

          REVISION DETAIL
          https://reviews.facebook.net/D2997

          Show
          Phabricator added a comment - avf has commented on the revision " [jira] HBASE-5494 [89-fb] Table-level locks for schema changing operations.". @stack Thanks for the comments! If we convert ZNode contents for this to protobufs, we should also add a commands to the HBase shell (I'll be willing to help with that) to list currently held locks (who they are held by, for what purpose, and by whom) and delete them – the reason they're in plain text in the current code is to allow operations to use the ZK CLI to do so. I agree that using protobufs would be more appropriate for this purpose, as long as there's a way for operators to view this data without having to write code. INLINE COMMENTS src/main/java/org/apache/hadoop/hbase/master/HMaster.java:1315 Right now locking is not exposed in the client APIs, so essentially if locks are disabled schema changes (the only part where locks are currently used) behave as they would prior to the patch. Should I, perhaps, add a warning in this case? ("warning: schema modification locks are disabled, concurrent schema changes may result in corruption") src/main/java/org/apache/hadoop/hbase/master/TableLockManager.java:67 No, in this case I enforce (with an IllegalStateException) that locks can only be released by their owners. Currently, stale locks (locks after a "hard" server crash) would be removed using the ZooKeeper CLI. It would probably make sense to add a command to the HBase shell to list all locks currently held (and when they acquired), and then release them. src/main/java/org/apache/hadoop/hbase/master/TableLockManager.java:111 Thanks, will do. src/main/java/org/apache/hadoop/hbase/master/TableLockManager.java:137 I am using an IllegalStateException to enforce this as an invariant (if a lock was acquired in ZooKeeper, it should not be in acquiredTableLock – and vice versa). REVISION DETAIL https://reviews.facebook.net/D2997
          Hide
          Phabricator added a comment -

          Kannan has commented on the revision "[jira] HBASE-5494 [89-fb] Table-level locks for schema changing operations.".

          @ Alex: really nice tests!

          INLINE COMMENTS
          src/main/java/org/apache/hadoop/hbase/util/InjectionHandler.java:144 @Ted: This method is meant to be public. That's how different tests can "inject" their handlers.

          REVISION DETAIL
          https://reviews.facebook.net/D2997

          Show
          Phabricator added a comment - Kannan has commented on the revision " [jira] HBASE-5494 [89-fb] Table-level locks for schema changing operations.". @ Alex: really nice tests! INLINE COMMENTS src/main/java/org/apache/hadoop/hbase/util/InjectionHandler.java:144 @Ted: This method is meant to be public. That's how different tests can "inject" their handlers. REVISION DETAIL https://reviews.facebook.net/D2997
          Hide
          Phabricator added a comment -

          stack has commented on the revision "[jira] HBASE-5494 [89-fb] Table-level locks for schema changing operations.".

          This is great stuff. I can help port to trunk (will need to add test size on new tests, annotations to classes for audience, and zk persistence needs to be protobuf'd – I can help here np).

          One thing we'd like to prevent is a table being disabled while splits (or merges) are going on. How hard would it be to add this facility (in another jira?). One way of doing it would be that a regionserver before splitting, it'd take out the table lock. That would prevent the table from being disabled. But what about the case if two regionservers try to split a region from the same table at the one time? Or, what if the regionserver dies mid-split; the lock will be stuck in place.

          Do you see our extending this system to deal w/ the above or do you think that another system altogether?

          Good stuff A.

          INLINE COMMENTS
          src/main/java/org/apache/hadoop/hbase/master/HMaster.java:1315 If client asks to lock a table on cluster where this feature is not enabled, they get same response as when they successfully lock a table? Should they be different?
          src/main/java/org/apache/hadoop/hbase/master/HMaster.java:1322 ditto
          src/main/java/org/apache/hadoop/hbase/master/HMaster.java:1353 Would a tryLock make a difference here?

          When no trylock, the second thread through will fail with table already created.

          With trylock, client will get "Failed to get lock" which is pretty useless. It might have the client return faster which could be good but these events are rare so probably something we don't need to sweat.
          src/main/java/org/apache/hadoop/hbase/master/TableLockManager.java:67 This is not exclusive, right? The lock taker does not have to be the lock undoer?
          src/main/java/org/apache/hadoop/hbase/master/TableLockManager.java:111 Add content of acquiredTableLocks to help debug?

          REVISION DETAIL
          https://reviews.facebook.net/D2997

          Show
          Phabricator added a comment - stack has commented on the revision " [jira] HBASE-5494 [89-fb] Table-level locks for schema changing operations.". This is great stuff. I can help port to trunk (will need to add test size on new tests, annotations to classes for audience, and zk persistence needs to be protobuf'd – I can help here np). One thing we'd like to prevent is a table being disabled while splits (or merges) are going on. How hard would it be to add this facility (in another jira?). One way of doing it would be that a regionserver before splitting, it'd take out the table lock. That would prevent the table from being disabled. But what about the case if two regionservers try to split a region from the same table at the one time? Or, what if the regionserver dies mid-split; the lock will be stuck in place. Do you see our extending this system to deal w/ the above or do you think that another system altogether? Good stuff A. INLINE COMMENTS src/main/java/org/apache/hadoop/hbase/master/HMaster.java:1315 If client asks to lock a table on cluster where this feature is not enabled, they get same response as when they successfully lock a table? Should they be different? src/main/java/org/apache/hadoop/hbase/master/HMaster.java:1322 ditto src/main/java/org/apache/hadoop/hbase/master/HMaster.java:1353 Would a tryLock make a difference here? When no trylock, the second thread through will fail with table already created. With trylock, client will get "Failed to get lock" which is pretty useless. It might have the client return faster which could be good but these events are rare so probably something we don't need to sweat. src/main/java/org/apache/hadoop/hbase/master/TableLockManager.java:67 This is not exclusive, right? The lock taker does not have to be the lock undoer? src/main/java/org/apache/hadoop/hbase/master/TableLockManager.java:111 Add content of acquiredTableLocks to help debug? REVISION DETAIL https://reviews.facebook.net/D2997
          Hide
          Phabricator added a comment -

          tedyu has commented on the revision "[jira] HBASE-5494 [89-fb] Table-level locks for schema changing operations.".

          INLINE COMMENTS
          src/main/java/org/apache/hadoop/hbase/zookeeper/ZooKeeperWrapper.java:1321 Another option is to leave the lock znode at time of lock release instead of calling zkWrapper.deleteZNode() every time.
          We can use the znode version and data for the node to distinguish who is the winner trying to acquire the lock.

          REVISION DETAIL
          https://reviews.facebook.net/D2997

          Show
          Phabricator added a comment - tedyu has commented on the revision " [jira] HBASE-5494 [89-fb] Table-level locks for schema changing operations.". INLINE COMMENTS src/main/java/org/apache/hadoop/hbase/zookeeper/ZooKeeperWrapper.java:1321 Another option is to leave the lock znode at time of lock release instead of calling zkWrapper.deleteZNode() every time. We can use the znode version and data for the node to distinguish who is the winner trying to acquire the lock. REVISION DETAIL https://reviews.facebook.net/D2997
          Hide
          Phabricator added a comment -

          Kannan has commented on the revision "[jira] HBASE-5494 [89-fb] Table-level locks for schema changing operations.".

          INLINE COMMENTS
          src/test/java/org/apache/hadoop/hbase/util/DelayInducingInjectionHandler.java:75 It is not clear why we need eventsToWaitFor structure at all.

          For events that a test is not interested in, it seems odd that we are putting those in the eventsToWaitFor structure.

          eventToDelayTimeMS seems sufficient.

          src/main/java/org/apache/hadoop/hbase/zookeeper/ZooKeeperWrapper.java:1321 spoke with Alex offline.

          We need to handle the case where after the create failed (because the node already exists), but before we could set a watch on this, if the znode was deleted, then we should handle that case correctly. Currently, we'll return false even in that case, and the caller will wait for CountDownLatch to reach 0, but it'll never reach zero since we missed the delete event.

          REVISION DETAIL
          https://reviews.facebook.net/D2997

          Show
          Phabricator added a comment - Kannan has commented on the revision " [jira] HBASE-5494 [89-fb] Table-level locks for schema changing operations.". INLINE COMMENTS src/test/java/org/apache/hadoop/hbase/util/DelayInducingInjectionHandler.java:75 It is not clear why we need eventsToWaitFor structure at all. For events that a test is not interested in, it seems odd that we are putting those in the eventsToWaitFor structure. eventToDelayTimeMS seems sufficient. src/main/java/org/apache/hadoop/hbase/zookeeper/ZooKeeperWrapper.java:1321 spoke with Alex offline. We need to handle the case where after the create failed (because the node already exists), but before we could set a watch on this, if the znode was deleted, then we should handle that case correctly. Currently, we'll return false even in that case, and the caller will wait for CountDownLatch to reach 0, but it'll never reach zero since we missed the delete event. REVISION DETAIL https://reviews.facebook.net/D2997
          Hide
          Phabricator added a comment -

          tedyu has commented on the revision "[jira] HBASE-5494 [89-fb] Table-level locks for schema changing operations.".

          INLINE COMMENTS
          src/main/java/org/apache/hadoop/hbase/master/TableLockManager.java:137 Is there a chance that acquiredTableLocks is out of sync with the lock status up in zk ?
          src/main/java/org/apache/hadoop/hbase/zookeeper/DistributedLock.java:87 Add a space after 'wait'
          src/main/java/org/apache/hadoop/hbase/zookeeper/DistributedLock.java:104 Do we really need this exception ?
          src/main/java/org/apache/hadoop/hbase/zookeeper/DistributedLock.java:111 Please include lockZNodeVersion here.
          src/main/java/org/apache/hadoop/hbase/zookeeper/DistributedLock.java:180 fullyQualifiedZNode should be included.
          src/main/java/org/apache/hadoop/hbase/zookeeper/DistributedLock.java:292 Please include fullyQualifiedZNode
          src/main/java/org/apache/hadoop/hbase/zookeeper/ZooKeeperWrapper.java:1310 Please make the second part of the sentence correct
          src/main/java/org/apache/hadoop/hbase/util/InjectionHandler.java:144 Can this method be made non-public ?

          REVISION DETAIL
          https://reviews.facebook.net/D2997

          Show
          Phabricator added a comment - tedyu has commented on the revision " [jira] HBASE-5494 [89-fb] Table-level locks for schema changing operations.". INLINE COMMENTS src/main/java/org/apache/hadoop/hbase/master/TableLockManager.java:137 Is there a chance that acquiredTableLocks is out of sync with the lock status up in zk ? src/main/java/org/apache/hadoop/hbase/zookeeper/DistributedLock.java:87 Add a space after 'wait' src/main/java/org/apache/hadoop/hbase/zookeeper/DistributedLock.java:104 Do we really need this exception ? src/main/java/org/apache/hadoop/hbase/zookeeper/DistributedLock.java:111 Please include lockZNodeVersion here. src/main/java/org/apache/hadoop/hbase/zookeeper/DistributedLock.java:180 fullyQualifiedZNode should be included. src/main/java/org/apache/hadoop/hbase/zookeeper/DistributedLock.java:292 Please include fullyQualifiedZNode src/main/java/org/apache/hadoop/hbase/zookeeper/ZooKeeperWrapper.java:1310 Please make the second part of the sentence correct src/main/java/org/apache/hadoop/hbase/util/InjectionHandler.java:144 Can this method be made non-public ? REVISION DETAIL https://reviews.facebook.net/D2997
          Hide
          Phabricator added a comment -

          avf has commented on the revision "[jira] HBASE-5494 [89-fb] Table-level locks for schema changing operations.".

          Thanks for the inline comments, @tedyu – I've replied to a few quick ones inline.

          INLINE COMMENTS
          src/main/java/org/apache/hadoop/hbase/master/TableLockManager.java:27 DistributedLock is implemented as part of the patch (see DistributedLock.java)
          src/main/java/org/apache/hadoop/hbase/master/TableLockManager.java:47 Metadata for table level locks is stored as plain text – this is to allow operations to view lock information from the zookeeper CLI: toStringBinary() would not be needed here.
          src/main/java/org/apache/hadoop/hbase/master/TableLockManager.java:137 In this case, an IOException is thrown up to the caller: this is to indicate a non-recoverable ZooKeeper error (DistributedLock uses RecoverableZooKeeper class under the covers). .release() may also throw an IllegalStateException – but this is essentially used an assertion in this case (releasing a lock that isn't held).

          REVISION DETAIL
          https://reviews.facebook.net/D2997

          Show
          Phabricator added a comment - avf has commented on the revision " [jira] HBASE-5494 [89-fb] Table-level locks for schema changing operations.". Thanks for the inline comments, @tedyu – I've replied to a few quick ones inline. INLINE COMMENTS src/main/java/org/apache/hadoop/hbase/master/TableLockManager.java:27 DistributedLock is implemented as part of the patch (see DistributedLock.java) src/main/java/org/apache/hadoop/hbase/master/TableLockManager.java:47 Metadata for table level locks is stored as plain text – this is to allow operations to view lock information from the zookeeper CLI: toStringBinary() would not be needed here. src/main/java/org/apache/hadoop/hbase/master/TableLockManager.java:137 In this case, an IOException is thrown up to the caller: this is to indicate a non-recoverable ZooKeeper error (DistributedLock uses RecoverableZooKeeper class under the covers). .release() may also throw an IllegalStateException – but this is essentially used an assertion in this case (releasing a lock that isn't held). REVISION DETAIL https://reviews.facebook.net/D2997
          Hide
          Phabricator added a comment -

          tedyu has commented on the revision "[jira] HBASE-5494 [89-fb] Table-level locks for schema changing operations.".

          I only reviewed part of the patch.

          Would this feature be refined in 0.89-fb branch before being ported to Apache HBase trunk ?

          INLINE COMMENTS
          src/main/java/org/apache/hadoop/hbase/HConstants.java:98 Schema changes would always involve master.
          'master.' can be omitted.
          src/main/java/org/apache/hadoop/hbase/HConstants.java:108 Is this value big enough in cluster testing ?
          src/main/java/org/apache/hadoop/hbase/TableLockTimeoutException.java:2 No year is needed.
          src/main/java/org/apache/hadoop/hbase/master/HMaster.java:1353 This lock is used to prevent two concurrent table creation attempts.
          tryLockTable() is more desirable here.
          src/main/java/org/apache/hadoop/hbase/master/HMaster.java:1310 Can we add tryLockTable() ?
          It would be useful for the non-winning thread to exit quickly.
          src/main/java/org/apache/hadoop/hbase/master/TableLockManager.java:2 No year, please.
          src/main/java/org/apache/hadoop/hbase/master/TableLockManager.java:47 Should Bytes.toStringBinary() be used here ?
          src/main/java/org/apache/hadoop/hbase/master/TableLockManager.java:53 Add 'be ' before 'released'
          src/main/java/org/apache/hadoop/hbase/master/TableLockManager.java:137 What if lock release fails ?
          src/main/java/org/apache/hadoop/hbase/master/TableLockManager.java:27 Can you tell me which zookeeper branch provides this lock ?
          In http://svn.apache.org/repos/asf/zookeeper/trunk, I don't seem to find this class.

          REVISION DETAIL
          https://reviews.facebook.net/D2997

          Show
          Phabricator added a comment - tedyu has commented on the revision " [jira] HBASE-5494 [89-fb] Table-level locks for schema changing operations.". I only reviewed part of the patch. Would this feature be refined in 0.89-fb branch before being ported to Apache HBase trunk ? INLINE COMMENTS src/main/java/org/apache/hadoop/hbase/HConstants.java:98 Schema changes would always involve master. 'master.' can be omitted. src/main/java/org/apache/hadoop/hbase/HConstants.java:108 Is this value big enough in cluster testing ? src/main/java/org/apache/hadoop/hbase/TableLockTimeoutException.java:2 No year is needed. src/main/java/org/apache/hadoop/hbase/master/HMaster.java:1353 This lock is used to prevent two concurrent table creation attempts. tryLockTable() is more desirable here. src/main/java/org/apache/hadoop/hbase/master/HMaster.java:1310 Can we add tryLockTable() ? It would be useful for the non-winning thread to exit quickly. src/main/java/org/apache/hadoop/hbase/master/TableLockManager.java:2 No year, please. src/main/java/org/apache/hadoop/hbase/master/TableLockManager.java:47 Should Bytes.toStringBinary() be used here ? src/main/java/org/apache/hadoop/hbase/master/TableLockManager.java:53 Add 'be ' before 'released' src/main/java/org/apache/hadoop/hbase/master/TableLockManager.java:137 What if lock release fails ? src/main/java/org/apache/hadoop/hbase/master/TableLockManager.java:27 Can you tell me which zookeeper branch provides this lock ? In http://svn.apache.org/repos/asf/zookeeper/trunk , I don't seem to find this class. REVISION DETAIL https://reviews.facebook.net/D2997
          Hide
          Alex Feinberg added a comment -

          This patch implements a ZK-hosted mutual exclusion lock (DistributedLock), and table level locks (TableLockManager), and ensures that all schema changing operations are serialized. Further work would be needed to add read-write locks to handle region splitting and merges.

          Show
          Alex Feinberg added a comment - This patch implements a ZK-hosted mutual exclusion lock (DistributedLock), and table level locks (TableLockManager), and ensures that all schema changing operations are serialized. Further work would be needed to add read-write locks to handle region splitting and merges.
          Hide
          Phabricator added a comment -

          avf requested code review of "[jira] HBASE-5494 [89-fb] Table-level locks for schema changing operations.".
          Reviewers: Kannan, mbautin, Liyin, JIRA

          Since concurrent modification (e.g., disabling and dropping a table under
          creation) could leave a cluster in an inconsistent state, we need table-level
          locks for schema changing operations.

          A ZooKeeper-based distributed lock has been implemented that
          attempts to create a persistent ZNode (one ZNode per entity being locked, i.e.,
          one per table) if one does not exist. Currently in case a master crashes while
          holding the lock, the lock must be manually removed using the ZooKeeper command
          line (locks being stored in "/hbase/tableLock/").

          The locks implemented are not fair or re-entrant. RecoverableZooKeeper is used
          to correctly handle connection loss.

          To test the locks, InjectionHandler and InjectionEvent have been introduced,
          allowing for injection of arbitrary events, in this case adding delays during
          schema changing operations as to induce a race condition.

          Future work involves automatically deleting stale lock ZNodes upon server
          recovery (providing the attempted operations are not resumed), adding metrics
          around locks (e.g., list all locks held).

          TEST PLAN
          Since concurrent modification (e.g., disabling and dropping a table
          under creation) could leave a cluster in an inconsistent state, we
          need table-level locks for schema changing operations.

          A ZooKeeper-based distributed lock has been implemented that attempts
          to create a persistent ZNode (one ZNode per entity being locked, i.e.,
          one per table) if one does not exist. Currently in case a master
          crashes while holding the lock, the lock must be manually removed
          using the ZooKeeper command line (locks being stored in
          "/hbase/tableLock/").

          The locks implemented are not fair or re-entrant. RecoverableZooKeeper
          is used to correctly handle connection loss.

          To test the locks, InjectionHandler and InjectionEvent have been
          introduced, allowing for injection of arbitrary events, in this case
          adding delays during schema changing operations as to induce a race
          condition.

          Future work involves automatically deleting stale lock ZNodes upon
          server recovery (providing the attempted operations are not resumed),
          adding metrics around locks (e.g., list all locks held).

          REVISION DETAIL
          https://reviews.facebook.net/D2997

          AFFECTED FILES
          src/main/java/org/apache/hadoop/hbase/HConstants.java
          src/main/java/org/apache/hadoop/hbase/TableLockTimeoutException.java
          src/main/java/org/apache/hadoop/hbase/master/HMaster.java
          src/main/java/org/apache/hadoop/hbase/master/TableLockManager.java
          src/main/java/org/apache/hadoop/hbase/util/InjectionEvent.java
          src/main/java/org/apache/hadoop/hbase/util/InjectionHandler.java
          src/main/java/org/apache/hadoop/hbase/zookeeper/DistributedLock.java
          src/main/java/org/apache/hadoop/hbase/zookeeper/ZooKeeperWrapper.java
          src/test/java/org/apache/hadoop/hbase/master/TestSchemaModificationLocks.java
          src/test/java/org/apache/hadoop/hbase/util/DelayInducingInjectionHandler.java
          src/test/java/org/apache/hadoop/hbase/zookeeper/TestDistributedLock.java

          Show
          Phabricator added a comment - avf requested code review of " [jira] HBASE-5494 [89-fb] Table-level locks for schema changing operations.". Reviewers: Kannan, mbautin, Liyin, JIRA Since concurrent modification (e.g., disabling and dropping a table under creation) could leave a cluster in an inconsistent state, we need table-level locks for schema changing operations. A ZooKeeper-based distributed lock has been implemented that attempts to create a persistent ZNode (one ZNode per entity being locked, i.e., one per table) if one does not exist. Currently in case a master crashes while holding the lock, the lock must be manually removed using the ZooKeeper command line (locks being stored in "/hbase/tableLock/"). The locks implemented are not fair or re-entrant. RecoverableZooKeeper is used to correctly handle connection loss. To test the locks, InjectionHandler and InjectionEvent have been introduced, allowing for injection of arbitrary events, in this case adding delays during schema changing operations as to induce a race condition. Future work involves automatically deleting stale lock ZNodes upon server recovery (providing the attempted operations are not resumed), adding metrics around locks (e.g., list all locks held). TEST PLAN Since concurrent modification (e.g., disabling and dropping a table under creation) could leave a cluster in an inconsistent state, we need table-level locks for schema changing operations. A ZooKeeper-based distributed lock has been implemented that attempts to create a persistent ZNode (one ZNode per entity being locked, i.e., one per table) if one does not exist. Currently in case a master crashes while holding the lock, the lock must be manually removed using the ZooKeeper command line (locks being stored in "/hbase/tableLock/"). The locks implemented are not fair or re-entrant. RecoverableZooKeeper is used to correctly handle connection loss. To test the locks, InjectionHandler and InjectionEvent have been introduced, allowing for injection of arbitrary events, in this case adding delays during schema changing operations as to induce a race condition. Future work involves automatically deleting stale lock ZNodes upon server recovery (providing the attempted operations are not resumed), adding metrics around locks (e.g., list all locks held). REVISION DETAIL https://reviews.facebook.net/D2997 AFFECTED FILES src/main/java/org/apache/hadoop/hbase/HConstants.java src/main/java/org/apache/hadoop/hbase/TableLockTimeoutException.java src/main/java/org/apache/hadoop/hbase/master/HMaster.java src/main/java/org/apache/hadoop/hbase/master/TableLockManager.java src/main/java/org/apache/hadoop/hbase/util/InjectionEvent.java src/main/java/org/apache/hadoop/hbase/util/InjectionHandler.java src/main/java/org/apache/hadoop/hbase/zookeeper/DistributedLock.java src/main/java/org/apache/hadoop/hbase/zookeeper/ZooKeeperWrapper.java src/test/java/org/apache/hadoop/hbase/master/TestSchemaModificationLocks.java src/test/java/org/apache/hadoop/hbase/util/DelayInducingInjectionHandler.java src/test/java/org/apache/hadoop/hbase/zookeeper/TestDistributedLock.java
          Show
          stack added a comment - Instead, doing https://docs.google.com/document/d/1oqEmR35rDfp3OmGxymhWiwdea9mdKo4cZs_xSCg6yOw/edit
          Hide
          stack added a comment -

          Hmm... NVM. First I need to figure out complete set of zk states and operations they trigger. Making a start up here: https://docs.google.com/spreadsheet/ccc?key=0Aoa3E58mCyOfdFFCY3RmMW9ZeEliTndLNGRNWVBCZkE Don't look yet. It has nothing in it yet.

          Show
          stack added a comment - Hmm... NVM. First I need to figure out complete set of zk states and operations they trigger. Making a start up here: https://docs.google.com/spreadsheet/ccc?key=0Aoa3E58mCyOfdFFCY3RmMW9ZeEliTndLNGRNWVBCZkE Don't look yet. It has nothing in it yet.
          Hide
          stack added a comment -

          Thinking on this, I started to sketch a scheme where /unassigned in zk became instead /regions and under /regions there'd be a subdir per table and then under each table, you'd do something like what is described here http://zookeeper.apache.org/doc/r3.1.2/recipes.html#Shared+Locks only instead of read/write we'd have a sort of exclusive-access vs non-exclusive-access.

          Already the above would be a massive refactor not to mind compounding it with the fact that doing an exclusive-operation, you'll usually want to do something like close all regions on a table (disabling) and to do that you'd need to be able to do non-exclusive operations...

          Let me start over and make a little grid of what operations we do not want to run concurrently. Will post here. Might be able to get away w/ something less disruptive and more basic.

          Show
          stack added a comment - Thinking on this, I started to sketch a scheme where /unassigned in zk became instead /regions and under /regions there'd be a subdir per table and then under each table, you'd do something like what is described here http://zookeeper.apache.org/doc/r3.1.2/recipes.html#Shared+Locks only instead of read/write we'd have a sort of exclusive-access vs non-exclusive-access. Already the above would be a massive refactor not to mind compounding it with the fact that doing an exclusive-operation, you'll usually want to do something like close all regions on a table (disabling) and to do that you'd need to be able to do non-exclusive operations... Let me start over and make a little grid of what operations we do not want to run concurrently. Will post here. Might be able to get away w/ something less disruptive and more basic.
          Hide
          stack added a comment -

          @Ram Yes sir. Thanks. Resolved hbase-5373 as duplicate of this.

          Show
          stack added a comment - @Ram Yes sir. Thanks. Resolved hbase-5373 as duplicate of this.
          Hide
          ramkrishna.s.vasudevan added a comment -
          Show
          ramkrishna.s.vasudevan added a comment - Is this similar to https://issues.apache.org/jira/browse/HBASE-5373 ?
          Hide
          stack added a comment -

          I think maybe region locks is for later. Meantime, getting a read lock on the table when doing a split or merge might carry us a long way.....before we need region specific locks.

          @Mubarak Todd has a good point that respecting locking order is important. Maybe we should use the facility where zk can add the seqid to the name? So maybe locks are named:

          <zookeeper.znode.parent>/locks/<table_name>/r-seqid

          or

          <zookeeper.znode.parent>/locks/<table_name>/w-seqid

          where the data then is details on who took the lock?

          And you can't take a write lock if any instances of read lock outstanding?

          I was thinking that maybe clients would be prepared to wait some time obtaining a read lock but that they might fail fast if they could not get a read lock?

          Show
          stack added a comment - I think maybe region locks is for later. Meantime, getting a read lock on the table when doing a split or merge might carry us a long way.....before we need region specific locks. @Mubarak Todd has a good point that respecting locking order is important. Maybe we should use the facility where zk can add the seqid to the name? So maybe locks are named: <zookeeper.znode.parent>/locks/<table_name>/r-seqid or <zookeeper.znode.parent>/locks/<table_name>/w-seqid where the data then is details on who took the lock? And you can't take a write lock if any instances of read lock outstanding? I was thinking that maybe clients would be prepared to wait some time obtaining a read lock but that they might fail fast if they could not get a read lock?
          Hide
          Todd Lipcon added a comment -

          Might be worth considering region-specific locks as well? Or is that too much scope creep? (of course we'd need to ensure a locking order to avoid deadlock)

          Show
          Todd Lipcon added a comment - Might be worth considering region-specific locks as well? Or is that too much scope creep? (of course we'd need to ensure a locking order to avoid deadlock)
          Hide
          Mubarak Seyed added a comment -

          We can store <zookeeper.znode.parent>/locks/<table_name>

          and data could be
          txId (lock held by a transaction id)
          Mode (W or R)
          State

          Show
          Mubarak Seyed added a comment - We can store <zookeeper.znode.parent>/locks/<table_name> and data could be txId (lock held by a transaction id) Mode (W or R) State

            People

            • Assignee:
              Alex Feinberg
              Reporter:
              stack
            • Votes:
              0 Vote for this issue
              Watchers:
              10 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development