HBASE-24794

hbase.rowlock.wait.duration should not be <= 0


Details

    Description

      We had a cluster fail after an upgrade from HBase 1 because all writes to meta failed.

      The log of a master started in maintenance mode looks like this (a RegionServer hosting meta outside maintenance mode would look similar, starting at HRegion.doBatchMutate):

      2020-07-28 17:52:56,553 WARN org.apache.hadoop.hbase.regionserver.HRegion: Failed getting lock, row=some_user_table
      java.io.IOException: Timed out waiting for lock for row: some_user_table in region 1588230740
              at org.apache.hadoop.hbase.regionserver.HRegion.getRowLockInternal(HRegion.java:5863)
              at org.apache.hadoop.hbase.regionserver.HRegion$BatchOperation.lockRowsAndBuildMiniBatch(HRegion.java:3322)
              at org.apache.hadoop.hbase.regionserver.HRegion.doMiniBatchMutate(HRegion.java:4018)
              at org.apache.hadoop.hbase.regionserver.HRegion.batchMutate(HRegion.java:3992)
              at org.apache.hadoop.hbase.regionserver.HRegion.batchMutate(HRegion.java:3923)
              at org.apache.hadoop.hbase.regionserver.HRegion.batchMutate(HRegion.java:3914)
              at org.apache.hadoop.hbase.regionserver.HRegion.batchMutate(HRegion.java:3928)
              at org.apache.hadoop.hbase.regionserver.HRegion.doBatchMutate(HRegion.java:4255)
              at org.apache.hadoop.hbase.regionserver.HRegion.put(HRegion.java:3047)
              at org.apache.hadoop.hbase.regionserver.RSRpcServices.mutate(RSRpcServices.java:2827)
              at org.apache.hadoop.hbase.client.ClientServiceCallable.doMutate(ClientServiceCallable.java:55)
              at org.apache.hadoop.hbase.client.HTable$3.rpcCall(HTable.java:538)
              at org.apache.hadoop.hbase.client.HTable$3.rpcCall(HTable.java:533)
              at org.apache.hadoop.hbase.client.RegionServerCallable.call(RegionServerCallable.java:127)
              at org.apache.hadoop.hbase.client.RpcRetryingCallerImpl.callWithRetries(RpcRetryingCallerImpl.java:107)
              at org.apache.hadoop.hbase.client.HTable.put(HTable.java:542)
              at org.apache.hadoop.hbase.MetaTableAccessor.put(MetaTableAccessor.java:1339)
              at org.apache.hadoop.hbase.MetaTableAccessor.putToMetaTable(MetaTableAccessor.java:1329)
              at org.apache.hadoop.hbase.MetaTableAccessor.updateTableState(MetaTableAccessor.java:1672)
              at org.apache.hadoop.hbase.MetaTableAccessor.updateTableState(MetaTableAccessor.java:1112)
              at org.apache.hadoop.hbase.master.TableStateManager.fixTableStates(TableStateManager.java:296)
              at org.apache.hadoop.hbase.master.TableStateManager.start(TableStateManager.java:269)
              at org.apache.hadoop.hbase.master.HMaster.finishActiveMasterInitialization(HMaster.java:1004)
              at org.apache.hadoop.hbase.master.HMaster.startActiveMasterManager(HMaster.java:2274)
              at org.apache.hadoop.hbase.master.HMaster.lambda$run$0(HMaster.java:583)
              at java.lang.Thread.run(Thread.java:745)
      

      This message was logged roughly 6,000 times per second.

      The failure was caused by a change in the behavior of hbase.rowlock.wait.duration in HBASE-17210 (so 1.4.0+ and 2.0.0+). Before that change, setting the config to a value <= 0 meant that a row lock attempt succeeded only if the lock was immediately available. After the change, the lock attempt fails without checking the lock at all.
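      The difference between the two behaviors can be sketched with a plain java.util.concurrent lock. This is only an illustration of the semantics described above, not the HRegion code: the class and method names are hypothetical, and treating the wait as milliseconds is an assumption.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantLock;

// Illustrative sketch only: these names are hypothetical, not HBase APIs.
class RowLockWaitSketch {

    // Behavior before HBASE-17210 as described in this issue: a wait <= 0
    // still tries the lock once and succeeds if it is free right now.
    static boolean acquireOldStyle(ReentrantLock lock, long waitMs) {
        try {
            // With a non-positive timeout, tryLock does not wait, but it
            // does make one attempt to take the lock.
            return lock.tryLock(waitMs, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    // Behavior after the change as described in this issue: a wait <= 0 is
    // treated as already timed out, so the lock is never checked.
    static boolean acquireNewStyle(ReentrantLock lock, long waitMs) {
        if (waitMs <= 0) {
            return false; // fails even when the lock is free
        }
        try {
            return lock.tryLock(waitMs, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    public static void main(String[] args) {
        ReentrantLock lock = new ReentrantLock();
        boolean oldResult = acquireOldStyle(lock, 0);
        if (oldResult) {
            lock.unlock();
        }
        boolean newResult = acquireNewStyle(lock, 0);
        System.out.println("old style, wait=0, free lock: " + oldResult);
        System.out.println("new style, wait=0, free lock: " + newResult);
    }
}
```

      With an uncontended lock and a wait of 0, the first method acquires the lock and the second reports failure, which is why every write to meta failed once the config was non-positive.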

      Workaround: set hbase.rowlock.wait.duration to a small positive number, e.g. 1, if you want row locks to fail quickly.
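      A deployment could also guard against this at startup. The following is a minimal sketch of such a check, assuming a plain Map stands in for the real configuration object; the class, the method, and the caller-supplied default are all hypothetical and are not part of HBase.

```java
import java.util.Map;

// Hypothetical guard, not HBase code: a Map stands in for the real config.
class RowLockWaitCheck {
    static final String KEY = "hbase.rowlock.wait.duration";

    // Returns the wait to use. A missing setting falls back to the caller's
    // default; a zero or negative setting is replaced by 1, the workaround
    // value suggested in this issue.
    static int effectiveWait(Map<String, String> conf, int defaultValue) {
        String raw = conf.get(KEY);
        int value = (raw == null) ? defaultValue : Integer.parseInt(raw.trim());
        return value <= 0 ? 1 : value;
    }
}
```

      Clamping silently is one design choice; rejecting the value with an error at startup would make the misconfiguration more visible.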

          People

            busbey Sean Busbey