HBase / HBASE-24794

hbase.rowlock.wait.duration should not be <= 0


    Description

      Had a cluster fail after an upgrade from HBase 1 because all writes to meta failed.

      The master, started in maintenance mode, logs the following (a RegionServer hosting meta in non-maintenance mode would look similar, starting with HRegion.doBatchMutate):

      2020-07-28 17:52:56,553 WARN org.apache.hadoop.hbase.regionserver.HRegion: Failed getting lock, row=some_user_table
      java.io.IOException: Timed out waiting for lock for row: some_user_table in region 1588230740
              at org.apache.hadoop.hbase.regionserver.HRegion.getRowLockInternal(HRegion.java:5863)
              at org.apache.hadoop.hbase.regionserver.HRegion$BatchOperation.lockRowsAndBuildMiniBatch(HRegion.java:3322)
              at org.apache.hadoop.hbase.regionserver.HRegion.doMiniBatchMutate(HRegion.java:4018)
              at org.apache.hadoop.hbase.regionserver.HRegion.batchMutate(HRegion.java:3992)
              at org.apache.hadoop.hbase.regionserver.HRegion.batchMutate(HRegion.java:3923)
              at org.apache.hadoop.hbase.regionserver.HRegion.batchMutate(HRegion.java:3914)
              at org.apache.hadoop.hbase.regionserver.HRegion.batchMutate(HRegion.java:3928)
              at org.apache.hadoop.hbase.regionserver.HRegion.doBatchMutate(HRegion.java:4255)
              at org.apache.hadoop.hbase.regionserver.HRegion.put(HRegion.java:3047)
              at org.apache.hadoop.hbase.regionserver.RSRpcServices.mutate(RSRpcServices.java:2827)
              at org.apache.hadoop.hbase.client.ClientServiceCallable.doMutate(ClientServiceCallable.java:55)
              at org.apache.hadoop.hbase.client.HTable$3.rpcCall(HTable.java:538)
              at org.apache.hadoop.hbase.client.HTable$3.rpcCall(HTable.java:533)
              at org.apache.hadoop.hbase.client.RegionServerCallable.call(RegionServerCallable.java:127)
              at org.apache.hadoop.hbase.client.RpcRetryingCallerImpl.callWithRetries(RpcRetryingCallerImpl.java:107)
              at org.apache.hadoop.hbase.client.HTable.put(HTable.java:542)
              at org.apache.hadoop.hbase.MetaTableAccessor.put(MetaTableAccessor.java:1339)
              at org.apache.hadoop.hbase.MetaTableAccessor.putToMetaTable(MetaTableAccessor.java:1329)
              at org.apache.hadoop.hbase.MetaTableAccessor.updateTableState(MetaTableAccessor.java:1672)
              at org.apache.hadoop.hbase.MetaTableAccessor.updateTableState(MetaTableAccessor.java:1112)
              at org.apache.hadoop.hbase.master.TableStateManager.fixTableStates(TableStateManager.java:296)
              at org.apache.hadoop.hbase.master.TableStateManager.start(TableStateManager.java:269)
              at org.apache.hadoop.hbase.master.HMaster.finishActiveMasterInitialization(HMaster.java:1004)
              at org.apache.hadoop.hbase.master.HMaster.startActiveMasterManager(HMaster.java:2274)
              at org.apache.hadoop.hbase.master.HMaster.lambda$run$0(HMaster.java:583)
              at java.lang.Thread.run(Thread.java:745)
      

      This was logged roughly 6,000 times per second.

      The failure was caused by a change in behavior for hbase.rowlock.wait.duration in HBASE-17210 (so 1.4.0+, 2.0.0+). Prior to that change, setting the config <= 0 meant that row locks would succeed only if they were immediately available. After the change, we fail the lock attempt without checking the lock at all.
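      The semantic difference can be sketched with plain java.util.concurrent locks. Lock.tryLock(time, unit) always makes one acquisition attempt before checking the timeout, which is why a non-positive wait duration used to mean "acquire only if immediately free." The helper names below are hypothetical, not the actual HRegion code; lockAfterChange models the post-HBASE-17210 early bail-out described in this issue:

      ```java
      import java.util.concurrent.TimeUnit;
      import java.util.concurrent.locks.ReentrantLock;

      public class RowLockTimeoutDemo {

          // Pre-change behavior: tryLock(time, unit) attempts the acquisition
          // once before the timeout check, so a free lock is still acquired
          // even when time <= 0.
          static boolean lockBeforeChange(ReentrantLock lock, long timeoutMs)
                  throws InterruptedException {
              return lock.tryLock(timeoutMs, TimeUnit.MILLISECONDS);
          }

          // Hypothetical model of the post-HBASE-17210 behavior: a non-positive
          // wait duration fails the attempt without checking the lock at all.
          static boolean lockAfterChange(ReentrantLock lock, long timeoutMs)
                  throws InterruptedException {
              if (timeoutMs <= 0) {
                  return false; // fails immediately, even if the lock is free
              }
              return lock.tryLock(timeoutMs, TimeUnit.MILLISECONDS);
          }

          public static void main(String[] args) throws InterruptedException {
              ReentrantLock freeLock = new ReentrantLock();
              // Old semantics: succeeds on an uncontended lock.
              System.out.println("before change, timeout=0: "
                  + lockBeforeChange(freeLock, 0)); // true
              freeLock.unlock();
              // New semantics: fails without ever trying the lock.
              System.out.println("after change, timeout=0: "
                  + lockAfterChange(freeLock, 0)); // false
          }
      }
      ```

      With timeout <= 0 the new code path never reaches the lock, so every meta write fails no matter how lightly contended the row is.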

      Workaround: set hbase.rowlock.wait.duration to a small positive number, e.g. 1, if you want row locks to fail quickly.
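      The workaround goes in hbase-site.xml; a minimal fragment, using the value 1 from this issue (the shipped default is 30000 ms):

      ```xml
      <!-- hbase-site.xml: fail row-lock attempts quickly while staying above
           the <= 0 range affected by the HBASE-17210 behavior change -->
      <property>
        <name>hbase.rowlock.wait.duration</name>
        <value>1</value> <!-- milliseconds -->
      </property>
      ```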

People

    Assignee: Sean Busbey
    Reporter: Sean Busbey