HBase / HBASE-26256

A delayed HDFS RPC in HRegion may cause data inconsistency and make some HBase shell commands hang


Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 2.4.2
    • Fix Version/s: None
    • Component/s: regionserver
    • Labels: None

    Description

      When a RegionServer initializes a new region, it writes internal metadata (e.g., WAL) to the HDFS cluster. We find that this write can block because of network issues or overload on the HDFS side. The resulting delay exposes inconsistent state to HBase clients and causes several HBase APIs to hang.

      Reproduction

         Steps to reproduce the symptom from scratch:

      1. Start an HDFS cluster (1 NameNode + 2 DataNodes) with the default configuration.
      2. Start a ZooKeeper cluster (3 nodes) with the default configuration.
      3. Start an HBase cluster (1 Master + 2 RegionServers) with the default configuration.
      4. In one of the RegionServers, inject a delay by invoking `Thread.sleep` while it creates its third region (alternatively, use a network packet loss injection tool such as `tc`).
      5. Right after the HBase cluster starts, the fault has not yet been triggered. Open the default HBase shell by running `bin/hbase shell` in a terminal, then repeatedly run the `create` command to create new tables until the fault is triggered.
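      The delay injection in step 4 can be sketched roughly as follows. This is a minimal illustration, not the patch we actually applied: `NthCallDelay` and `maybeDelay` are hypothetical names that do not exist in HBase, and the sketch assumes a single shared instance is consulted at the injection point each time a region is initialized.

```java
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical test-only helper (not part of HBase): delays exactly one invocation. */
final class NthCallDelay {
  private final AtomicInteger calls = new AtomicInteger();
  private final int targetCall;
  private final long delayMillis;

  NthCallDelay(int targetCall, long delayMillis) {
    this.targetCall = targetCall;
    this.delayMillis = delayMillis;
  }

  /**
   * Counts invocations and sleeps only on the targetCall-th one.
   * Returns true if this invocation was the delayed one.
   */
  boolean maybeDelay() {
    if (calls.incrementAndGet() != targetCall) {
      return false;
    }
    try {
      // Simulates a slow HDFS RPC on the chosen invocation.
      Thread.sleep(delayMillis);
    } catch (InterruptedException e) {
      // Preserve the interrupt flag for the caller instead of swallowing it.
      Thread.currentThread().interrupt();
    }
    return true;
  }
}
```

      With something like `new NthCallDelay(3, delayMillis)` held in a static field, calling `maybeDelay()` just before the sequence-id file write would delay only the third region initialization on that RegionServer.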

       

      When the fault occurs, we observe the following symptoms:

      1. The HBase shell running the `create` command hangs, without any log or warning.
      2. If we start another HBase shell and run the `list` command to see all tables, the new table appears in the result even though it has not actually been created yet. Ideally, clients should not see this pending table before `create` succeeds.
      3. If we start another HBase shell and run the `disable` command on this table, that shell also hangs, without any log or warning. Ideally, we should see an error or warning within a short time, because the table has not been created yet.

       

          The stack trace:

      "RS_OPEN_REGION-regionserver/razor15:16022-0" #144 daemon prio=5 os_prio=0 tid=0x00007f4c34ed8000 nid=0x4463 waiting on condition [0x00007f4bfd496000]
         java.lang.Thread.State: TIMED_WAITING (sleeping)
          at java.lang.Thread.sleep(Native Method)
          at org.apache.hadoop.hbase.regionserver.HRegion.initializeRegionInternals(HRegion.java:1075)
          at org.apache.hadoop.hbase.regionserver.HRegion.initialize(HRegion.java:955)
          at org.apache.hadoop.hbase.regionserver.HRegion.openHRegion(HRegion.java:8081)
          at org.apache.hadoop.hbase.regionserver.HRegion.openHRegionFromTableDir(HRegion.java:8040)
          at org.apache.hadoop.hbase.regionserver.HRegion.openHRegion(HRegion.java:8016)
          at org.apache.hadoop.hbase.regionserver.HRegion.openHRegion(HRegion.java:7974)
          at org.apache.hadoop.hbase.regionserver.HRegion.openHRegion(HRegion.java:7925)
          at org.apache.hadoop.hbase.regionserver.handler.AssignRegionHandler.process(AssignRegionHandler.java:145)
          at org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:104)
          at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
          at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
          at java.lang.Thread.run(Thread.java:748)
      

       

         Relevant code snippet:

      // file path: hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java
      // class: org.apache.hadoop.hbase.regionserver.HRegion
      
      public class HRegion implements HeapSize, PropagatingConfigurationObserver, Region {
      // ...
        private long initializeRegionInternals(final CancelableProgressable reporter,
            final MonitoredTask status) throws IOException {
        // ...
        if (!isRestoredRegion) {
          // ...
          if (RegionReplicaUtil.isDefaultReplica(getRegionInfo())) {
            // ...
            // Fault injection point: on the third invocation only, call
            // Thread.sleep here to simulate a delayed HDFS RPC.
            WALSplitUtil.writeRegionSequenceIdFile(getWalFileSystem(), getWALRegionDir(),
              nextSeqId - 1);
            // ...
          }
        }
        // ...
        }
      // ...
      }
      

      Fix

      We are not yet sure of the root cause of the inconsistency or of the blocking of other APIs. One simple potential fix is to guard the `WALSplitUtil.writeRegionSequenceIdFile` operation (or the HDFS RPCs inside it) with a timeout. We verified that throwing a timeout exception when the operation takes too long resolves the symptoms described above.
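      As a rough illustration of the timeout idea, the sketch below runs an arbitrary operation on a helper thread and converts a timeout into an `IOException`. It is only a sketch under assumptions: `TimedCall` and `callWithTimeout` are hypothetical names, not existing HBase APIs, and the timeout value would have to come from a configuration key that does not exist today.

```java
import java.io.IOException;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

/** Hypothetical helper (not part of HBase): bounds how long a blocking call may take. */
final class TimedCall {
  private TimedCall() {}

  /** Runs op on a daemon thread; throws IOException if it exceeds timeoutMillis. */
  static <T> T callWithTimeout(Callable<T> op, long timeoutMillis) throws IOException {
    ExecutorService executor = Executors.newSingleThreadExecutor(r -> {
      Thread t = new Thread(r, "timed-call");
      t.setDaemon(true); // a stuck call must not keep the JVM alive
      return t;
    });
    Future<T> future = executor.submit(op);
    try {
      return future.get(timeoutMillis, TimeUnit.MILLISECONDS);
    } catch (TimeoutException e) {
      future.cancel(true); // best effort: interrupts the worker thread
      throw new IOException("Operation did not finish within " + timeoutMillis + " ms", e);
    } catch (ExecutionException e) {
      Throwable cause = e.getCause();
      if (cause instanceof IOException) {
        throw (IOException) cause; // surface the original I/O failure unchanged
      }
      throw new IOException(cause);
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      future.cancel(true);
      throw new IOException("Interrupted while waiting for operation", e);
    } finally {
      executor.shutdownNow();
    }
  }
}
```

      In `initializeRegionInternals`, the `WALSplitUtil.writeRegionSequenceIdFile(...)` call could be wrapped in such a helper so that a stalled write fails the region open instead of blocking it indefinitely. Note that `future.cancel(true)` only interrupts the worker thread; whether a blocked HDFS RPC actually reacts to that interrupt is not something this sketch guarantees.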

       

            People

              Assignee: Unassigned
              Reporter: Haoze Wu (functioner)