[HBASE-26256] The potential delay of HDFS RPC in HRegion may cause data inconsistency and some HBase shell commands hanging - ASF JIRA

Details

Type: Bug
Status: Open
Priority: Major
Resolution: Unresolved
Affects Version/s: 2.4.2
Fix Version/s: None
Component/s: regionserver
Labels: None

Description

When a RegionServer initializes a new region, it writes internal metadata (e.g., WAL) to the HDFS cluster. We find that this write can block because of network issues or overload on the HDFS side. The resulting delay exposes inconsistent state to HBase clients and causes multiple HBase APIs to hang.

Reproduction

Steps to reproduce the symptom from scratch:

Start an HDFS cluster (1 NameNode + 2 DataNodes) with the default configuration.
Start a ZooKeeper cluster (3 nodes) with the default configuration.
Start an HBase cluster (1 Master + 2 RegionServers) with the default configuration.
In one of the RegionServers, introduce a delay by invoking `Thread.sleep` when it is creating its third region (alternatively, use a network packet loss injection tool like `tc`).
Right after the HBase cluster starts, the fault has not yet been triggered. Open the default HBase shell by running `bin/hbase shell` in a terminal, then repeatedly run the `create` command to create new tables until the fault is triggered.

When the fault occurs, we observe several symptoms as follows:

The HBase shell running the `create` command hangs, without any log or warning.
If we start another HBase shell and run the `list` command to see all the tables, the new table appears in the result even though it has not actually been created yet. Ideally, the client should not see this pending table before `create` succeeds.
If we start another HBase shell and run the `disable` command to disable this table, the HBase shell will hang, without any log or warning. Ideally, we should see some error or warning within a short duration of time, because this table has not been created yet.

The stack trace:

"RS_OPEN_REGION-regionserver/razor15:16022-0" #144 daemon prio=5 os_prio=0 tid=0x00007f4c34ed8000 nid=0x4463 waiting on condition [0x00007f4bfd496000]
   java.lang.Thread.State: TIMED_WAITING (sleeping)
    at java.lang.Thread.sleep(Native Method)
    at org.apache.hadoop.hbase.regionserver.HRegion.initializeRegionInternals(HRegion.java:1075)
    at org.apache.hadoop.hbase.regionserver.HRegion.initialize(HRegion.java:955)
    at org.apache.hadoop.hbase.regionserver.HRegion.openHRegion(HRegion.java:8081)
    at org.apache.hadoop.hbase.regionserver.HRegion.openHRegionFromTableDir(HRegion.java:8040)
    at org.apache.hadoop.hbase.regionserver.HRegion.openHRegion(HRegion.java:8016)
    at org.apache.hadoop.hbase.regionserver.HRegion.openHRegion(HRegion.java:7974)
    at org.apache.hadoop.hbase.regionserver.HRegion.openHRegion(HRegion.java:7925)
    at org.apache.hadoop.hbase.regionserver.handler.AssignRegionHandler.process(AssignRegionHandler.java:145)
    at org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:104)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)

Relevant code snippet:

// file path: hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java
// class: org.apache.hadoop.hbase.regionserver.HRegion

public class HRegion implements HeapSize, PropagatingConfigurationObserver, Region {
  // ...
  private long initializeRegionInternals(final CancelableProgressable reporter,
      final MonitoredTask status) throws IOException {
    // ...
    if (!isRestoredRegion) {
      // ...
      if (RegionReplicaUtil.isDefaultReplica(getRegionInfo())) {
        // ...
        // At and only at the third time of invocation,
        // invoke Thread.sleep, to simulate a delay of HDFS RPC
        WALSplitUtil.writeRegionSequenceIdFile(getWalFileSystem(), getWALRegionDir(),
          nextSeqId - 1);
        // ...
      }
    }
    // ...
  }
  // ...
}
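
The injection described in the comment above (sleep on the third invocation only) can be sketched as a small standalone helper. This is a minimal illustration, not the code used for the report: the class `NthCallDelay`, its method `maybeDelay`, and the delay value are hypothetical names and parameters chosen here, and nothing in it is part of HBase.

```java
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical fault-injection helper (not part of HBase): delays exactly one invocation. */
final class NthCallDelay {
  private final AtomicInteger calls = new AtomicInteger();
  private final int targetCall;
  private final long delayMillis;

  NthCallDelay(int targetCall, long delayMillis) {
    this.targetCall = targetCall;
    this.delayMillis = delayMillis;
  }

  /** Sleeps only on the targetCall-th invocation; returns true if this call was delayed. */
  boolean maybeDelay() {
    if (calls.incrementAndGet() != targetCall) {
      return false;
    }
    try {
      Thread.sleep(delayMillis); // stands in for a slow HDFS RPC
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt(); // preserve the interrupt for the caller
    }
    return true;
  }

  public static void main(String[] args) {
    NthCallDelay injector = new NthCallDelay(3, 200L);
    for (int i = 1; i <= 4; i++) {
      long start = System.nanoTime();
      boolean delayed = injector.maybeDelay();
      long elapsedMs = (System.nanoTime() - start) / 1_000_000L;
      System.out.println("call " + i + ": delayed=" + delayed + ", elapsedMs=" + elapsedMs);
    }
  }
}
```

In a reproduction, a call such as `injector.maybeDelay()` would presumably sit immediately before `WALSplitUtil.writeRegionSequenceIdFile(...)`, with a single injector instance shared across region opens so that the counter reflects the RegionServer-wide invocation count.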

Fix

We’re not sure about the root causes of the inconsistencies or of the blocking of other APIs. One potential simple fix is to protect the `WALSplitUtil.writeRegionSequenceIdFile` operation (or the HDFS RPCs inside it) with a timeout. We verified that throwing a timeout exception when the operation takes too long resolves the aforementioned symptoms.
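
One way such a timeout could look is sketched below using only `java.util.concurrent`. The report does not show how the timeout was implemented, so this is an assumption-laden illustration: `TimedCall` and `callWithTimeout` are hypothetical names, the timeout values are arbitrary, and the blocking HDFS write is simulated with `Thread.sleep`.

```java
import java.io.IOException;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

/** Hypothetical helper (not part of HBase): bounds a blocking call with a timeout. */
final class TimedCall {
  private TimedCall() {}

  /** Runs the task on a worker thread and fails with IOException if it exceeds the timeout. */
  static <T> T callWithTimeout(Callable<T> task, long timeoutMillis) throws IOException {
    ExecutorService executor = Executors.newSingleThreadExecutor(r -> {
      Thread t = new Thread(r, "timed-call");
      t.setDaemon(true); // do not keep the JVM alive if the task never returns
      return t;
    });
    Future<T> future = executor.submit(task);
    try {
      return future.get(timeoutMillis, TimeUnit.MILLISECONDS);
    } catch (TimeoutException e) {
      future.cancel(true); // interrupts the worker; a truly stuck RPC may ignore this
      throw new IOException("operation timed out after " + timeoutMillis + " ms", e);
    } catch (ExecutionException e) {
      Throwable cause = e.getCause();
      if (cause instanceof IOException) {
        throw (IOException) cause; // surface the task's own I/O failure unchanged
      }
      throw new IOException(cause);
    } catch (InterruptedException e) {
      future.cancel(true);
      Thread.currentThread().interrupt();
      throw new IOException("interrupted while waiting for operation", e);
    } finally {
      executor.shutdownNow();
    }
  }

  public static void main(String[] args) throws IOException {
    // Fast task: completes well within the timeout.
    String ok = callWithTimeout(() -> "written", 1_000L);
    System.out.println(ok);
    // Slow task: simulates a delayed HDFS RPC and is cut off by the timeout.
    try {
      callWithTimeout(() -> { Thread.sleep(5_000L); return "late"; }, 100L);
    } catch (IOException e) {
      System.out.println("failed fast: " + e.getMessage());
    }
  }
}
```

Under this sketch, the write in `initializeRegionInternals` would be wrapped in a `Callable` and passed to `callWithTimeout`, so that a stalled RPC surfaces as an `IOException` to the region-open path instead of blocking it indefinitely. Interrupting the worker does not guarantee that the underlying RPC stops, so cleanup of a partially written file would still need separate handling.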

Issue Links

is related to

HBASE-27520 The potential delay of HDFS RPC in HRegion may cause data inconsistency

Open
