  1. HBase
  2. HBASE-15001

Thread Safety issues in ReplicationSinkManager and HBaseInterClusterReplicationEndpoint

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Blocker
    • Resolution: Fixed
    • Affects Version/s: 1.2.0, 1.3.0, 1.2.1, 2.0.0
    • Fix Version/s: 1.2.0, 1.3.0, 2.0.0
    • Component/s: Replication
    • Labels: None

    Description

      ReplicationSinkManager is not thread-safe. This can cause problems in HBaseInterClusterReplicationEndpoint when the WAL provider is multiwal.
      For example:
      1. When multiple threads report bad sinks, the sink list can be non-empty but report a negative size because the ArrayList itself is not thread-safe.

      2. HBaseInterClusterReplicationEndpoint depends on the number of sinks to batch edits for shipping. However, the following code reads the sink count twice, so the count can be non-zero at the first check and zero by the time the "batching" part runs. The endpoint then assumes that there are no batches to process:

      if (replicationSinkMgr.getSinks().size() == 0) {
          return false;
      }
      ...
      int n = Math.min(Math.min(this.maxThreads, entries.size()/100+1),
                     replicationSinkMgr.getSinks().size());
      

      [Update] This leads to an ArithmeticException (division by zero) at:

      entryLists.get(Math.abs(Bytes.hashCode(e.getKey().getEncodedRegionName())%n)).add(e);
      

      which is benign and will just lead to retries by the ReplicationSource.
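
The failure mode described above can be reproduced in isolation. The following is a minimal sketch, not HBase code: the class `BatchCountDemo` and its methods `batchCount` and `bucketFor` are hypothetical names that only mirror the shape of the two quoted expressions. It assumes the sink count has dropped to zero between the emptiness check and the batch-count computation, so `n` becomes 0 and the modulo throws.

```java
// Hypothetical stand-alone demo; names are illustrative and not part of HBase.
class BatchCountDemo {

    // Same shape as the quoted computation: the result is capped by the sink count.
    static int batchCount(int maxThreads, int entryCount, int sinkCount) {
        return Math.min(Math.min(maxThreads, entryCount / 100 + 1), sinkCount);
    }

    // Same shape as the quoted bucket selection; integer modulo by zero
    // throws ArithmeticException in Java.
    static int bucketFor(int hash, int n) {
        return Math.abs(hash % n);
    }

    public static void main(String[] args) {
        // Assumption: sinks were present at the first check but are gone now.
        int n = batchCount(10, 250, 0);
        try {
            bucketFor(12345, n);
        } catch (ArithmeticException ex) {
            System.out.println("caught ArithmeticException with n = " + n);
        }
    }
}
```
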

      The idea is to make all operations in ReplicationSinkManager thread-safe and to verify the size of the replicated edits before we report success.
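
One way to picture the thread-safety part of this idea is sketched below. It is not the attached patch: `SafeSinkList` and `BatchPlanner` are hypothetical classes, sinks are modeled as plain strings, and the only points illustrated are (a) guarding every access to the sink list with one lock and (b) reading the sink count once and reusing that value, so the emptiness check and the batch count cannot disagree.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch; not the ReplicationSinkManager API.
class SafeSinkList {
    private final Object lock = new Object();
    private final List<String> sinks = new ArrayList<>();

    SafeSinkList(List<String> initial) {
        sinks.addAll(initial);
    }

    // Every read and write of the list goes through the same lock.
    void reportBadSink(String sink) {
        synchronized (lock) {
            sinks.remove(sink);
        }
    }

    int size() {
        synchronized (lock) {
            return sinks.size();
        }
    }
}

// Hypothetical helper that plans the number of batches.
class BatchPlanner {
    // Returns 0 when no sinks are available; a caller could treat that as "retry later".
    static int planBatches(SafeSinkList sinks, int maxThreads, int entryCount) {
        int sinkCount = sinks.size(); // read once, then reuse the same value
        if (sinkCount == 0) {
            return 0;
        }
        return Math.min(Math.min(maxThreads, entryCount / 100 + 1), sinkCount);
    }
}
```

With this shape a caller never computes a bucket index against a zero batch count, because the zero case is decided from the same value that feeds `Math.min`.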

      Attachments

        1. Test.java
          1 kB
          Ashu Pachauri
        2. repro_stuck_replication.diff
          8 kB
          Ashu Pachauri
        3. HBASE-15001-V0.patch
          11 kB
          Ashu Pachauri

          People

            Assignee:
            ashu210890 Ashu Pachauri
            Reporter:
            ashu210890 Ashu Pachauri
            Votes:
            0
            Watchers:
            9
