  1. HBase
  2. HBASE-22057

Impose upper-bound on size of ZK ops sent in a single multi()

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • None
    • Fix Version/s: 3.0.0-alpha-1, 1.5.0, 2.3.0
    • None
    • Hadoop Flags: Reviewed
    • Release Note:
      Exposes a new configuration property "zookeeper.multi.max.size" which dictates the maximum size of deletes that HBase will make to ZooKeeper in a single RPC. This property defaults to 1MB, which should fall beneath the default ZooKeeper limit of 2MB, controlled by "jute.maxbuffer".

    Description

      In ZKUtil#multiOrSequential, we accept a list of ZKUtilOps to pass down to the ZooKeeper#multi(Iterable<Op>) method.

      One problem with this approach is that we may generate a large list of ZNodes to mutate in one batch, and the resulting request can exceed the allowable client packet length, specified by jute.maxbuffer.

      This problem can manifest when a disabled peer has a large number of WALs queued for replication in ZooKeeper. When that peer is dropped, the RS submits deletes for those queued WALs. The RS then sees ConnectionLoss for the resulting multi() calls because the client message is too large (we are trying to delete too many WALs at once). The result (at least in branch-1-based versions) is that the RS aborts after exhausting its ZK retries, since this operation can never succeed.

      A simple fix would be to impose a maximum number of Ops to run in a single batch inside ZKUtil, and split apart the caller-submitted batch into smaller chunks. Before we make such a change, I do need to make sure that we don't have any expectations on atomicity of the operations. I'm not sure what ZK provides here – for the above example, splitting up batches of deletes is not an issue, but there could be issues with batches of creates where we only apply some.
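
      The chunking idea can be sketched roughly as follows. This is only an illustration, not the code from the attached patches: the class MultiBatcher, the method partition, and the sizeOf estimator are hypothetical names, and the per-op size is approximated by the UTF-8 length of the znode path rather than the real serialized request size.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.function.ToIntFunction;

// Hypothetical helper for illustration; not part of HBase or ZooKeeper.
final class MultiBatcher {

    /**
     * Splits ops into consecutive batches whose estimated sizes sum to at most
     * maxBytes. An op that exceeds maxBytes on its own still gets a batch to
     * itself, so no op is ever dropped.
     */
    static <T> List<List<T>> partition(List<T> ops, int maxBytes, ToIntFunction<T> sizeOf) {
        List<List<T>> batches = new ArrayList<>();
        List<T> current = new ArrayList<>();
        long currentBytes = 0;
        for (T op : ops) {
            int size = sizeOf.applyAsInt(op);
            // Close the running batch before it would grow past the limit.
            if (!current.isEmpty() && currentBytes + size > maxBytes) {
                batches.add(current);
                current = new ArrayList<>();
                currentBytes = 0;
            }
            current.add(op);
            currentBytes += size;
        }
        if (!current.isEmpty()) {
            batches.add(current);
        }
        return batches;
    }

    public static void main(String[] args) {
        // Made-up znode paths standing in for queued WAL entries.
        List<String> paths = new ArrayList<>();
        for (int i = 0; i < 100_000; i++) {
            paths.add("/hbase/replication/rs/example-queue/wal-" + i);
        }
        // Assumed limit of 1 MB, mirroring the default named in the release note.
        int maxBytes = 1024 * 1024;
        List<List<String>> batches =
            partition(paths, maxBytes, p -> p.getBytes(StandardCharsets.UTF_8).length);
        // A real caller would issue one multi() per batch at this point.
        System.out.println(paths.size() + " ops split into " + batches.size() + " batches");
    }
}
```

      Note that each chunk would be sent as its own multi() call, so the all-or-nothing guarantee of a single multi() no longer covers the whole caller-submitted list. That is acceptable for the delete case described above, but it is exactly the atomicity question raised for batches of creates.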

      Attachments

        1. HBASE-22057.001.patch
          9 kB
          Josh Elser
        2. HBASE-22057.002.patch
          14 kB
          Josh Elser
        3. HBASE-22057.003.patch
          13 kB
          Josh Elser
        4. HBASE-22057.004.patch
          14 kB
          Josh Elser
        5. HBASE-22057-branch-1.patch
          19 kB
          Andrew Kyle Purtell

            People

              Assignee: Josh Elser
              Reporter: Josh Elser
              Votes: 0
              Watchers: 9