Uploaded image for project: 'HBase'
  1. HBase
  2. HBASE-7815

Too subtle behavior for HConnection#getRegionLocation reload parameter and performance risk

    Details

    • Type: Bug
    • Status: Open
    • Priority: Minor
    • Resolution: Unresolved
    • Affects Version/s: 0.95.2
    • Fix Version/s: None
    • Component/s: Client, regionserver
    • Labels:
      None

      Description

      HConnection#getRegionLocation(table, row, reload=true) and HConnection#getRegionLocation(table, row, reload=false) are not equivalent when the cache is empty: the first will check the table status while the second will not.

      As a consequence, the client won't have the same exception if the table is disabled. With reload==true, we will have a DoNotRetryIOException, with a message saying that the table is disabled. With reload==false we will have a NotServingException. It's quite difficult to guess, as it's not mentioned in the javadoc.

      Second effect is that the client is going to ZooKeeper to check this table state. In ServerCallable, if the first try is not successful, we will then go all the time to ZK to check this status. So if a region server stops, all its clients will connect to ZK, possibly multiple time if the recovery takes some time. With a few hundreds clients, it's not very nice...

      I'm not sure of the solution. A possible improvement in ServerCallable would be to do a reload only at the first retry instead of all of them, but:

      • it's not without side effects, even if it's limited
      • the real cost is the first try, as it may creates a ZK connection.

      Another thing to do would be to limit the reload to the case it makes sense. In locateRegionInMeta there is a test on the exception:(e instanceof RegionOfflineException || e instanceof NoServerForRegionException).

      May be this logic could be put in ServerCallable as well, but we need to cover all cases.

        Issue Links

          Activity

          Hide
          nkeywal Nicolas Liochon added a comment -

          Side note: doing a "get" on a disabled table will send back DoNotRetryIOException only at the second try. The first one will be using the cache in ServerCallable, so the table state won't be checked.

          Show
          nkeywal Nicolas Liochon added a comment - Side note: doing a "get" on a disabled table will send back DoNotRetryIOException only at the second try. The first one will be using the cache in ServerCallable, so the table state won't be checked.
          Hide
          lhofhansl Lars Hofhansl added a comment -

          Seems like an API fix that we should have in 0.94 as well.

          Show
          lhofhansl Lars Hofhansl added a comment - Seems like an API fix that we should have in 0.94 as well.

            People

            • Assignee:
              Unassigned
              Reporter:
              nkeywal Nicolas Liochon
            • Votes:
              0 Vote for this issue
              Watchers:
              5 Start watching this issue

              Dates

              • Created:
                Updated:

                Development