Accumulo / ACCUMULO-2140

Race conditions between client operations and upgrade

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Blocker
    • Resolution: Not a Problem
    • Affects Version/s: None
    • Fix Version/s: 1.6.0
    • Component/s: None
    • Labels: None

      Description

      While the master is upgrading, it also runs a thread that responds to client requests. Since the upgrade renames tables and puts them in namespaces, there is a short period during which table existence checks that rely on the new ZooKeeper schema for tables fail to give the correct answer.

      Example: when the tracer starts, it tries to create a "trace" table, if it doesn't exist. The existence check returns false, so it creates a new trace table in the default namespace, even though there exists an old one that has not yet been moved into the default namespace during the upgrade. This results in two tables with the same name.
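
      The scenario above can be illustrated with a minimal, self-contained sketch. It does not use Accumulo or ZooKeeper APIs; the class `UpgradeRaceSketch`, its two maps, and all method names are hypothetical stand-ins for the old and new table layouts, assumed only for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical model of the race: not Accumulo code.
public class UpgradeRaceSketch {
    // table id -> table name, as stored in the pre-upgrade layout (assumed)
    private final Map<String, String> oldLayout = new LinkedHashMap<>();
    // table id -> table name, as stored in the post-upgrade layout (assumed)
    private final Map<String, String> newLayout = new LinkedHashMap<>();
    private int nextId = 1;

    // Registers a table that exists only in the old layout.
    public String addLegacyTable(String name) {
        String id = "t" + nextId++;
        oldLayout.put(id, name);
        return id;
    }

    // Existence check that consults only the new layout.
    public boolean exists(String name) {
        return newLayout.containsValue(name);
    }

    // Check-then-create, as a client such as the tracer might do.
    public boolean createIfAbsent(String name) {
        if (exists(name)) {
            return false;
        }
        newLayout.put("t" + nextId++, name);
        return true;
    }

    // The upgrade moves every legacy table into the new layout.
    public void upgrade() {
        newLayout.putAll(oldLayout);
        oldLayout.clear();
    }

    // Number of tables in the new layout that share the given name.
    public long countByName(String name) {
        return newLayout.values().stream().filter(name::equals).count();
    }

    public static void main(String[] args) {
        UpgradeRaceSketch zk = new UpgradeRaceSketch();
        zk.addLegacyTable("trace");
        zk.createIfAbsent("trace"); // client request handled before the upgrade finishes
        zk.upgrade();
        System.out.println("tables named trace: " + zk.countByName("trace"));
    }
}
```

      If `upgrade()` runs before `createIfAbsent("trace")`, the existence check sees the migrated table and no duplicate is created.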

      An easy solution would be to not respond to client requests until the upgrade is complete (e.g., wait to start the MasterClientServiceHandler thread).
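
      One way to picture that gating is a latch that request handling waits on. This is a rough sketch under stated assumptions, not the master's actual code: `UpgradeGate` and its methods are hypothetical names, and only `java.util.concurrent` from the JDK is used.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

// Hypothetical gate: client requests are held until the upgrade has finished.
public class UpgradeGate {
    private final CountDownLatch upgradeDone = new CountDownLatch(1);

    // Called once by the upgrade code after all schema changes are written.
    public void markUpgradeComplete() {
        upgradeDone.countDown();
    }

    // Called before serving a request; returns false if the wait timed out.
    public boolean awaitUpgrade(long timeout, TimeUnit unit) throws InterruptedException {
        return upgradeDone.await(timeout, unit);
    }

    public static void main(String[] args) throws InterruptedException {
        UpgradeGate gate = new UpgradeGate();

        // Stand-in for a client-facing handler thread.
        Thread handler = new Thread(() -> {
            try {
                if (gate.awaitUpgrade(5, TimeUnit.SECONDS)) {
                    System.out.println("handling client request");
                } else {
                    System.out.println("upgrade did not finish in time");
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        handler.start();

        System.out.println("upgrade running");
        gate.markUpgradeComplete();
        handler.join();
    }
}
```

      Simply starting the handler thread after the upgrade returns achieves the same ordering without a latch; the latch only matters if the handler must be started earlier.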

          Activity

          Mike Drob added a comment -

          Do we intend to support live upgrades? I'd be happy with a documentation note for the upgrade instructing people to stop all clients.

          Christopher Tubbs added a comment -

          If we explicitly say that we don't, I suppose that's fine... it reduces the footprint of this issue, but it can happen even with our own code... such as the tracer.... which is a client we explicitly start.

          John Vines added a comment -

          I don't see how this can happen. When the master starts, in run(), it will acquire a master lock as the second line of code. That will call setMasterState, which will trigger upgradeZookeeper() which will do all of the updates to ZK before returning. It is then later in run() where the master will start up the MasterClientServiceHandler. Or has this bug already been addressed?

          Christopher Tubbs added a comment -

          John Vines, I think you're right.

          John Vines added a comment -

          Oh, this was a theoretical error and not something you saw live?

          Christopher Tubbs added a comment -

          I did see a new trace table get re-created on upgrade, because it wasn't in the default namespace and was therefore not visible to the client. However, I fixed the bug in the code that wasn't putting it in the correct namespace.

          It was certainly caused by one kind of race condition: the tracer client had a different view of ZooKeeper while waiting on the upgrade to occur, and the client thought the table didn't exist, so it sent a request to create it.

          However, I now realize that the table was not re-created during the upgrade, but after it, because the RPC probably waited for the master's client service to become available. It's not clear to me now why the master would've allowed this request to complete, but I don't think it's possible anymore, as I haven't seen it since.


            People

            • Assignee: Christopher Tubbs
            • Reporter: Christopher Tubbs
            • Votes: 0
            • Watchers: 2
