Accumulo / ACCUMULO-3036

1.5 MiniCluster fails to start, forces clients to wait indefinitely

    Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Won't Fix
    • Affects Version/s: 1.5.0, 1.5.1
    • Fix Version/s: None
    • Component/s: mini
    • Labels:
      None

      Description

      Over in Pig land, a user was complaining about a test which used MiniAccumuloCluster that hung until the JUnit timeout was hit.

      Eventually, the problem was diagnosed as a bad classpath (an old version of Thrift was included and used), which caused the TServer and Master to bail out immediately. The client, however, kept trying to connect indefinitely without ever succeeding.

      MAC#start should not return before we're sure that the processes are actually up and running (a very quick smoke test).

      It looks like ACCUMULO-1537 introduced a call to SetGoalState on the Master before MAC#start returns, which would (I assume) fail and throw an RTE if the Master had died. Including this fix in 1.5 may be sufficient to fix the underlying issue the user was seeing.
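One way to picture the "very quick smoke test" on the child processes is sketched below. It assumes the cluster keeps a `java.lang.Process` handle for each server it launches; the class name `ProcessCheck` and the method `requireStillRunning` are hypothetical and not part of MiniAccumuloCluster. It relies on `Process.waitFor(long, TimeUnit)`, which needs Java 8 or later.

```java
import java.util.concurrent.TimeUnit;

// Hypothetical helper, not part of Accumulo: fails fast if a freshly
// launched server process dies during a short grace period.
final class ProcessCheck {
    static void requireStillRunning(String name, Process process, long grace, TimeUnit unit)
            throws InterruptedException {
        // waitFor(timeout, unit) returns true only if the process has exited
        // before the timeout elapsed; false means it is still running.
        if (process.waitFor(grace, unit)) {
            throw new IllegalStateException(
                    name + " exited during startup with code " + process.exitValue());
        }
    }
}
```

A caller would run this once per launched process before returning from start(). The trade-off is that the happy path always waits for the full grace period, so the period has to stay short, and a process that dies later is still missed.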

          Activity

          Christopher Tubbs added a comment -

          This issue was marked to be fixed for 1.5.3. However, no patch has been provided, and development on 1.5 is waning. Non-urgent issues are not likely to be addressed. Users are encouraged to upgrade to a newer version of Accumulo.

          Josh Elser added a comment (edited) -

          One easy fix would be to watch ZooKeeper and wait for the locks for the started processes to be acquired. If they fail to do so after some period of time, we can abort.

          If we return from start() before the locks are actually held, the client is just going to sit there spinning its wheels trying to connect anyway. This would also be generally applicable to all versions, not just 1.5.
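The wait-then-abort idea in this comment could look roughly like the sketch below. It is not the MiniAccumuloCluster implementation: `LockWaiter`, `awaitLock`, and the `lockHeld` check are hypothetical names, and the check is left abstract so that it could be backed by a ZooKeeper existence test on the server's lock node. It uses `BooleanSupplier`, so it assumes Java 8 or later.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.function.BooleanSupplier;

// Hypothetical helper, not part of Accumulo: polls a "lock is held" check
// until it succeeds or a deadline passes.
final class LockWaiter {
    static void awaitLock(String name, BooleanSupplier lockHeld,
                          long timeout, TimeUnit unit, long pollMillis)
            throws TimeoutException, InterruptedException {
        long deadline = System.nanoTime() + unit.toNanos(timeout);
        while (!lockHeld.getAsBoolean()) {
            // Compare via subtraction so the check is safe against nanoTime overflow.
            if (System.nanoTime() - deadline >= 0) {
                throw new TimeoutException(
                        name + " did not acquire its lock within " + timeout + " " + unit);
            }
            Thread.sleep(pollMillis);
        }
    }
}
```

Polling keeps the sketch simple; a ZooKeeper watch on the lock node would avoid the sleep loop, at the cost of more setup. Either way, start() would call this once per started process and abort the cluster if a `TimeoutException` is thrown.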

          Josh Elser added a comment -

          Yeah, that's valid. I didn't come up with a good way to do this, so I had chucked it on my backburner.

          Christopher Tubbs added a comment -

          This issue was marked as a blocker, but 1.5.2 was released without including it, so clearly, it's not a blocker.

          Josh Elser added a comment -

          SetGoalState only modifies ZooKeeper, so that's not going to help.

          The MAC impl has the root user/password. Is trying to instantiate a Connector with the root user's credentials sufficient? That will communicate with a tserver, but not the Master. That would at least be sufficient to catch the case of a bad classpath where all processes fail to run.
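A connection-based smoke test only helps if it cannot itself hang, since the original complaint was a client that retried forever. The sketch below bounds any connection attempt with a timeout. `SmokeTest` and `connectOrFail` are hypothetical names, and the `connectAttempt` callable stands in for whatever obtains a Connector with the root credentials; the actual Accumulo call is deliberately left out and should be treated as an assumption.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical helper, not part of Accumulo: runs a connection attempt on a
// separate thread and gives up if it does not finish within the timeout.
final class SmokeTest {
    static <T> T connectOrFail(Callable<T> connectAttempt, long timeout, TimeUnit unit) {
        // Daemon thread so a stuck attempt can never keep the JVM alive.
        ExecutorService executor = Executors.newSingleThreadExecutor(r -> {
            Thread t = new Thread(r, "mac-smoke-test");
            t.setDaemon(true);
            return t;
        });
        Future<T> attempt = executor.submit(connectAttempt);
        try {
            return attempt.get(timeout, unit);
        } catch (TimeoutException e) {
            attempt.cancel(true); // interrupt the stuck attempt
            throw new IllegalStateException(
                    "No server answered within " + timeout + " " + unit, e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("Connection attempt failed", e.getCause());
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("Interrupted while waiting for the cluster", e);
        } finally {
            executor.shutdownNow();
        }
    }
}
```

As the comment notes, a check like this would exercise a tserver but not the Master, so it catches the "everything died on a bad classpath" case rather than every startup failure.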


            People

            • Assignee: Unassigned
            • Reporter: Josh Elser
            • Votes: 0
            • Watchers: 1
