Accumulo: ACCUMULO-2985

MAC doesn't stop cleanly in 1.6.1-SNAPSHOT

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Blocker
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 1.5.2, 1.6.1, 1.7.0
    • Component/s: mini
    • Labels:
      None

      Description

      Using the following code to do some work:

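      import java.io.File;
      import java.io.IOException;
      import org.apache.accumulo.core.client.AccumuloException;
      import org.apache.accumulo.core.client.AccumuloSecurityException;
      import org.apache.accumulo.core.client.TableExistsException;
      import org.apache.accumulo.minicluster.MiniAccumuloCluster;
      import org.apache.accumulo.minicluster.MiniAccumuloConfig;

      public class TestMACWithRealInstance {
        public static void main(String[] args) throws IOException, AccumuloException, AccumuloSecurityException, TableExistsException, InterruptedException {
          MiniAccumuloConfig macConfig = new MiniAccumuloConfig(new File("/tmp/mac"), "secret");
          macConfig.setNumTservers(2);
          MiniAccumuloCluster mac = new MiniAccumuloCluster(macConfig);
          mac.start();
          mac.getConnector("root", "secret").tableOperations().create("macCreated");
          // Against the affected commit, the JVM does not exit after stop() returns.
          mac.stop();
        }
      }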
      

      It works fine against 1.6.0, but it seems broken against 01da4f4a8b14a125d3a2e29ef98dd044ab9ec75f: after calling stop(), it just sits in the terminal spewing messages about being unable to connect to ZooKeeper.


          Activity

          Sean Busbey made changes -
          Link This issue breaks ACCUMULO-3055 [ ACCUMULO-3055 ]
          Josh Elser made changes -
          Status Reopened [ 4 ] Resolved [ 5 ]
          Resolution Fixed [ 1 ]
          Josh Elser added a comment -

          Make sure that the ExecutorService (ES) is not shut down when a cluster is started, so that it can be used to stop the processes again.

          ASF subversion and git services made changes -
          Time Spent 50m [ 3000 ] 1h [ 3600 ]
          Worklog Id 16659 [ 16659 ]
          ASF subversion and git services logged work - 15/Jul/14 23:36
          • Time Spent:
            10m
             
            Commit e2b2676f1be20474f1e4d2af99b3747de0c7d9c1 in accumulo's branch refs/heads/master from [~elserj]
            [ https://git-wip-us.apache.org/repos/asf?p=accumulo.git;h=e2b2676 ]

            ACCUMULO-2985 Ensure that ES is properly initialized during start()

            By initializing the ExecutorService in the constructor, I broke the ability
            to stop, start, and then re-stop the minicluster. Lazily initialize the ES
            in start() which should alleviate the issue.
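The lazy-initialization fix described in this commit can be sketched as follows. This is a minimal, hypothetical model, not MiniAccumuloCluster's actual code: the class `LazyExecutorDemo` and its methods are illustrative, no Accumulo processes are started, and it assumes the executor is only needed inside stop(). The executor is created in start() when it is missing or already shut down, and shut down again at the end of stop(), so a start/stop/start/stop cycle works.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical stand-in for MiniAccumuloCluster: only the ExecutorService
// lifecycle is modeled; no Accumulo processes are started.
public class LazyExecutorDemo {
  private ExecutorService executor; // created in start(), not in the constructor

  public synchronized void start() {
    // Lazily (re)create the executor so start -> stop -> start -> stop works.
    if (executor == null || executor.isShutdown()) {
      executor = Executors.newSingleThreadExecutor();
    }
  }

  public synchronized void stop() throws Exception {
    if (executor == null) {
      return; // never started, nothing to stop
    }
    // Use the executor to run the (simulated) process shutdown with a timeout.
    executor.submit(() -> "stopped").get(10, TimeUnit.SECONDS);
    // Shut the executor down so its non-daemon thread can't keep the JVM alive.
    executor.shutdown();
    executor.awaitTermination(10, TimeUnit.SECONDS);
  }

  public synchronized boolean executorTerminated() {
    return executor != null && executor.isTerminated();
  }

  public static void main(String[] args) throws Exception {
    LazyExecutorDemo cluster = new LazyExecutorDemo();
    cluster.start();
    cluster.stop();
    cluster.start(); // recreates the executor that the first stop() shut down
    cluster.stop();
    System.out.println("terminated after second stop: " + cluster.executorTerminated());
  }
}
```

If the executor were instead built once in the constructor, the second stop() would submit to an already-shut-down executor and `submit` would throw `RejectedExecutionException`.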
          ASF subversion and git services made changes -
          Time Spent 40m [ 2400 ] 50m [ 3000 ]
          Worklog Id 16658 [ 16658 ]
          ASF subversion and git services logged work - 15/Jul/14 23:36
          • Time Spent:
            10m
             
            Commit e2b2676f1be20474f1e4d2af99b3747de0c7d9c1 in accumulo's branch refs/heads/1.6.1-SNAPSHOT from [~elserj]
            [ https://git-wip-us.apache.org/repos/asf?p=accumulo.git;h=e2b2676 ]

            ACCUMULO-2985 Ensure that ES is properly initialized during start()

            By initializing the ExecutorService in the constructor, I broke the ability
            to stop, start, and then re-stop the minicluster. Lazily initialize the ES
            in start() which should alleviate the issue.
          ASF subversion and git services made changes -
          Time Spent 0.5h [ 1800 ] 40m [ 2400 ]
          Worklog Id 16657 [ 16657 ]
          ASF subversion and git services logged work - 15/Jul/14 23:36
          • Time Spent:
            10m
             
            Commit e2b2676f1be20474f1e4d2af99b3747de0c7d9c1 in accumulo's branch refs/heads/1.5.2-SNAPSHOT from [~elserj]
            [ https://git-wip-us.apache.org/repos/asf?p=accumulo.git;h=e2b2676 ]

            ACCUMULO-2985 Ensure that ES is properly initialized during start()

            By initializing the ExecutorService in the constructor, I broke the ability
            to stop, start, and then re-stop the minicluster. Lazily initialize the ES
            in start() which should alleviate the issue.
          Josh Elser added a comment -

          It looked like this was fine up until it ran VolumeIT, after which I had at least 4 Accumulo clusters still running. I think the issue is that if the MAC is restarted after being stopped, the ES would still be stopped.

          Josh Elser made changes -
          Resolution Fixed [ 1 ]
          Status Resolved [ 5 ] Reopened [ 4 ]
          Josh Elser added a comment -

          I'm reopening this because, after committing these changes, my Jenkins instance has been repeatedly dying from (I think) running out of memory. I'm seeing loads of Accumulo and ZooKeeper processes hanging around. I'm guessing that the shutdown logic on the ES isn't right.

          Josh Elser made changes -
          Status Open [ 1 ] Resolved [ 5 ]
          Resolution Fixed [ 1 ]
          Josh Elser added a comment -

          Ensure that the ES is shut down before stop() returns from MiniAccumuloCluster. A unit test ensures that the ES is in fact shut down. If there are somehow pending actions (which should be impossible, afaict), they get logged at WARN.
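One way to get "the ES is shut down before stop() returns, and anything left over is logged at WARN" is sketched below. It assumes `shutdownNow()` is used to collect tasks that never started, and `java.util.logging` stands in for the project's real logger; the helper name `shutdownAndWait` is hypothetical.

```java
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.logging.Level;
import java.util.logging.Logger;

public class ExecutorShutdown {
  // Assumption: java.util.logging stands in for the project's actual logger.
  private static final Logger LOG = Logger.getLogger(ExecutorShutdown.class.getName());

  /**
   * Hypothetical helper: stops the executor, logs any tasks that never ran at WARNING,
   * and returns true once the worker thread has exited (false if the timeout elapsed).
   */
  static boolean shutdownAndWait(ExecutorService executor, long timeout, TimeUnit unit)
      throws InterruptedException {
    // shutdownNow() interrupts running tasks and returns the ones still queued.
    List<Runnable> pending = executor.shutdownNow();
    if (!pending.isEmpty()) {
      LOG.log(Level.WARNING, "{0} task(s) were still pending when the executor was stopped",
          pending.size());
    }
    // Block until the worker thread is gone, so callers know the JVM can exit.
    return executor.awaitTermination(timeout, unit);
  }

  public static void main(String[] args) throws Exception {
    ExecutorService executor = Executors.newSingleThreadExecutor();
    executor.submit(() -> "process stopped").get();
    boolean terminated = shutdownAndWait(executor, 10, TimeUnit.SECONDS);
    System.out.println("terminated=" + terminated + ", isShutdown=" + executor.isShutdown());
  }
}
```

Because `shutdownNow()` also interrupts a task that is still running, a gentler variant would call `shutdown()` first and fall back to `shutdownNow()` only if `awaitTermination()` times out.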

          ASF subversion and git services made changes -
          Time Spent 20m [ 1200 ] 0.5h [ 1800 ]
          Worklog Id 16647 [ 16647 ]
          ASF subversion and git services logged work - 11/Jul/14 18:31
          • Time Spent:
            10m
             
            Commit 609c7267c33368ff9cfbb459153546d22bc5d2ce in accumulo's branch refs/heads/master from [~elserj]
            [ https://git-wip-us.apache.org/repos/asf?p=accumulo.git;h=609c726 ]

            ACCUMULO-2985 Make sure the ES used to stop processes with a timeout is stopped itself.

            To use a Callable to apply a timeout when trying to stop the accumulo processes
            in MiniAccumuloCluster, an ExecutorService is required. A single-thread ES was added
            as a part of ACCUMULO-2764; however, it was never stopped itself,
            which left a non-daemon thread running in the JVM. This non-daemon thread
            prevents the JVM from exiting cleanly.

            After using the ES to stop the Accumulo processes with a timeout, we
            need to be sure to stop the ES as well. A test with a mocked ES ensures that
            the ES is stopped before MAC.stop() returns.
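The timeout pattern this commit describes, a Callable submitted to an ExecutorService and bounded with `Future.get(timeout, unit)`, looks roughly like the sketch below. It is illustrative only: a generic Callable stands in for the code that stops one Accumulo process, and the helper `callWithTimeout` is hypothetical. The `finally` block shows the step the commit adds: shutting the executor down once it is no longer needed.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class StopWithTimeout {
  /**
   * Hypothetical helper: runs a stop action on the executor and waits at most `timeout`.
   * Returns true if the action finished in time, false if it timed out and was cancelled.
   */
  static boolean callWithTimeout(ExecutorService executor, Callable<?> stopAction,
      long timeout, TimeUnit unit) throws Exception {
    Future<?> future = executor.submit(stopAction);
    try {
      future.get(timeout, unit); // blocks until the action completes or the timeout fires
      return true;
    } catch (TimeoutException e) {
      future.cancel(true); // interrupt the stuck stop action
      return false;
    }
  }

  public static void main(String[] args) throws Exception {
    ExecutorService executor = Executors.newSingleThreadExecutor();
    try {
      boolean fast = callWithTimeout(executor, () -> "stopped", 1, TimeUnit.SECONDS);
      boolean slow = callWithTimeout(executor, () -> { Thread.sleep(5_000); return "late"; },
          100, TimeUnit.MILLISECONDS);
      System.out.println("fast=" + fast + ", slow=" + slow);
    } finally {
      // The step originally missed in ACCUMULO-2764: without it, the executor's
      // non-daemon worker thread keeps the JVM from exiting.
      executor.shutdown();
    }
  }
}
```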
          ASF subversion and git services made changes -
          Time Spent 10m [ 600 ] 20m [ 1200 ]
          Worklog Id 16645 [ 16645 ]
          ASF subversion and git services logged work - 11/Jul/14 18:31
          • Time Spent:
            10m
             
            Commit 609c7267c33368ff9cfbb459153546d22bc5d2ce in accumulo's branch refs/heads/1.6.1-SNAPSHOT from [~elserj]
            [ https://git-wip-us.apache.org/repos/asf?p=accumulo.git;h=609c726 ]

            ACCUMULO-2985 Make sure the ES used to stop processes with a timeout is stopped itself.

            To use a Callable to apply a timeout when trying to stop the accumulo processes
            in MiniAccumuloCluster, an ExecutorService is required. A single-thread ES was added
            as a part of ACCUMULO-2764; however, it was never stopped itself,
            which left a non-daemon thread running in the JVM. This non-daemon thread
            prevents the JVM from exiting cleanly.

            After using the ES to stop the Accumulo processes with a timeout, we
            need to be sure to stop the ES as well. A test with a mocked ES ensures that
            the ES is stopped before MAC.stop() returns.
          ASF subversion and git services made changes -
          Remaining Estimate 0h [ 0 ]
          Time Spent 10m [ 600 ]
          Worklog Id 16643 [ 16643 ]
          ASF subversion and git services logged work - 11/Jul/14 18:31
          • Time Spent:
            10m
             
            Commit 609c7267c33368ff9cfbb459153546d22bc5d2ce in accumulo's branch refs/heads/1.5.2-SNAPSHOT from [~elserj]
            [ https://git-wip-us.apache.org/repos/asf?p=accumulo.git;h=609c726 ]

            ACCUMULO-2985 Make sure the ES used to stop processes with a timeout is stopped itself.

            To use a Callable to apply a timeout when trying to stop the accumulo processes
            in MiniAccumuloCluster, an ExecutorService is required. A single-thread ES was added
            as a part of ACCUMULO-2764; however, it was never stopped itself,
            which left a non-daemon thread running in the JVM. This non-daemon thread
            prevents the JVM from exiting cleanly.

            After using the ES to stop the Accumulo processes with a timeout, we
            need to be sure to stop the ES as well. A test with a mocked ES ensures that
            the ES is stopped before MAC.stop() returns.
          Josh Elser made changes -
          Fix Version/s 1.5.2 [ 12326272 ]
          Josh Elser added a comment -

          There's a single non-daemon thread. To implement ACCUMULO-2764, I wrapped the methods that stop the MAC subprocesses in Callables so we can get the timeout semantics. Sadly, this requires an Executor to get those timeout semantics. That Executor wasn't being stopped, which introduced the bug that the above program outlines.

          I'm guessing that because Maven ultimately just ends the forked process, we never noticed that the surefire runner wasn't cleanly exiting on its own.
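The root cause is easy to confirm in isolation. The sketch below (`NonDaemonCheck` is a hypothetical name) asks the worker thread of `Executors.newSingleThreadExecutor()` whether it is a daemon. `Executors.defaultThreadFactory()` always creates non-daemon threads, so the answer is false, and such a thread will typically keep the JVM alive after `main()` returns until the executor is shut down.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class NonDaemonCheck {
  /** Runs a task on the executor that reports whether its worker thread is a daemon. */
  static boolean workerIsDaemon(ExecutorService executor) throws Exception {
    return executor.submit(() -> Thread.currentThread().isDaemon()).get();
  }

  public static void main(String[] args) throws Exception {
    ExecutorService executor = Executors.newSingleThreadExecutor();
    // Prints false: Executors.defaultThreadFactory() creates non-daemon threads.
    System.out.println("worker is daemon: " + workerIsDaemon(executor));
    // Remove the next two lines and main() still returns, but the JVM typically
    // keeps running because the idle worker thread is non-daemon.
    executor.shutdown();
    executor.awaitTermination(10, TimeUnit.SECONDS);
  }
}
```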

          Josh Elser made changes -
          Link This issue is broken by ACCUMULO-2764 [ ACCUMULO-2764 ]
          Josh Elser made changes -
          Link This issue is related to ACCUMULO-2764 [ ACCUMULO-2764 ]
          Josh Elser made changes -
          Assignee Josh Elser [ elserj ]
          Christopher Tubbs made changes -
          Fix Version/s 1.7.0 [ 12324607 ]
          John Vines made changes -
          Remote Link This issue links to "review board link (Web Link)" [ 15831 ]
          John Vines made changes -
          Remote Link This issue links to "review board link (Web Link)" [ 15831 ]
          John Vines made changes -
          Comment [ It looks like it may be connecting to ZK and then it either goes down or is shut down

          {code}2014-07-09 16:12:43,863 INFO [main-SendThread(localhost:6349)] zookeeper.ClientCnxn (ClientCnxn.java:primeConnection(947)) - Socket connection established to localhost/127.0.0.1:6349, initiating session
          2014-07-09 16:12:43,875 INFO [main-SendThread(localhost:6349)] zookeeper.ClientCnxn (ClientCnxn.java:readConnectResult(736)) - Session establishment complete on server localhost/127.0.0.1:6349, sessionid = 0x1471cc1a7290004, negotiated timeout = 30000
          2014-07-09 16:12:47,774 INFO [main-SendThread(localhost:6349)] zookeeper.ClientCnxn (ClientCnxn.java:run(1183)) - Unable to read additional data from server sessionid 0x1471cc1a7290004, likely server has closed socket, closing socket connection and attempting reconnect
          2014-07-09 16:12:49,183 INFO [main-SendThread(localhost:6349)] zookeeper.ClientCnxn (ClientCnxn.java:startConnect(1058)) - Opening socket connection to server localhost/127.0.0.1:6349
          2014-07-09 16:12:49,185 WARN [main-SendThread(localhost:6349)] zookeeper.ClientCnxn (ClientCnxn.java:run(1185)) - Session 0x1471cc1a7290004 for server null, unexpected error, closing socket connection and attempting reconnect
          java.net.ConnectException: Connection refused
          at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
          at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:735)
          at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1143){code} ]
          John Vines made changes -
          Link This issue breaks ACCUMULO-2984 [ ACCUMULO-2984 ]
          Josh Elser made changes -
          Link This issue is related to ACCUMULO-2764 [ ACCUMULO-2764 ]
          John Vines made changes -
          Description Using the following code to do some work-
          {code}public class TestMACWithRealInstance {
            public static void main(String args[]) throws IOException, AccumuloException, AccumuloSecurityException, TableExistsException, InterruptedException {
              MiniAccumuloConfig macConfig = new MiniAccumuloConfig(new File("/tmp/mac"), "secret");
              macConfig.setNumTservers(2);
              MiniAccumuloCluster mac = new MiniAccumuloCluster(macConfig);
              mac.start();
              mac.getConnector("root", "secret").tableOperations().create("macCreated");
              mac.stop();
            }
          }
          {code}

          It works fine against 1.6.0, but it seems broken against 01da4f4a8b14a125d3a2e29ef98dd044ab9ec75f with it waiting for ZK to start in perpetuity.
          Using the following code to do some work-
          {code}public class TestMACWithRealInstance {
            public static void main(String args[]) throws IOException, AccumuloException, AccumuloSecurityException, TableExistsException, InterruptedException {
              MiniAccumuloConfig macConfig = new MiniAccumuloConfig(new File("/tmp/mac"), "secret");
              macConfig.setNumTservers(2);
              MiniAccumuloCluster mac = new MiniAccumuloCluster(macConfig);
              mac.start();
              mac.getConnector("root", "secret").tableOperations().create("macCreated");
              mac.stop();
            }
          }
          {code}

          It works fine against 1.6.0, but it seems broken against 01da4f4a8b14a125d3a2e29ef98dd044ab9ec75f after calling stop() it just sits in the terminal spewing messages about unable to connect to zookeeper
          John Vines made changes -
          Summary: "MAC Can't start in 1.6.1-SNAPSHOT" changed to "MAC doesn't stop cleanly in 1.6.1-SNAPSHOT"
          John Vines added a comment -

          Broken in 652253e01119e5c0445359c649316aa2d21dd718, which is the first commit to the minicluster code since 1.6.0.

          John Vines created issue -

            People

            • Assignee: Josh Elser
            • Reporter: John Vines
            • Votes: 0
            • Watchers: 1

                Time Tracking

                • Original Estimate: Not Specified
                • Remaining Estimate: 0h
                • Time Spent: 1h
